Builds one embedding per node from the concatenated text of its
signature, body_text, and roxygen_text fields. Three backends are
supported: TF-IDF, Ollama, and OpenAI (see Details).
Usage
embed_nodes(
  func_nodes,
  method = c("tfidf", "ollama", "openai"),
  min_term_count = 1L,
  min_doc_count = 1L,
  existing_corpus = NULL,
  verbose = FALSE,
  cache_dir = NULL
)
Arguments
- func_nodes
A list of node records as returned by extract_function_nodes(). Each record must have node_id, signature, body_text, and roxygen_text fields.
- method
Character(1). One of "tfidf", "ollama", or "openai". Default "tfidf".
- min_term_count
Integer(1). TF-IDF vocabulary pruning: drop terms appearing fewer than this many times. Default 1L.
- min_doc_count
Integer(1). TF-IDF vocabulary pruning: drop terms appearing in fewer than this many documents. Default 1L.
- existing_corpus
Optional character vector of additional document texts to include when fitting the TF-IDF vocabulary. This keeps incremental updates in the same vector space. Ignored for the "ollama" and "openai" backends.
- verbose
Logical(1). Emit progress messages. For the "openai" backend, also prints an estimated cost. Default FALSE.
- cache_dir
Character(1) or NULL. Directory used to store embedding caches for the ollama and openai backends. NULL (default) uses the .rrlmgraph/ folder under the working directory.
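The func_nodes requirement above can be checked before calling embed_nodes(). The following is a minimal base-R sketch, not part of the package: find_incomplete_nodes() is a hypothetical helper, and the sample records are invented for illustration. Only the four field names come from the argument description.

```r
# Field names taken from the func_nodes argument description.
required_fields <- c("node_id", "signature", "body_text", "roxygen_text")

# Hypothetical helper (not a package function): returns the positions of
# records that are not lists or that lack any of the required fields.
find_incomplete_nodes <- function(func_nodes) {
  ok <- vapply(
    func_nodes,
    function(node) is.list(node) && all(required_fields %in% names(node)),
    logical(1)
  )
  which(!ok)
}

# Invented sample records; the second one is deliberately incomplete.
nodes <- list(
  list(node_id = "pkg::add", signature = "add(x, y)",
       body_text = "x + y", roxygen_text = "Add two numbers"),
  list(node_id = "pkg::broken", signature = "broken()")
)

bad <- find_incomplete_nodes(nodes)
if (length(bad) > 0) {
  message("Incomplete node records at positions: ", paste(bad, collapse = ", "))
}
```

Running a check like this first makes a malformed record easier to locate than an error raised inside the embedding step.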
Value
A named list:
- embeddings
Named list of numeric vectors, one per node.
- model
Embedding model for use with embed_query().
- matrix
Optional numeric matrix (rows = nodes) for backends that return dense vectors. NULL for TF-IDF.
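As an illustration of how the embeddings element might be consumed, the sketch below ranks nodes by cosine similarity to a query vector. It is an assumption-laden example: the result object is a hand-built mock of the documented shape with made-up numbers, query_vec is a hypothetical stand-in for whatever embed_query() returns, and cosine() is a local helper rather than a package function.

```r
# Mock of the documented return shape; all values are invented.
result <- list(
  embeddings = list(
    "pkg::add"       = c(1, 0, 2),
    "pkg::scale_vec" = c(0, 3, 1)
  ),
  model  = NULL,  # placeholder for the model element
  matrix = NULL   # documented as NULL for TF-IDF
)

# Local helper: cosine similarity, defined as 0 when either vector is all zeros.
cosine <- function(a, b) {
  denom <- sqrt(sum(a^2)) * sqrt(sum(b^2))
  if (denom == 0) return(0)
  sum(a * b) / denom
}

# Hypothetical query vector in the same space as the node embeddings.
query_vec <- c(1, 0, 1)

# Score every node against the query and sort from most to least similar.
scores <- vapply(result$embeddings, cosine, numeric(1), b = query_vec)
ranked <- sort(scores, decreasing = TRUE)
print(names(ranked))
```

Because embeddings is a named list, the node names carry through vapply() and sort(), so the ranking can be read directly from names(ranked).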
Details
- tfidf
Sparse TF-IDF vectors (default, no API required).
- ollama
768-dim dense vectors via the local Ollama daemon (nomic-embed-text). Falls back to TF-IDF when Ollama is unavailable.
- openai
1536-dim dense vectors via the OpenAI text-embedding-3-small model. Requires the OPENAI_API_KEY environment variable.
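To make the TF-IDF backend and the role of min_term_count, min_doc_count, and existing_corpus more concrete, here is a small base-R sketch. It is not the package's implementation: tokenize() and tfidf_sketch() are hypothetical names, the tokenizer is deliberately naive, the vectors are dense rather than sparse, and the weighting tf * (log(N / df) + 1) is one common variant chosen here as an assumption.

```r
# Naive tokenizer (assumption): lower-case, split on non-identifier characters.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z0-9_]+")[[1]]
  tokens[nzchar(tokens)]
}

# Hypothetical sketch: fit the vocabulary on docs plus extra_docs, prune it,
# then return one TF-IDF vector per element of docs.
tfidf_sketch <- function(docs, extra_docs = character(0),
                         min_term_count = 1L, min_doc_count = 1L) {
  tokens <- lapply(c(docs, extra_docs), tokenize)

  # Total occurrences of each term, and number of documents containing it.
  term_count <- table(unlist(tokens))
  doc_count  <- table(unlist(lapply(tokens, unique)))

  # Vocabulary pruning: keep terms that meet both thresholds.
  vocab <- names(term_count)
  keep  <- term_count[vocab] >= min_term_count &
    doc_count[vocab] >= min_doc_count
  vocab <- vocab[as.logical(keep)]

  # Inverse document frequency over the fitted corpus (assumed variant).
  idf <- log(length(tokens) / as.numeric(doc_count[vocab])) + 1

  # Vectors are returned only for docs, not for extra_docs.
  lapply(tokens[seq_along(docs)], function(tk) {
    tf <- as.numeric(table(factor(tk, levels = vocab)))
    stats::setNames(tf * idf, vocab)
  })
}

# Each string stands in for one node's concatenated signature, body_text,
# and roxygen_text (invented examples).
docs <- c(
  a = "add(x, y) x + y Add two numbers",
  b = "scale_vec(x, k) x * k Scale a numeric vector"
)

alone       <- tfidf_sketch(docs)
with_corpus <- tfidf_sketch(
  docs,
  extra_docs = "multiply(x, y) x * y Multiply two numbers"
)
pruned      <- tfidf_sketch(docs, min_doc_count = 2L)
```

In this sketch, passing extra_docs widens the vocabulary, so vectors for the same nodes gain dimensions for terms seen only in the extra texts. That is the sense in which fitting on an existing corpus keeps old and new vectors in one space. Raising min_doc_count shrinks the vocabulary to terms shared across documents.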
Examples
if (FALSE) { # \dontrun{
proj <- detect_rproject("/path/to/mypkg")
nodes <- extract_function_nodes(proj$r_files)
result <- embed_nodes(nodes) # TF-IDF
result <- embed_nodes(nodes, method = "ollama") # Ollama
result <- embed_nodes(nodes, method = "openai") # OpenAI
} # }