Prepare a Data Frame for the OpenAI Batch API - Embeddings
Source: R/openai_batch_api.R
Usage
oai_batch_prepare_embeddings(
  df,
  text_var,
  id_var,
  model = "text-embedding-3-small",
  dimensions = NULL,
  method = "POST",
  encoding_format = "float",
  endpoint = "/v1/embeddings"
)
Arguments
- df
A data frame containing text to process
- text_var
Name of the column containing input text
- id_var
Name of the column to use as row ID
- model
The embedding model to use
- dimensions
Number of embedding dimensions (NULL uses model default)
- method
The HTTP request method, usually 'POST'
- encoding_format
Format in which the embedding values are returned, e.g. "float" or "base64"
- endpoint
The API endpoint path, e.g. /v1/embeddings
Details
Takes an entire data frame and turns each row into a valid line of JSON, ready to be written to a .jsonl file, uploaded to the OpenAI Files API, and then used to trigger a Batch API job.
Each request must have its own unique ID, as the Batch API makes no guarantees about the order in which results are returned.
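Because results can arrive in any order, they are usually matched back to the input rows by ID rather than by position. The sketch below illustrates the idea with two hand-written, simplified output lines (not real API results); it assumes each output line carries a custom_id field and a response body containing the embedding, and that the jsonlite package is installed. The variable names are illustrative and not part of this package.

```r
library(jsonlite)

# Input rows, in their original order.
df <- data.frame(
  id = c("doc_1", "doc_2"),
  text = c("Hello world", "Embedding text"),
  stringsAsFactors = FALSE
)

# Simulated, simplified Batch API output lines, deliberately out of order.
# Real output lines contain additional fields.
output_lines <- c(
  '{"custom_id":"doc_2","response":{"status_code":200,"body":{"data":[{"embedding":[0.3,0.4]}]}}}',
  '{"custom_id":"doc_1","response":{"status_code":200,"body":{"data":[{"embedding":[0.1,0.2]}]}}}'
)

# Parse each line and pull out the ID and the embedding vector.
parsed <- lapply(output_lines, fromJSON, simplifyVector = FALSE)
result_ids <- vapply(parsed, function(x) x$custom_id, character(1))
embeddings <- lapply(parsed, function(x) unlist(x$response$body$data[[1]]$embedding))

# Reorder the embeddings to line up with the input rows by ID.
df$embedding <- embeddings[match(df$id, result_ids)]
```

Using match() on the ID column keeps the join independent of the order in which the results were returned.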
To reduce the overall size of the embeddings, at some cost to their explanatory power, you can set dimensions lower than the model's default (which varies by model).
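To make the shape of each request line concrete, here is a minimal sketch that builds comparable JSON lines by hand with jsonlite. It follows the documented Batch API request format (custom_id, method, url, body) and is not necessarily how this function is implemented internally. The helper build_request_line() is hypothetical, and the value 256 for dimensions is an arbitrary illustration.

```r
library(jsonlite)

df <- data.frame(
  id = c("doc_1", "doc_2"),
  text = c("Hello world", "Embedding text"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds one Batch API request as a single JSON string.
build_request_line <- function(id, text,
                               model = "text-embedding-3-small",
                               dimensions = NULL) {
  body <- list(model = model, input = text, encoding_format = "float")
  # Only include dimensions when set, so the model default applies otherwise.
  if (!is.null(dimensions)) body$dimensions <- dimensions
  request <- list(
    custom_id = id,
    method = "POST",
    url = "/v1/embeddings",
    body = body
  )
  as.character(toJSON(request, auto_unbox = TRUE))
}

# One JSON line per row, joined with newlines to form .jsonl content.
request_lines <- vapply(
  seq_len(nrow(df)),
  function(i) build_request_line(df$id[i], df$text[i], dimensions = 256),
  character(1)
)
jsonl <- paste(request_lines, collapse = "\n")
cat(jsonl, "\n")
```

Each element of request_lines is a complete JSON object on a single line, which is what the .jsonl format requires.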
Examples
if (FALSE) { # \dontrun{
  df <- data.frame(
    id = c("doc_1", "doc_2", "doc_3"),
    text = c("Hello world", "Embedding text", "Another document")
  )
  jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id)
} # }