Skip to contents

Prepare a Data Frame for the OpenAI Batch API - Embeddings

Usage

oai_batch_prepare_embeddings(
  df,
  text_var,
  id_var,
  model = "text-embedding-3-small",
  dimensions = NULL,
  method = "POST",
  encoding_format = "float",
  endpoint = "/v1/embeddings"
)

Arguments

df

A data frame containing text to process

text_var

Name of the column containing input text

id_var

Name of the column to use as row ID

model

The embedding model to use

dimensions

Number of embedding dimensions (NULL uses model default)

method

The HTTP request type, usually 'POST'

encoding_format

Data type of the embedding values

endpoint

The API endpoint path, e.g. /v1/embeddings

Value

A list of JSON requests

Details

Takes an entire data frame and turns each row into a valid line of JSON ready for a .jsonl file upload to the OpenAI Files API + Batch API job trigger.

Each request must have its own ID, as the Batch API makes no guarantees about the order the results will be returned in.

To reduce the overall size, and the explanatory power of the Embeddings, you can set dimensions to lower than the default (which vary based on model).

Examples

if (FALSE) { # \dontrun{
df <- data.frame(
  id = c("doc_1", "doc_2", "doc_3"),
  text = c("Hello world", "Embedding text", "Another document")
)
jsonl_content <- oai_batch_prepare_embeddings(df, text_var = text, id_var = id)
} # }