Process text chunks through Anthropic's Messages API with batch file output
Source: R/anthropic_messages.R (ant_complete_chunks.Rd)

Processes large volumes of text through Anthropic's Messages API in configurable chunks, writing results progressively to parquet files. Handles concurrent requests, automatic retries, and structured outputs.
Usage
ant_complete_chunks(
texts,
ids,
chunk_size = 5000L,
model = "claude-haiku-4-5",
system_prompt = NULL,
output_dir = "auto",
schema = NULL,
concurrent_requests = 5L,
temperature = 0,
max_tokens = 1024L,
max_retries = 5L,
timeout = 30L,
key_name = "ANTHROPIC_API_KEY",
endpoint_url = .ANT_MESSAGES_ENDPOINT,
id_col_name = "id"
)

Arguments
- texts
Character vector of texts to process
- ids
Vector of unique identifiers (same length as texts)
- chunk_size
Number of texts per chunk before writing to disk
- model
Anthropic model to use
- system_prompt
Optional system prompt (applied to all requests)
- output_dir
Directory for parquet chunks ("auto" generates timestamped dir)
- schema
Optional JSON schema for structured output (see the sketch after this argument list)
- concurrent_requests
Number of concurrent requests
- temperature
Sampling temperature
- max_tokens
Maximum tokens per response
- max_retries
Maximum retry attempts per request
- timeout
Request timeout in seconds
- key_name
Environment variable name for API key
- endpoint_url
Anthropic API endpoint URL
- id_col_name
Name for ID column in output
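A hedged sketch of a structured-output schema for the schema argument. Whether the function expects an R list (serialised to JSON internally) or a JSON string is an assumption here, and the field names are illustrative.

# Hypothetical schema: each response returns a sentiment label and a
# confidence score. Assumes an R-list representation of JSON Schema.
sentiment_schema <- list(
  type = "object",
  properties = list(
    sentiment = list(type = "string", enum = c("positive", "negative")),
    confidence = list(type = "number")
  ),
  required = c("sentiment", "confidence")
)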
Details
This function is designed for processing large text datasets. It divides the input into chunks, processes each chunk with concurrent API requests, and writes results to disk as it goes, minimising both memory usage and the risk of data loss.
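A minimal sketch of a call, assuming the package is attached and the ANTHROPIC_API_KEY environment variable is set. The inputs are illustrative, and the return value (treated here as the output directory path) is an assumption rather than documented behaviour.

# Illustrative inputs; ids must be unique and the same length as texts.
reviews <- c("Great product, arrived on time.",
             "Stopped working after two days.")
out_dir <- ant_complete_chunks(
  texts = reviews,
  ids = seq_along(reviews),
  chunk_size = 1000L,
  system_prompt = "Classify the sentiment of each review as positive or negative.",
  concurrent_requests = 5L
)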
Results are written as parquet files in the specified output directory, along with a metadata.json file containing processing parameters.
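Assuming the arrow package is installed, the parquet chunks can be read back as a single dataset; out_dir below refers to the directory from the sketch above.

# Read all chunk files in the output directory as one dataset and
# materialise them into a data frame.
library(arrow)
library(dplyr)
results <- open_dataset(out_dir) |>
  collect()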
When supplying your own output_dir, choose a new directory unless you intend to overwrite existing chunks: if the directory already contains a chunks_001.parquet file, it will be overwritten.
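One way to avoid clobbering a previous run is to rely on output_dir = "auto", which generates a timestamped directory, or to construct a fresh directory name yourself; run_dir below is illustrative.

# Build a unique, timestamped output directory for this run.
run_dir <- file.path("results", format(Sys.time(), "%Y%m%d_%H%M%S"))
ant_complete_chunks(
  texts = reviews,
  ids = seq_along(reviews),
  output_dir = run_dir
)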