Classify multiple texts using Hugging Face Inference Endpoints
Source:R/hf_classify.R
hf_classify_batch.Rd
Classifies a batch of texts using a Hugging Face classification endpoint and returns classification scores in a tidy format. Handles batching, concurrent requests, and error recovery automatically.
Usage
hf_classify_batch(
texts,
endpoint_url,
key_name,
...,
tidy_func = tidy_batch_classification_response,
parameters = list(return_all_scores = TRUE),
batch_size = 8,
progress = TRUE,
concurrent_requests = 5,
max_retries = 5,
timeout = 30,
include_texts = TRUE,
relocate_col = 2
)
Arguments
- texts
Character vector of texts to classify
- endpoint_url
URL of the Hugging Face Inference API endpoint
- key_name
Name of environment variable containing the API key
- ...
Additional arguments passed to request functions
- tidy_func
Function to process API responses, defaults to
tidy_batch_classification_response
- parameters
List of parameters for the API endpoint, defaults to
list(return_all_scores = TRUE)
- batch_size
Integer; number of texts per batch (default: 8)
- progress
Logical; whether to show progress bar (default: TRUE)
- concurrent_requests
Integer; number of concurrent requests (default: 5)
- max_retries
Integer; maximum retry attempts (default: 5)
- timeout
Numeric; request timeout in seconds (default: 20)
- include_texts
Logical; whether to include original texts in output (default: TRUE)
- relocate_col
Integer; column position for text column (default: 2)
Value
Data frame with classification scores for each text, plus columns
for original text (if include_texts=TRUE
), error status, and error messages
Details
This function processes multiple texts efficiently by splitting them into batches and optionally sending concurrent requests. It includes robust error handling and progress reporting for large batches.
The function automatically handles request failures with retries and includes error information in the output when requests fail. Original text order is preserved in the results.
The function does not currently handle list(return_all_scores = FALSE)
.
Examples
if (FALSE) { # \dontrun{
texts <- c(
"This product is brilliant!",
"Terrible quality, waste of money",
"Average product, nothing special"
)
results <- hf_classify_batch(
texts = texts,
endpoint_url = "redacted",
key_name = "API_KEY",
batch_size = 3
)
} # }