Classifies a batch of texts using a Hugging Face classification endpoint and returns classification scores in a tidy format. Handles batching, concurrent requests, and error recovery automatically.

Usage

hf_classify_batch(
  texts,
  endpoint_url,
  key_name,
  ...,
  tidy_func = tidy_batch_classification_response,
  parameters = list(return_all_scores = TRUE),
  batch_size = 8,
  progress = TRUE,
  concurrent_requests = 5,
  max_retries = 5,
  timeout = 30,
  include_texts = TRUE,
  relocate_col = 2
)

Arguments

texts

Character vector of texts to classify

endpoint_url

URL of the Hugging Face Inference API endpoint

key_name

Name of environment variable containing the API key

...

Additional arguments passed to request functions

tidy_func

Function to process API responses, defaults to tidy_batch_classification_response

parameters

List of parameters for the API endpoint, defaults to list(return_all_scores = TRUE)

batch_size

Integer; number of texts per batch (default: 8)

progress

Logical; whether to show progress bar (default: TRUE)

concurrent_requests

Integer; number of concurrent requests (default: 5)

max_retries

Integer; maximum retry attempts (default: 5)

timeout

Numeric; request timeout in seconds (default: 30)

include_texts

Logical; whether to include original texts in output (default: TRUE)

relocate_col

Integer; column position for text column (default: 2)
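Because key_name is the name of an environment variable rather than the key itself, the variable has to exist before the call. The sketch below shows one way to check this in base R. The variable name API_KEY and the placeholder value are assumptions for illustration; in practice the key would normally be stored in .Renviron rather than set in a script.

```r
# Assumption: the key is stored under the hypothetical name "API_KEY".
# The value here is a placeholder, not a real credential.
Sys.setenv(API_KEY = "placeholder-value")

# Sys.getenv() returns "" for an unset variable, so nzchar() tells us
# whether a non-empty value is available under that name.
key_present <- nzchar(Sys.getenv("API_KEY"))

if (!key_present) {
  stop("Environment variable 'API_KEY' is not set")
}
```

The string "API_KEY" (not the key's value) is what would then be passed as key_name.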

Value

A data frame with classification scores for each text, plus columns for the original text (if include_texts = TRUE), error status, and error messages.

Details

This function processes multiple texts by splitting them into batches of batch_size texts and sending up to concurrent_requests requests at a time. Failed requests are recorded in the output rather than stopping the run, and a progress bar is shown when progress = TRUE.
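To make the role of batch_size concrete, here is a minimal base R sketch of splitting a character vector into fixed-size batches and recombining it in the original order. It illustrates the general technique only and is not taken from the package's internals; the variable names are hypothetical.

```r
texts <- c("first", "second", "third", "fourth", "fifth")
batch_size <- 2

# Assign each text a batch number: 1, 1, 2, 2, 3
batch_ids <- ceiling(seq_along(texts) / batch_size)

# split() returns a list with one character vector per batch
batches <- split(texts, batch_ids)

# Flattening the batches in sequence recovers the original order
restored <- unlist(batches, use.names = FALSE)
```

With five texts and a batch size of two, the last batch holds the single remaining text.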

The function automatically handles request failures with retries and includes error information in the output when requests fail. Original text order is preserved in the results.
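The retry-then-record pattern described above can be sketched generically in base R. This is an illustration under stated assumptions, not the package's implementation: with_retries is a hypothetical helper, max_retries is treated here as the total number of attempts, and the returned field names are invented for the example.

```r
# Hypothetical helper: call fn() up to max_retries times and report
# the outcome instead of raising an error on failure.
with_retries <- function(fn, max_retries = 5) {
  last_error <- NA_character_
  for (attempt in seq_len(max_retries)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(list(value = result, error = FALSE, error_message = NA_character_))
    }
    # Remember the most recent failure message and try again
    last_error <- conditionMessage(result)
  }
  list(value = NULL, error = TRUE, error_message = last_error)
}

# A stand-in for a request that fails twice before succeeding
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("temporary failure")
  "ok"
}

outcome <- with_retries(flaky, max_retries = 5)
```

A real client would usually also wait between attempts (for example with Sys.sleep()) so that a struggling endpoint has time to recover.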

The function does not currently support parameters = list(return_all_scores = FALSE); leave return_all_scores at its default of TRUE.

Examples

if (FALSE) { # \dontrun{
  texts <- c(
    "This product is brilliant!",
    "Terrible quality, waste of money",
    "Average product, nothing special"
  )

  results <- hf_classify_batch(
    texts = texts,
    endpoint_url = "redacted",
    key_name = "API_KEY",
    batch_size = 3
  )
} # }