Classify a data frame of texts using Hugging Face Inference Endpoints

Classifies texts in a data frame column using a Hugging Face classification endpoint and joins the results back to the original data frame.

Usage

hf_classify_df(
  df,
  text_var,
  id_var,
  endpoint_url,
  key_name,
  ...,
  tidy_func = tidy_batch_classification_response,
  parameters = list(return_all_scores = TRUE),
  batch_size = 4,
  concurrent_requests = 1,
  max_retries = 5,
  timeout = 30,
  progress = TRUE
)

Arguments

df: Data frame containing texts to classify
text_var: Column name containing texts to classify (unquoted)
id_var: Column name to use as identifier for joining (unquoted)
endpoint_url: URL of the Hugging Face Inference API endpoint
key_name: Name of environment variable containing the API key
...: Additional arguments passed to request functions
tidy_func: Function to process API responses, defaults to tidy_batch_classification_response
parameters: List of parameters for the API endpoint, defaults to list(return_all_scores = TRUE)
batch_size: Integer; number of texts per batch (default: 4)
concurrent_requests: Integer; number of concurrent requests (default: 1)
max_retries: Integer; maximum retry attempts (default: 5)
timeout: Numeric; request timeout in seconds (default: 30)
progress: Logical; whether to show progress bar (default: TRUE)
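
To make the effect of batch_size concrete, the following base R sketch shows one way a vector of texts could be split into batches of a given size. The helper split_into_batches() is hypothetical and is not part of the package; how hf_classify_df() forms its batches internally is not stated here.

```r
# Hypothetical helper (not part of the package): split texts into batches.
split_into_batches <- function(texts, batch_size = 4) {
  stopifnot(batch_size >= 1)
  # Give each text a batch index: 1, 1, 1, 1, 2, 2, 2, 2, ...
  batch_index <- ceiling(seq_along(texts) / batch_size)
  # split() groups texts by index; unname() drops the "1", "2", ... labels
  unname(split(texts, batch_index))
}

texts <- paste("review", 1:10)
batches <- split_into_batches(texts, batch_size = 4)

length(batches)   # number of batches
lengths(batches)  # number of texts in each batch
```

With ten texts and batch_size = 4, the last batch is smaller than the others. A larger batch_size means fewer requests overall, at the cost of more texts per request.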

Value

The original data frame with additional columns for the classification scores. If the row counts do not match after processing, the table of classification results is returned on its own instead.

Details

This function extracts texts from a specified column, classifies them using hf_classify_batch(), and joins the classification results back to the original data frame using a specified ID column.

The function preserves the original data frame structure and adds new columns for classification scores. If the number of rows in the results does not match the number of rows in the input after processing (due to errors), the function skips the join, returns the classification results on their own, and issues a warning.
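
The join-or-fallback behaviour described above can be sketched in base R as follows. This is only an illustration under stated assumptions: join_or_fallback() is a hypothetical helper, the score column names (positive, negative) and their values are made up, and the package may well implement the join differently.

```r
# Hypothetical helper (not part of the package): join results back by ID,
# or return the results alone with a warning when row counts differ.
join_or_fallback <- function(df, results, id_col) {
  if (nrow(results) != nrow(df)) {
    warning("Row counts differ; returning classification results only.")
    return(results)
  }
  # Left join on the ID column, keeping every row of the original data frame
  merge(df, results, by = id_col, all.x = TRUE, sort = FALSE)
}

df <- data.frame(
  id = 1:3,
  review = c("Excellent service", "Poor quality", "Average experience")
)

# Made-up scores standing in for a tidied endpoint response
results <- data.frame(
  id = 1:3,
  positive = c(0.9, 0.1, 0.5),
  negative = c(0.1, 0.9, 0.5)
)

# All rows present: scores are added as new columns
joined <- join_or_fallback(df, results, "id")

# One row missing: the results come back unjoined (warning suppressed here)
partial <- suppressWarnings(join_or_fallback(df, results[1:2, ], "id"))
```

Because the return value can take either shape, it may be worth checking that the text column is still present in the result before relying on it downstream.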

The function does not currently support parameters = list(return_all_scores = FALSE).

Examples

if (FALSE) { # \dontrun{
  df <- data.frame(
    id = 1:3,
    review = c("Excellent service", "Poor quality", "Average experience")
  )

  classified_df <- hf_classify_df(
    df = df,
    text_var = review,
    id_var = id,
    endpoint_url = "redacted",
    key_name = "API_KEY"
  )
} # }
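
Since key_name is the name of an environment variable rather than the key itself, that variable has to exist in the session before the call is made. A minimal sketch, assuming the variable is called API_KEY as in the example above and using a placeholder value in place of a real key:

```r
# Placeholder value for illustration only; never hard-code a real key in scripts.
Sys.setenv(API_KEY = "my-placeholder-key")

# Confirm the variable is visible to the current session without printing it
key_is_set <- nzchar(Sys.getenv("API_KEY"))
key_is_set
```

In practice the key is more commonly stored in a .Renviron file so that it does not appear in scripts or in the console history.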