Classifies texts in a data frame column using a Hugging Face classification endpoint and joins the results back to the original data frame.

Usage

hf_classify_df(
  df,
  text_var,
  id_var,
  endpoint_url,
  key_name,
  ...,
  tidy_func = tidy_batch_classification_response,
  parameters = list(return_all_scores = TRUE),
  batch_size = 4,
  concurrent_requests = 1,
  max_retries = 5,
  timeout = 30,
  progress = TRUE
)

Arguments

df

Data frame containing texts to classify

text_var

Column name containing texts to classify (unquoted)

id_var

Column name to use as identifier for joining (unquoted)

endpoint_url

URL of the Hugging Face Inference API endpoint

key_name

Name of environment variable containing the API key

...

Additional arguments passed to request functions

tidy_func

Function to process API responses, defaults to tidy_batch_classification_response

parameters

List of parameters for the API endpoint, defaults to list(return_all_scores = TRUE)

batch_size

Integer; number of texts per batch (default: 4)

concurrent_requests

Integer; number of concurrent requests (default: 1)

max_retries

Integer; maximum retry attempts (default: 5)

timeout

Numeric; request timeout in seconds (default: 30)

progress

Logical; whether to show progress bar (default: TRUE)
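
Because key_name is the name of an environment variable rather than the key itself, the variable has to exist before the function is called. The sketch below shows one way to check for it up front. The helper check_key_set() is hypothetical and not part of the package, and the token value is a placeholder.

```r
# Assumption: the key is stored under the name later passed as key_name.
# "API_KEY" matches the example on this page; the value is a placeholder.
Sys.setenv(API_KEY = "hf_placeholder_token")

# Hypothetical helper (not part of the package): fail early if the
# environment variable is missing or empty.
check_key_set <- function(key_name) {
  value <- Sys.getenv(key_name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", key_name, "' is not set.", call. = FALSE)
  }
  invisible(TRUE)
}

key_ok <- check_key_set("API_KEY")
```

In real use the key would normally live in a file such as .Renviron instead of being set inside a script, so that it is never committed alongside the code.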

Value

The original data frame with additional columns for the classification scores. If the number of classified rows does not match the number of input rows, the classification results table is returned on its own instead.

Details

This function extracts texts from a specified column, classifies them using hf_classify_batch(), and joins the classification results back to the original data frame using a specified ID column.
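
To make the effect of batch_size concrete, here is a minimal base R sketch of splitting a character vector into batches of at most four texts. It illustrates the idea only; it is an assumption that hf_classify_batch() groups texts in a comparable way, and the variable names are chosen for this example.

```r
texts <- c(
  "Excellent service", "Poor quality", "Average experience",
  "Would buy again", "Never again"
)
batch_size <- 4

# Give each text a batch number: the first four texts go to batch 1,
# the fifth to batch 2.
batch_index <- ceiling(seq_along(texts) / batch_size)

# split() returns a list with one character vector per batch.
batches <- split(texts, batch_index)

length(batches)   # 2 batches
lengths(batches)  # batch sizes: 4 and 1
```

With concurrent_requests = 1, batches like these would be sent one after another; larger values allow several requests to be in flight at once.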

The function preserves the original data frame structure and adds new columns for classification scores. If the number of rows doesn't match after processing (due to errors), it returns the classification results separately with a warning.
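
The join and the fallback described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: join_scores_sketch() is a hypothetical helper, and the score columns positive and negative are invented labels, since the real column names depend on the model behind the endpoint.

```r
df <- data.frame(
  id = 1:3,
  review = c("Excellent service", "Poor quality", "Average experience")
)

# Hypothetical scores for only two of the three rows, as if one text failed.
scores <- data.frame(
  id = c(1L, 3L),
  positive = c(0.98, 0.40),
  negative = c(0.02, 0.60)
)

# Hypothetical helper: join on the ID column when row counts agree,
# otherwise warn and return the scores on their own.
join_scores_sketch <- function(df, scores, id_col) {
  if (nrow(scores) != nrow(df)) {
    warning("Row counts differ; returning classification results only.",
            call. = FALSE)
    return(scores)
  }
  merge(df, scores, by = id_col, all.x = TRUE, sort = FALSE)
}

# Mismatch: the scores table comes back unjoined (warning suppressed here).
partial <- suppressWarnings(join_scores_sketch(df, scores, "id"))

# Match: the original columns are kept and the score columns are added.
full_scores <- rbind(
  scores,
  data.frame(id = 2L, positive = 0.05, negative = 0.95)
)
joined <- join_scores_sketch(df, full_scores, "id")
```

Checking whether the returned object still contains the text column is a simple way to tell which of the two outcomes occurred.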

The function does not currently support parameters = list(return_all_scores = FALSE).

Examples

if (FALSE) { # \dontrun{
  df <- data.frame(
    id = 1:3,
    review = c("Excellent service", "Poor quality", "Average experience")
  )

  classified_df <- hf_classify_df(
    df = df,
    text_var = review,
    id_var = id,
    endpoint_url = "redacted",
    key_name = "API_KEY"
  )
} # }