Classify a data frame of texts using Hugging Face Inference Endpoints
Source: R/hf_classify.R
hf_classify_df.Rd
Classifies texts in a data frame column using a Hugging Face classification endpoint and joins the results back to the original data frame.
Usage
hf_classify_df(
  df,
  text_var,
  id_var,
  endpoint_url,
  key_name,
  ...,
  tidy_func = tidy_batch_classification_response,
  parameters = list(return_all_scores = TRUE),
  batch_size = 4,
  concurrent_requests = 1,
  max_retries = 5,
  timeout = 30,
  progress = TRUE
)
Arguments
- df
Data frame containing texts to classify
- text_var
Column name containing texts to classify (unquoted)
- id_var
Column name to use as identifier for joining (unquoted)
- endpoint_url
URL of the Hugging Face Inference API endpoint
- key_name
Name of environment variable containing the API key
- ...
Additional arguments passed to request functions
- tidy_func
Function to process API responses, defaults to tidy_batch_classification_response
- parameters
List of parameters for the API endpoint, defaults to list(return_all_scores = TRUE)
- batch_size
Integer; number of texts per batch (default: 4)
- concurrent_requests
Integer; number of concurrent requests (default: 1)
- max_retries
Integer; maximum retry attempts (default: 5)
- timeout
Numeric; request timeout in seconds (default: 30)
- progress
Logical; whether to show progress bar (default: TRUE)
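To make the batch_size argument concrete, here is a minimal base R sketch of how a vector of texts could be partitioned into batches of at most batch_size items. It is an illustration under assumptions only: the texts are made up, and the package's internal batching may be implemented differently.

```r
# Illustrative only: made-up texts, not the package's internal code
texts <- sprintf("text %d", 1:10)
batch_size <- 4

# Assign each text a batch number: 1,1,1,1,2,2,2,2,3,3
batch_index <- ceiling(seq_along(texts) / batch_size)

# Split into a list with one element per batch
batches <- split(texts, batch_index)

# Number of texts in each batch
batch_lengths <- lengths(batches)
```

With 10 texts and batch_size = 4, this yields three batches, the last of which is only partially filled.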
Value
The original data frame with additional columns for the classification scores. If the row counts do not match after processing, the classification results table is returned on its own instead.
Details
This function extracts texts from a specified column, classifies them using hf_classify_batch(), and joins the classification results back to the original data frame using a specified ID column.
The function preserves the original data frame structure and adds new columns for classification scores. If the number of classified rows does not match the number of input rows (for example, because some requests failed), it returns the classification results on their own and issues a warning.
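The join-or-fallback behaviour described here can be sketched in base R as follows. This is not the package's implementation: join_or_fallback() is a hypothetical helper, and the results table, its label columns, and its scores are invented for illustration.

```r
# Original data frame, as in the Examples section
df <- data.frame(
  id = 1:3,
  review = c("Excellent service", "Poor quality", "Average experience")
)

# Stand-in for a tidied classification result; labels and scores are made up
results <- data.frame(
  id = 1:3,
  positive = c(0.97, 0.02, 0.40),
  negative = c(0.03, 0.98, 0.60)
)

# Hypothetical helper: join on the ID column when row counts agree,
# otherwise warn and return the results table on its own
join_or_fallback <- function(df, results, id_col) {
  if (nrow(results) != nrow(df)) {
    warning("Row counts differ; returning classification results only")
    return(results)
  }
  merge(df, results, by = id_col, sort = FALSE)
}

# All rows classified: original columns plus score columns
joined <- join_or_fallback(df, results, "id")

# One row missing (simulating a failed request): results only
partial <- suppressWarnings(join_or_fallback(df, results[-2, ], "id"))
```

In the fallback case the ID column is still present, so the caller can join the partial results back manually once the cause of the missing rows is understood.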
The function does not currently handle parameters = list(return_all_scores = FALSE).
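One plausible reason for this limitation is that the two settings produce differently shaped responses. The sketch below assumes that with return_all_scores = TRUE each text yields one label/score pair per class, and with FALSE only the top pair. Both response objects are hand-written stand-ins with invented values, and to_wide() is a hypothetical helper, not the package's tidy_batch_classification_response.

```r
# Assumed shape for one text with return_all_scores = TRUE (made-up values)
all_scores <- list(
  list(label = "positive", score = 0.97),
  list(label = "negative", score = 0.03)
)

# Assumed shape for one text with return_all_scores = FALSE (top label only)
top_only <- list(label = "positive", score = 0.97)

# Hypothetical helper: turn the all-scores shape into one wide row,
# with one column per label
to_wide <- function(scores) {
  labels <- vapply(scores, function(x) x$label, character(1))
  values <- vapply(scores, function(x) x$score, numeric(1))
  as.data.frame(as.list(setNames(values, labels)))
}

wide <- to_wide(all_scores)
```

A tidying function written for the all-scores shape would need a separate code path for top_only, since that shape carries a single label rather than a score for every class.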
Examples
if (FALSE) { # \dontrun{
  df <- data.frame(
    id = 1:3,
    review = c("Excellent service", "Poor quality", "Average experience")
  )
  classified_df <- hf_classify_df(
    df = df,
    text_var = review,
    id_var = id,
    endpoint_url = "redacted",
    key_name = "API_KEY"
  )
} # }
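Because key_name is the name of an environment variable rather than the key itself, the variable has to be set before the call. The following base R sketch checks for it and, purely for illustration, sets a placeholder when it is missing. The token value is a dummy, and how the package reads the variable internally is an assumption.

```r
# Name of the environment variable, matching key_name in the example above
key_name <- "API_KEY"

# Only set a value if none exists, to avoid overwriting a real key
if (!nzchar(Sys.getenv(key_name))) {
  # Placeholder for illustration only; not a valid token
  Sys.setenv(API_KEY = "hf_placeholder_token")
}

# TRUE when the variable is available to the current R session
key_is_set <- nzchar(Sys.getenv(key_name))
```

In practice, storing the key in a user-level .Renviron file keeps it out of scripts and version control.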