
This vignette shows how to embed and classify text with EndpointR using Hugging Face’s inference services.

Setup

library(EndpointR)
library(dplyr)
library(httr2)
library(tibble)

my_data <- tibble(
  id = 1:3,
  text = c(
    "Machine learning is fascinating",
    "I love working with embeddings", 
    "Natural language processing is powerful"
  ),
  category = c("ML", "embeddings", "NLP")
)

Follow Hugging Face’s docs to generate a Hugging Face token, and then register it with EndpointR:

set_api_key("HF_TEST_API_KEY") 

Choosing Your Service

Hugging Face offers two inference options:

  • Inference API: Free, good for testing
  • Dedicated Endpoints: Paid, reliable, fast

For this vignette, we’ll use the Inference API. To switch to dedicated endpoints, just change the URL.

Getting Started

Go to Hugging Face’s models hub and copy the Inference API URL for the model you want to use. Not all models are available via the Hugging Face Inference API; if the model you need is not available, you may need to deploy a Dedicated Inference Endpoint.
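
For example, the embedding URL used below follows a predictable pattern, so switching models usually means changing only the model id. A quick sketch (the alternative model here is purely illustrative):

# the feature-extraction URLs in this vignette follow this pattern:
# https://router.huggingface.co/hf-inference/models/<model-id>/pipeline/feature-extraction
# e.g. swapping in a different embedding model (illustrative only):
paste0(
  "https://router.huggingface.co/hf-inference/models/",
  "sentence-transformers/all-MiniLM-L6-v2",
  "/pipeline/feature-extraction"
)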

Embeddings

Single Text

Embed one piece of text:

# inference api url for embeddings
embed_url <- "https://router.huggingface.co/hf-inference/models/sentence-transformers/all-mpnet-base-v2/pipeline/feature-extraction"

result <- hf_embed_text(
  text = "This is a sample text to embed",
  endpoint_url = embed_url,
  key_name = "HF_API_KEY"
)

The result is a tibble with one row and 768 columns (V1 to V768). Each column is an embedding dimension.

Note: The number of columns depends on your model. Check the model’s Hugging Face page for its embedding size.
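
If you’re unsure of a model’s embedding size, you can check it directly from the returned tibble. A small sanity check using dplyr (already loaded above):

# count the embedding columns (V1, V2, ...) returned by the endpoint
result |>
  select(starts_with("V")) |>
  ncol()
#> 768 for all-mpnet-base-v2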

List of Texts

Embed multiple texts at once using batching:

texts <- c(
  "First text to embed",
  "Second text to embed",
  "Third text to embed"
)

batch_result <- hf_embed_batch(
  texts,
  endpoint_url = embed_url,
  key_name = "HF_API_KEY",
  batch_size = 3  # process 3 texts per request
)

The result includes:

  • text: your original text
  • .error: TRUE if something went wrong
  • .error_message: what went wrong (if anything)
  • V1 to V768: the embedding values
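
As a sketch of how you might use the error columns, you can pull out any failed rows and re-submit just those texts (plain dplyr on the columns listed above; nothing here is EndpointR-specific beyond the batch call):

# inspect any failures
failed <- batch_result |> filter(.error)
failed |> select(text, .error_message)

# re-submit only the texts that failed
if (nrow(failed) > 0) {
  retried <- hf_embed_batch(
    failed$text,
    endpoint_url = embed_url,
    key_name = "HF_API_KEY",
    batch_size = 1  # smaller batches are gentler on a flaky connection
  )
}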

Data Frame

Most commonly, you’ll want to embed a column in a data frame:

embedding_result <- hf_embed_df(
  df = my_data,
  text_var = text,      # column with your text
  id_var = id,          # column with unique ids
  endpoint_url = embed_url,
  key_name = "HF_API_KEY"
)

Check for errors:

embedding_result |> count(.error)

Extract just the embeddings:

embeddings_only <- embedding_result |> select(V1:V768)
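
A common next step is comparing texts by cosine similarity. Here is a minimal base-R sketch over the embedding matrix (illustrative only; it assumes no rows errored):

# convert the embedding columns to a numeric matrix
emb <- as.matrix(embeddings_only)

# pairwise cosine similarity between rows
norms <- sqrt(rowSums(emb^2))
sim <- (emb %*% t(emb)) / (norms %*% t(norms))
round(sim, 3)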

Classification

Classification works the same way as embeddings, just with a different URL and output format. If necessary, you can also provide a custom function for tidying the output.

Single Text

classify_url <- "https://router.huggingface.co/hf-inference/models/distilbert/distilbert-base-uncased-finetuned-sst-2-english"

sentiment <- hf_classify_text(
  text = "I love this package!",
  endpoint_url = classify_url,
  key_name = "HF_API_KEY"
)

Data Frame

classification_result <- hf_classify_df(
  df = my_data,
  text_var = text,
  id_var = id,
  endpoint_url = classify_url,
  key_name = "HF_API_KEY"
)

The result includes:

  • Your original id column
  • Classification labels (e.g., POSITIVE, NEGATIVE)
  • Confidence scores
  • Error tracking columns (.error and .error_message)

NOTE: Classification labels are model and task specific.
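
Because the result keeps your original id column, you can join the labels back onto the source data frame. A sketch (the exact label and score column names depend on the model):

# attach classification output to the original data by id;
# if both frames carry a text column, dplyr will suffix them (text.x, text.y)
labelled <- my_data |>
  left_join(classification_result, by = "id")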

Using Dedicated Endpoints

To use dedicated endpoints instead of the Inference API:

  1. Deploy your model to a dedicated endpoint (see Hugging Face docs)
  2. Get your endpoint URL
  3. Replace the URL in any function:
# just change this line
dedicated_url <- "https://your-endpoint-name.endpoints.huggingface.cloud"

# everything else stays the same
result <- hf_embed_text(
  text = "Sample text",
  endpoint_url = dedicated_url,  # <- only change
  key_name = "HF_API_KEY"
)

Note: Dedicated endpoints take 20-30 seconds to start if they’re idle. Set max_retries = 5 to give them time to wake up.
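
For example, the same call as above with retries turned up (this assumes max_retries is accepted by the embedding functions, as the note above implies):

result <- hf_embed_text(
  text = "Sample text",
  endpoint_url = dedicated_url,
  key_name = "HF_API_KEY",
  max_retries = 5  # retries cover the cold-start window
)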

Tips

  • Start with small batch sizes (3-5) and increase gradually
  • The Inference API has rate limits; dedicated endpoints are constrained by their hardware, so increase the hardware for higher limits
  • For production use, choose dedicated endpoints
  • Check the Improving Performance vignette for speed tips

Common Issues

Rate limits: Reduce batch size or add delays between requests
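
One way to do that is to chunk your texts and pause between calls. A sketch (Sys.sleep is base R; the chunk size and delay are arbitrary starting points):

# process texts in small chunks with a pause between requests
chunks <- split(texts, ceiling(seq_along(texts) / 3))

results <- lapply(chunks, function(chunk) {
  out <- hf_embed_batch(
    chunk,
    endpoint_url = embed_url,
    key_name = "HF_API_KEY",
    batch_size = 3
  )
  Sys.sleep(1)  # brief pause to stay under rate limits
  out
})

all_results <- bind_rows(results)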

Model not available: Not all models work with the Inference API. Check the model page or use dedicated endpoints.

Timeouts: Increase max_retries or reduce batch size

Improving Performance

EndpointR’s functions come with knobs and dials you can turn to improve throughput and performance. Visit the Improving Performance vignette for more information.

Appendix

Comparison of Inference API vs Dedicated Inference Endpoints

Feature          | Inference API                                  | Dedicated Inference Endpoints
Accessibility    | Public, shared service                         | Private, dedicated hardware
Cost             | Free (with paid tiers)                         | Paid service - rent specific hardware
Hardware         | Shared computing resources                     | Dedicated hardware allocation
Wait Times       | Variable, unknowable in advance                | Predictable, minimal queuing, ~30s for first request
Production Ready | Not recommended for production                 | Recommended for production use
Use Case         | Casual usage, testing, prototyping             | Production applications, consistent performance
Scalability      | Limited by shared resources                    | Scales with dedicated allocation
Availability     | Subject to shared infrastructure limits        | Guaranteed availability during rental period
Model Coverage   | Commonly-used models selected by Hugging Face  | Virtually all models on the Hub are available