Use Hugging Face models to create topic representations
Source: R/representation.R
bt_representation_hf.Rd
Usage
bt_representation_hf(
  fitted_model,
  documents,
  task,
  hf_model,
  ...,
  default_prompt = "keywords",
  nr_samples = 500,
  nr_repr_docs = 20,
  diversity = 10,
  custom_prompt = NULL
)
Arguments
- fitted_model
The fitted BERTopic model.
- documents
The documents the topic model was fitted to.
- task
The task that defines which pipeline is returned. See https://huggingface.co/transformers/v3.0.2/main_classes/pipelines.html for more information. Use "text-generation" for GPT-like models and "text2text-generation" for T5-like models.
- hf_model
The model that the pipeline will use to make predictions.
- ...
Additional arguments passed to the transformers.pipeline function.
- default_prompt
Whether to use the "keywords" or the "documents" default prompt. The default is "keywords". If a custom_prompt is supplied, this argument is set to NULL and has no effect.
- nr_samples
Number of documents sampled from each topic, from which the representative documents are chosen.
- nr_repr_docs
Number of representative documents to be sent to the Hugging Face model.
- diversity
Diversity of the documents sent to the Hugging Face model, where 0 means no diversity and 1 means maximum diversity.
- custom_prompt
The custom prompt to be used in the pipeline. If not specified, the default prompt selected by default_prompt ("keywords" or "documents") is used. Place "[KEYWORDS]" and "[DOCUMENTS]" in the prompt to mark where the topic keywords and the representative documents are inserted.
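To illustrate how the "[KEYWORDS]" and "[DOCUMENTS]" placeholders behave, the following base R sketch fills a prompt template by hand. The helper fill_prompt(), the example keywords and documents, and the way they are joined (comma-separated keywords, one "- " bullet per document) are assumptions made for illustration; they are not taken from the package, and the formatting used internally may differ.

```r
# Hypothetical helper (not part of the package): substitute keywords and
# documents into a prompt template containing the two placeholders.
fill_prompt <- function(prompt, keywords, documents) {
  # Assumed formatting: comma-separated keywords, one bullet per document.
  keyword_text <- paste(keywords, collapse = ", ")
  document_text <- paste0("- ", documents, collapse = "\n")

  # fixed = TRUE matches the square brackets literally instead of
  # interpreting "[KEYWORDS]" as a regular-expression character class.
  prompt <- gsub("[KEYWORDS]", keyword_text, prompt, fixed = TRUE)
  gsub("[DOCUMENTS]", document_text, prompt, fixed = TRUE)
}

custom_prompt <- paste(
  "Topic keywords: [KEYWORDS]",
  "Sample documents:",
  "[DOCUMENTS]",
  "Give a short label for this topic:",
  sep = "\n"
)

# Made-up keywords and documents for a single topic.
filled <- fill_prompt(
  custom_prompt,
  keywords = c("coffee", "espresso", "roast"),
  documents = c(
    "I love a dark roast in the morning.",
    "Espresso machines are expensive."
  )
)

cat(filled, "\n")
```

A template string such as custom_prompt above is the kind of value that would be supplied through the custom_prompt argument; the substitution itself is then handled by the model pipeline rather than by user code.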
Details
Representative documents are chosen for each topic in two steps. First, nr_samples documents are sampled from the topic. Second, the sampled documents are ranked by the cosine similarity between their c-TF-IDF representation and that of the topic. The nr_repr_docs most representative documents are then passed to the Hugging Face model, which predicts the topic description.
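The selection procedure can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the toy documents and the topic_weights vector are made up, raw term counts stand in for c-TF-IDF document vectors, and the diversity step is omitted.

```r
# Made-up documents assigned to a single topic.
topic_docs <- c(
  "espresso coffee roast beans",
  "coffee beans roast dark",
  "my cat sleeps all day",
  "espresso machine coffee grinder",
  "weather is rainy today"
)

# Hypothetical term weights standing in for the topic's c-TF-IDF vector.
topic_weights <- c(coffee = 3, espresso = 2, roast = 2, beans = 1)

nr_samples <- 4    # documents sampled from the topic
nr_repr_docs <- 2  # representative documents kept

# Step 1: sample candidate documents from the topic.
set.seed(42)
sampled <- sample(topic_docs, size = min(nr_samples, length(topic_docs)))

# Step 2: build term-count vectors over a shared vocabulary
# (assumption: raw counts replace the c-TF-IDF weighting).
tokens <- strsplit(sampled, " ", fixed = TRUE)
vocab <- union(names(topic_weights), unlist(tokens))

doc_term <- t(vapply(
  tokens,
  function(doc_tokens) as.numeric(table(factor(doc_tokens, levels = vocab))),
  numeric(length(vocab))
))

# Topic vector on the same vocabulary; terms outside the topic get weight 0.
topic_vec <- setNames(numeric(length(vocab)), vocab)
topic_vec[names(topic_weights)] <- topic_weights

# Step 3: cosine similarity between each sampled document and the topic.
cosine_to_topic <- function(doc_vec, topic_vec) {
  denom <- sqrt(sum(doc_vec^2)) * sqrt(sum(topic_vec^2))
  if (denom == 0) return(0)
  sum(doc_vec * topic_vec) / denom
}
similarity <- apply(doc_term, 1, cosine_to_topic, topic_vec = topic_vec)

# Step 4: keep the nr_repr_docs most similar documents.
n_keep <- min(nr_repr_docs, length(sampled))
top_idx <- order(similarity, decreasing = TRUE)[seq_len(n_keep)]
representative_docs <- sampled[top_idx]

print(representative_docs)
```

Sampling first keeps the similarity computation cheap for large topics, since only nr_samples documents are scored rather than every document in the topic.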