Create representation model that uses OpenAI text generation models
Source: R/representation.R
Representative documents are chosen for each topic by sampling a number of documents (nr_samples) from the topic and ranking them by the cosine similarity between each document's c-TF-IDF representation and that of the topic. The most representative documents (their number is set by the nr_repr_docs parameter) are then extracted and passed to the OpenAI API, which generates topic labels using one of its Completion (chat = FALSE) or ChatCompletion (chat = TRUE) models.
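The selection step described above can be sketched in base R. This is a minimal illustration rather than the package's internal code: the helper `select_representative()`, the toy matrix, and the use of the column means as a stand-in for the topic's c-TF-IDF vector are all assumptions made for the example, and the optional diversity step is left out.

```r
# Hypothetical helper (not part of the package): pick the documents whose
# vectors are most similar to the topic vector, among a random sample.
select_representative <- function(doc_vectors, topic_vector,
                                  nr_samples, nr_repr_docs) {
  n_docs <- nrow(doc_vectors)

  # 1. Sample candidate documents from the topic (at most nr_samples).
  sampled <- sample.int(n_docs, size = min(nr_samples, n_docs))
  candidates <- doc_vectors[sampled, , drop = FALSE]

  # 2. Cosine similarity between each candidate and the topic vector.
  dot_products <- as.vector(candidates %*% topic_vector)
  norms <- sqrt(rowSums(candidates^2)) * sqrt(sum(topic_vector^2))
  similarity <- dot_products / norms

  # 3. Keep the row indices of the nr_repr_docs most similar candidates.
  n_keep <- min(nr_repr_docs, length(similarity))
  sampled[order(similarity, decreasing = TRUE)[seq_len(n_keep)]]
}

# Toy data: 20 documents described by 5 term weights (made-up numbers).
set.seed(42)
doc_vectors <- matrix(runif(20 * 5), nrow = 20, ncol = 5)
topic_vector <- colMeans(doc_vectors)  # stand-in for the topic's c-TF-IDF vector

repr_idx <- select_representative(doc_vectors, topic_vector,
                                  nr_samples = 10, nr_repr_docs = 3)
```

Here `repr_idx` holds the row indices of three documents drawn from a sample of ten; in the real function the corresponding document texts are what would be sent to the OpenAI model.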
Usage
bt_representation_openai(
  fitted_model,
  documents,
  openai_model = "text-ada-001",
  nr_repr_docs = 10,
  nr_samples = 500,
  chat = FALSE,
  api_key = "sk-",
  delay_in_seconds = NULL,
  prompt = NULL,
  diversity = NULL
)
Arguments
- fitted_model
Output of bt_fit_model() or another BERTopic topic model. The model must have been fitted to data.
- documents
Documents used to fit the fitted_model.
- openai_model
OpenAI model to use. If using a gpt-3.5 model, set chat = TRUE.
- nr_repr_docs
Number of representative documents per topic to send to the OpenAI model.
- nr_samples
Number of documents sampled from each topic, from which the representative documents are chosen.
- chat
Set to TRUE if using a gpt-3.5 model.
- api_key
OpenAI API key. A key is required to use the OpenAI API and can be obtained from the OpenAI website.
- delay_in_seconds
The delay in seconds between consecutive prompts, used to avoid rate limit errors.
- prompt
The prompt to be used with the OpenAI model. If NULL, the default prompt is used.
- diversity
Diversity of the documents sent to the OpenAI model. 0 = no diversity, 1 = max diversity.
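As a rough illustration of how the arguments above fit together, the sketch below collects a set of argument values and wraps the call in a small function. Several things here are assumptions rather than documented behaviour: the package namespace `BertopicR`, the environment variable name `OPENAI_API_KEY`, the model name `"gpt-3.5-turbo"`, and the particular delay and diversity values. The wrapper `label_topics()` is hypothetical and is only defined, not called, because calling it needs a fitted model and a valid API key.

```r
# Read the API key from an environment variable instead of hard-coding it
# (the variable name is an assumption for this example).
api_key <- Sys.getenv("OPENAI_API_KEY")

# Illustrative argument values; chat = TRUE because a gpt-3.5 model is used.
openai_args <- list(
  openai_model     = "gpt-3.5-turbo",
  nr_repr_docs     = 10,
  nr_samples       = 500,
  chat             = TRUE,
  api_key          = api_key,
  delay_in_seconds = 1,    # pause between prompts to avoid rate limit errors
  prompt           = NULL, # NULL falls back to the default prompt
  diversity        = 0.1   # between 0 (no diversity) and 1 (max diversity)
)

# Hypothetical wrapper: fitted_model is the output of bt_fit_model() and
# documents are the texts the model was fitted on.
label_topics <- function(fitted_model, documents, args = openai_args) {
  do.call(
    BertopicR::bt_representation_openai,
    c(list(fitted_model = fitted_model, documents = documents), args)
  )
}
```

Keeping the key in an environment variable avoids committing it to scripts, and a non-NULL delay_in_seconds spaces out the prompts when a model has many topics.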