Create a representation model that uses OpenAI text generation models

Representative documents are chosen for each topic by randomly sampling nr_samples documents from the topic and ranking them by the cosine similarity between each document's c-TF-IDF representation and the topic's c-TF-IDF representation. The nr_repr_docs most representative documents are then passed to the OpenAI API, which generates a topic label using either a Completion model (chat = FALSE) or a ChatCompletion model (chat = TRUE).
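The selection step described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code BertopicR or BERTopic actually runs: it assumes the c-TF-IDF vectors for the topic's documents (one row per document) and for the topic itself have already been computed over the same vocabulary, and the helpers cosine_sim() and select_repr_docs() are hypothetical names introduced here for illustration.

```r
# Hypothetical sketch: choose the most representative documents of one topic.
# Assumes doc_vecs = c-TF-IDF vectors of the topic's documents (rows) and
# topic_vec = the topic's c-TF-IDF vector over the same vocabulary.

cosine_sim <- function(mat, vec) {
  # Row-wise cosine similarity between each row of `mat` and `vec`;
  # rows with zero norm get similarity 0
  num <- as.vector(mat %*% vec)
  den <- sqrt(rowSums(mat^2)) * sqrt(sum(vec^2))
  ifelse(den == 0, 0, num / den)
}

select_repr_docs <- function(doc_vecs, topic_vec,
                             nr_samples = 500, nr_repr_docs = 10) {
  n_docs <- nrow(doc_vecs)
  # Step 1: randomly sample up to nr_samples document indices
  sampled <- sample.int(n_docs, min(nr_samples, n_docs))
  # Step 2: score each sampled document against the topic
  sims <- cosine_sim(doc_vecs[sampled, , drop = FALSE], topic_vec)
  # Step 3: keep the nr_repr_docs highest-scoring documents
  keep <- order(sims, decreasing = TRUE)[seq_len(min(nr_repr_docs, length(sampled)))]
  sampled[keep]
}

# Toy example: 4 documents over a 3-term vocabulary
doc_vecs <- rbind(c(1, 0, 0), c(1, 1, 0), c(0, 0, 1), c(2, 0, 0.1))
topic_vec <- c(1, 0, 0)
print(select_repr_docs(doc_vecs, topic_vec, nr_samples = 4, nr_repr_docs = 2))
# [1] 1 4
```

Because nr_samples equals the number of documents in this toy example, every document is scored and the result does not depend on the random sample; with larger topics, only the sampled subset is ever considered, which keeps the cost bounded.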

Usage

bt_representation_openai(
  fitted_model,
  documents,
  openai_model = "gpt-4o-mini",
  nr_repr_docs = 10,
  nr_samples = 500,
  chat = TRUE,
  api_key = "sk-",
  delay_in_seconds = NULL,
  prompt = NULL,
  diversity = NULL)

Arguments

fitted_model: Output of bt_fit_model() or another BERTopic topic model. The model must have been fitted to data.
documents: The documents used to fit fitted_model.
openai_model: The OpenAI model to use. If using a chat model, set chat = TRUE.
nr_repr_docs: Number of representative documents per topic to send to the OpenAI model.
nr_samples: Number of sampled documents from which the representative documents are chosen.
chat: Set to TRUE if using a chat model such as gpt-4o-mini (the default).
api_key: An OpenAI API key, which is required to use the OpenAI API and can be obtained from the OpenAI website.
delay_in_seconds: The delay in seconds between consecutive prompts, used to avoid rate-limit errors.
prompt: The prompt to be used with the OpenAI model. If NULL, the default prompt is used.
diversity: Diversity of the representative documents sent to the OpenAI model, from 0 (no diversity) to 1 (maximum diversity).
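The diversity argument is described here only as a 0 to 1 scale. Assuming it works through maximal marginal relevance (MMR), as in the Python BERTopic library that BertopicR wraps, the re-ranking can be sketched in base R as below. This is an illustrative assumption, not BertopicR's implementation; mmr_select() and its inner helper are hypothetical names. Each step picks the candidate that maximises (1 - diversity) * similarity to the topic minus diversity * its highest similarity to any document already chosen.

```r
# Hypothetical MMR sketch: trade relevance to the topic against redundancy.
# doc_vecs = c-TF-IDF vectors of candidate documents (rows); topic_vec = topic vector.
mmr_select <- function(doc_vecs, topic_vec, top_n = 10, diversity = 0.5) {
  cos_rows <- function(a, b) {
    # Cosine similarity between every row of a and every row of b
    s <- (a %*% t(b)) / outer(sqrt(rowSums(a^2)), sqrt(rowSums(b^2)))
    s[is.nan(s)] <- 0  # zero-norm rows get similarity 0
    s
  }
  doc_topic <- as.vector(cos_rows(doc_vecs, matrix(topic_vec, nrow = 1)))
  doc_doc <- cos_rows(doc_vecs, doc_vecs)
  top_n <- min(top_n, nrow(doc_vecs))
  # Start with the single most relevant document
  selected <- which.max(doc_topic)
  candidates <- setdiff(seq_len(nrow(doc_vecs)), selected)
  while (length(selected) < top_n) {
    # Redundancy: each candidate's highest similarity to an already-selected document
    redundancy <- apply(doc_doc[candidates, selected, drop = FALSE], 1, max)
    score <- (1 - diversity) * doc_topic[candidates] - diversity * redundancy
    best <- candidates[which.max(score)]
    selected <- c(selected, best)
    candidates <- setdiff(candidates, best)
  }
  selected
}

# Document 2 is a near-duplicate of document 1; document 3 is less relevant but distinct
docs <- rbind(c(1, 0, 0), c(0.9, 0.1, 0), c(0.6, 0, 0.8))
print(mmr_select(docs, c(1, 0, 0), top_n = 2, diversity = 0))    # [1] 1 2
print(mmr_select(docs, c(1, 0, 0), top_n = 2, diversity = 0.8))  # [1] 1 3
```

With diversity = 0 the ranking reduces to plain relevance, so the near-duplicate is kept; at higher values the redundancy penalty favours the more distinct document, which is the intended effect of raising diversity toward 1.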

Value

OpenAI representation model