This function wraps the UMAP functionality from Python's umap-learn package for use in R via reticulate. It performs dimension reduction on high-dimensional data and is intended for use in a BertopicR pipeline.

Usage

bt_make_reducer_umap(
  ...,
  n_neighbours = 15L,
  n_components = 5L,
  min_dist = 0,
  metric = "euclidean",
  random_state = 42L,
  low_memory = FALSE,
  verbose = TRUE
)

Arguments

...

Additional arguments passed on to Python's umap.UMAP.

n_neighbours

The size of the local neighbourhood (in terms of the number of neighbouring data points) used for manifold approximation (default: 15).

n_components

The number of dimensions to reduce to (default: 5).

min_dist

The minimum distance between points in the low-dimensional representation (default: 0.0).

metric

The metric to use for distance computation (default: "euclidean").

random_state

The seed used by the random number generator (default: 42).

low_memory

Logical, whether to use a low-memory version of UMAP (default: FALSE).

verbose

Logical flag indicating whether to report progress during the dimension reduction (default: TRUE).

Value

A UMAP model that can be passed to bt_do_reducing to reduce the dimensions of your data.

Details

If you're concerned about processing time, you will most likely want to reduce the dimensions of your dataset only once. In that case, reduce your data up front with bt_do_reducing, then pass an empty reducer, reducer <- bt_empty_reducer(), when compiling your model with bt_compile_model so that the reduction is not repeated.

low_memory = TRUE is currently inadvisable, as trial and error suggests the results are less robust in later clustering.

Examples

# using euclidean distance measure and specifying numeric inputs as integers
reducer <- bt_make_reducer_umap(n_neighbours = 15L, n_components = 10L, metric = "euclidean")

# using euclidean distance measure and not specifying numeric inputs as integers (done internally in function)
reducer <- bt_make_reducer_umap(n_neighbours = 15, n_components = 10, metric = "euclidean")

# using cosine distance measure and not specifying numeric inputs as integers (done internally in function)
reducer <- bt_make_reducer_umap(n_neighbours = 20, n_components = 6, metric = "cosine")
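
The comments in the examples above note that numeric inputs are converted to integers inside the function. The base-R sketch below illustrates why that distinction exists: R stores a bare number such as 15 as a double, and reticulate passes doubles to Python as floats and R integers as Python ints. The helper to_whole_integer is hypothetical and is not part of BertopicR; it only assumes that the internal conversion behaves roughly like as.integer().

```r
# R treats a bare number as a double; the L suffix makes it an integer.
typeof(15)   # "double"
typeof(15L)  # "integer"

# Hypothetical helper (not part of BertopicR): coerce a single whole-number
# input to integer, and stop if the value would be silently truncated.
to_whole_integer <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x == round(x))
  as.integer(x)
}

n_neighbours <- to_whole_integer(15)
n_components <- to_whole_integer(10)

identical(n_neighbours, 15L)  # TRUE
```

Passing 15L directly, as in the first example, skips the conversion altogether. The helper stops on a value such as 2.5 instead of truncating it to 2.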