Instantiates an HDBSCAN clustering model using the hdbscan Python library.

Usage

bt_make_clusterer_hdbscan(
  ...,
  min_cluster_size = 10L,
  min_samples = 10L,
  metric = "euclidean",
  cluster_selection_method = c("eom", "leaf"),
  prediction_data = FALSE
)

Arguments

...

Additional arguments passed to hdbscan.HDBSCAN().

min_cluster_size

Minimum number of data points required to form a cluster. Enter as an integer by appending L to the number (e.g. 10L).

min_samples

Controls the number of outliers generated. Lower values produce fewer outliers. Enter as an integer (e.g. 10L).

metric

Distance metric used to calculate clusters. Default is "euclidean".

cluster_selection_method

The method used to select clusters: either "eom" (Excess of Mass) or "leaf". Default is "eom".

prediction_data

Set to TRUE if you intend to use the model with any functions from hdbscan.prediction, e.g. when using bt_outliers_probabilities.

Value

An instance of the HDBSCAN clustering model (Python object).

Examples

# using the minkowski metric to calculate the distance between documents;
# with this metric, a value for p must be supplied as an additional argument
clustering_model <- bt_make_clusterer_hdbscan(metric = "minkowski", p = 1.5)

# specifying numeric inputs as integers, with the additional gen_min_span_tree argument
clusterer <- bt_make_clusterer_hdbscan(min_cluster_size = 5L, gen_min_span_tree = TRUE)

# not specifying numeric inputs as integers (they are converted to integers internally)
clusterer <- bt_make_clusterer_hdbscan(min_cluster_size = 5, cluster_selection_method = "leaf")
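The difference between 5L and 5 in the examples above can be seen in base R alone. The sketch below is only an illustration of that distinction: to_integer_arg is a hypothetical helper, not a function from this package, and the package's actual internal conversion is not shown on this page and may differ.

```r
# Hypothetical helper (an assumption, not part of the package):
# coerce a single whole-number value to an R integer.
to_integer_arg <- function(x) {
  # Refuse values that would silently lose a fractional part
  stopifnot(is.numeric(x), length(x) == 1, x == round(x))
  as.integer(x)
}

is.integer(5L)  # TRUE: the L suffix creates an integer
is.integer(5)   # FALSE: a bare number is a double

size <- to_integer_arg(5)
is.integer(size)  # TRUE after coercion
```

A check like this matters because the value is handed to a Python object, where an integer parameter and a floating-point parameter are not necessarily interchangeable.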