Create an HDBSCAN clustering model

Instantiates an HDBSCAN clustering model using the hdbscan Python library.

Usage

bt_make_clusterer_hdbscan(
  ...,
  min_cluster_size = 10L,
  min_samples = 10L,
  metric = "euclidean",
  cluster_selection_method = c("eom", "leaf"),
  prediction_data = FALSE
)

Arguments

...: Additional arguments passed to hdbscan.HDBSCAN()
min_cluster_size: Minimum number of data points in each cluster. Supply as an integer by appending L to the number (e.g. 10L).
min_samples: Controls the number of outliers generated; lower values produce fewer outliers.
metric: Distance metric used to calculate clusters, e.g. "euclidean"
cluster_selection_method: The method used to select clusters, either "eom" or "leaf". Default is "eom".
prediction_data: Set to TRUE if you intend to use the model with any functions from hdbscan.prediction, e.g. when using bt_outliers_probabilities
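
The usage signature lists cluster_selection_method = c("eom", "leaf") while the default is described as "eom". That pattern matches the common base R idiom match.arg(), which returns the first listed choice when the argument is not supplied. The sketch below illustrates the idiom only; pick_method is a hypothetical helper, and it is an assumption that the package resolves the argument this way.

```r
# Hypothetical helper, not part of BertopicR: shows how match.arg()
# turns a vector of allowed choices into a single validated value.
pick_method <- function(cluster_selection_method = c("eom", "leaf")) {
  match.arg(cluster_selection_method)
}

default_method <- pick_method()        # argument omitted: first choice is used
leaf_method    <- pick_method("leaf")  # explicit, valid choice

# An unlisted value raises an error rather than passing through silently.
invalid <- tryCatch(pick_method("other"), error = function(e) "error")
```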

Value

An instance of the HDBSCAN clustering model (Python object).

Examples

# using the minkowski metric to calculate distance between documents
# (with minkowski, a value for p must be supplied as an additional argument)
clustering_model <- bt_make_clusterer_hdbscan(metric = "minkowski", p = 1.5)

# specify integer inputs as integers, and pass the additional gen_min_span_tree argument
clusterer <- bt_make_clusterer_hdbscan(min_cluster_size = 5L, gen_min_span_tree = TRUE)

# numeric inputs not specified as integers (converted to integers internally)
clusterer <- bt_make_clusterer_hdbscan(min_cluster_size = 5, cluster_selection_method = "leaf")
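
The difference between 5 and 5L in the examples above comes from base R itself: a bare numeric literal is stored as a double, while the L suffix creates an integer. The following sketch uses base R only and needs neither Python nor BertopicR. to_whole_integer is a hypothetical helper written for illustration; it is not the package's internal conversion code.

```r
x_double <- 5    # bare literal: stored as a double
x_int    <- 5L   # L suffix: stored as an integer

double_is_integer <- is.integer(x_double)
int_is_integer    <- is.integer(x_int)

# Hypothetical helper: accept a single whole number and return it as an
# integer, stopping with an error on fractional input such as 5.5.
to_whole_integer <- function(x) {
  stopifnot(is.numeric(x), length(x) == 1, x == round(x))
  as.integer(x)
}

converted <- to_whole_integer(x_double)
same_as_literal <- identical(converted, x_int)
```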
