playnano.analysis.modules.x_means_clustering module

Module for X-Means clustering as part of the playNano analysis pipeline.

This module implements a version of the X-Means clustering algorithm, an extension of K-Means that estimates the optimal number of clusters using the Bayesian Information Criterion (BIC).

Based on: Pelleg, D., & Moore, A. W. (2000). X-means: Extending K-means with Efficient Estimation of the Number of Clusters. Carnegie Mellon University. http://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf

class playnano.analysis.modules.x_means_clustering.XMeansClusteringModule

Bases: AnalysisModule

Cluster features using the X-Means algorithm over (x, y[, t]) coordinates.

This module clusters spatial (and optionally temporal) feature coordinates extracted from an AFM stack using an X-Means algorithm implemented in pure Python.

Parameters:

coord_key (str) – Key in previous_results[detection_module] under which the feature list is stored.
coord_columns (Sequence[str]) – Names of feature dictionary keys to use for coordinates (e.g. centroid_x, centroid_y).
use_time (bool) – Whether to append frame timestamps as the third coordinate.
min_k (int) – Initial number of clusters (minimum).
max_k (int) – Maximum number of clusters to allow.
normalise (bool) – Whether to min-max normalize coordinate space before clustering.
time_weight (float, optional) – Multiplier for time dimension (after normalization).

Returns:

dict – Dictionary with clustering results:
- clusters: list of {id, frames, point_indices, coords}
- cluster_centers: (K, D) ndarray in original units
- summary: {n_clusters: int, members_per_cluster: dict}

Version:

0.1.0

property name: str

Name of the analysis module.

Returns:

The string identifier for this module.

Return type:

str

requires = ['feature_detection', 'log_blob_detection']

run(stack: AFMImageStack, previous_results: dict[str, Any] | None = None, *, detection_module: str = 'feature_detection', coord_key: str = 'features_per_frame', coord_columns: Sequence[str] = ('centroid_x', 'centroid_y'), use_time: bool = True, min_k: int = 1, max_k: int = 10, normalise: bool = True, time_weight: float | None = None, replicates: int = 3, max_iter: int = 300, bic_threshold: float = 0.0) → dict[str, Any]

Perform X-Means clustering on features extracted from an AFM stack.

This method extracts (x, y[, t]) coordinates from detected features, optionally normalizes and time-weights them, and applies the X-Means algorithm to automatically select the number of clusters based on the BIC score.
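The extraction step can be pictured with the sketch below. It is only an illustration: the helper name collect_coords, the input layout (one list of feature dicts per frame), the frame_times list and the order of values in the “centroid” tuple are all assumptions, not the module's confirmed data structures.

```python
import numpy as np


def collect_coords(features_per_frame, frame_times,
                   coord_columns=("centroid_x", "centroid_y"), use_time=True):
    """Stack feature coordinates into an (N, D) array (hypothetical helper)."""
    rows = []
    for frame_idx, features in enumerate(features_per_frame):
        for feat in features:
            if all(col in feat for col in coord_columns):
                coords = [float(feat[col]) for col in coord_columns]
            elif "centroid" in feat:
                # Fallback to the "centroid" tuple; its value order is assumed
                # here to match coord_columns.
                coords = [float(v) for v in feat["centroid"]]
            else:
                raise KeyError(
                    f"feature in frame {frame_idx} lacks {tuple(coord_columns)}"
                )
            if use_time and len(coords) == 2:
                # Append the frame timestamp as the third coordinate.
                coords.append(float(frame_times[frame_idx]))
            rows.append(coords)
    return np.asarray(rows, dtype=float)
```

Each row of the returned array is one detected feature, so the row index can serve as a point index when results are reported per cluster.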

Parameters:

stack (AFMImageStack) – The input image stack providing frame timing and metadata context.
previous_results (dict[str, Any], optional) – Dictionary containing outputs from previous analysis steps. Must contain the selected detection_module and coord_key.
detection_module (str) – Key identifying which previous module's output to use. Default is “feature_detection”.
coord_key (str) – Key under the detection module that holds per-frame feature dicts. Default is “features_per_frame”.
coord_columns (Sequence[str]) – Keys to extract from each feature for clustering coordinates. If these keys are missing, the module falls back to the “centroid” tuple if available. Default is (“centroid_x”, “centroid_y”).
use_time (bool) – If True and coord_columns only gives 2D coordinates, appends the frame timestamp as a third dimension. Default is True.
min_k (int) – Initial number of clusters to start with. Default is 1.
max_k (int) – Maximum number of clusters allowed. Default is 10.
normalise (bool) – Whether to normalize the feature coordinate axes to the [0, 1] range before clustering. Default is True.
time_weight (float or None, optional) – Multiplicative factor applied to the time axis (after normalization). Used only if time is included as a third coordinate. Default is None.
replicates (int) – Number of times to run k-means internally to choose the best split. Default is 3.
max_iter (int) – Maximum number of iterations for each k-means call. Default is 300.
bic_threshold (float) – Minimum improvement in BIC required to split a cluster. Default is 0.0 (any improvement allows a split).
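To make the interplay of normalise and time_weight concrete, here is a minimal sketch of min-max scaling followed by time weighting. The function name and return values are hypothetical, and the module's own handling of constant axes may differ.

```python
import numpy as np


def normalise_and_weight(points, time_weight=None):
    """Min-max scale each axis to [0, 1], then scale the time axis (sketch)."""
    points = np.asarray(points, dtype=float)
    mins = points.min(axis=0)
    spans = points.max(axis=0) - mins
    spans[spans == 0] = 1.0  # assumption: leave constant axes at 0 instead of dividing by zero
    scaled = (points - mins) / spans
    if time_weight is not None and scaled.shape[1] == 3:
        # Only applied when a third (time) coordinate is present.
        scaled[:, 2] *= time_weight
    # mins and spans are returned so centers could later be mapped back.
    return scaled, mins, spans
```

With Euclidean distances, a time_weight below 1 makes features that are far apart in time count as closer, so clusters are driven more by spatial position; a value above 1 does the opposite.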

Returns:

A dictionary with the following keys:

“clusters” : list of dicts, each with:
- id : int
- frames : list of int
- point_indices : list of int
- coords : list of tuple (normalized x, y, [t])
“cluster_centers” : ndarray of shape (k, D)
Final cluster centers in original (denormalized) coordinates.
“summary” : dict
- “n_clusters” : int
- “members_per_cluster” : dict mapping cluster ID to point count.

Return type:

dict
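As an illustration of how the returned dictionary might be consumed, the sketch below uses a hand-built stand-in with the documented keys. All values are invented for the example, and cluster_sizes is a hypothetical helper, not part of playNano.

```python
import numpy as np

# Hand-built stand-in for the dictionary returned by run(); values are invented.
result = {
    "clusters": [
        {"id": 0, "frames": [0, 1], "point_indices": [0, 2],
         "coords": [(0.10, 0.20, 0.0), (0.15, 0.25, 0.5)]},
        {"id": 1, "frames": [1], "point_indices": [1],
         "coords": [(0.90, 0.80, 0.5)]},
    ],
    "cluster_centers": np.array([[12.0, 25.0, 0.4], [90.0, 81.0, 0.8]]),
    "summary": {"n_clusters": 2, "members_per_cluster": {0: 2, 1: 1}},
}


def cluster_sizes(result):
    """Count members per cluster from the per-cluster point indices."""
    return {c["id"]: len(c["point_indices"]) for c in result["clusters"]}


sizes = cluster_sizes(result)
for cluster in result["clusters"]:
    frames = sorted(set(cluster["frames"]))
    print(f"cluster {cluster['id']}: {sizes[cluster['id']]} points in frames {frames}")
```

Recomputing the sizes from point_indices and comparing them with summary["members_per_cluster"] is a cheap consistency check on a result.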

Raises:

RuntimeError – If the required detection module output is missing from previous_results.
KeyError – If the expected coordinate keys are missing from any feature dictionary.

version = '0.1.0'

playnano.analysis.modules.x_means_clustering.compute_bic(points: ndarray, center: ndarray) → float

Compute Bayesian Information Criterion for a cluster.

Parameters:

points (np.ndarray) – Points in the cluster.
center (np.ndarray) – Cluster center (shape (1, D)).

Returns:

BIC value.

Return type:

float
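The exact formula and sign convention used by compute_bic are not stated here. As a point of reference, the sketch below computes one common form of the BIC for a single cluster under a spherical Gaussian model, written as log-likelihood minus a complexity penalty so that higher is better. The function name bic_single_cluster, the variance estimate and the variance floor are assumptions for illustration.

```python
import numpy as np


def bic_single_cluster(points, center):
    """Spherical-Gaussian BIC for one cluster; higher is better (sketch)."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float).reshape(1, -1)
    n, d = points.shape
    if n <= 1:
        return float("-inf")  # variance cannot be estimated from one point
    sq_dist = float(np.sum((points - center) ** 2))
    # Per-dimension variance estimate, floored to avoid log(0).
    variance = max(sq_dist / (d * (n - 1)), 1e-12)
    log_lik = (-0.5 * n * d * np.log(2 * np.pi * variance)
               - sq_dist / (2 * variance))
    n_params = d + 1  # d center coordinates plus one shared variance
    return float(log_lik - 0.5 * n_params * np.log(n))
```

Under this convention a tight cluster scores higher than a diffuse one with the same number of points, which is what lets a split into two compact children beat a single spread-out parent.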

playnano.analysis.modules.x_means_clustering.core_xmeans(data: ndarray, init_k: int, max_k: int, min_cluster_size: int, distance: str, replicates: int, max_iter: int, bic_threshold: float) → tuple[ndarray, ndarray]

Core X-Means loop.

Parameters that also appear in run above (max_k, replicates, max_iter, bic_threshold) have the same meaning as there.
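The general shape of an X-Means loop can be sketched as follows: run k-means, try splitting each cluster in two, and keep a split only when the BIC improves by more than bic_threshold. This is a simplified illustration and not the code behind core_xmeans. It assumes Euclidean distance, omits replicates, seeds the two children deterministically at a far-apart pair of points, and uses hypothetical helper names throughout.

```python
import numpy as np


def _assign(data, centers):
    """Label each point with the index of its nearest center."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def _kmeans(data, centers, max_iter=100):
    """Plain Lloyd iterations starting from the given centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        labels = _assign(data, centers)
        updated = np.array([
            data[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers, _assign(data, centers)


def _bic(data, centers, labels):
    """Spherical-Gaussian BIC of a clustering; higher is better."""
    n, d = data.shape
    k = len(centers)
    sq = float(np.sum((data - centers[labels]) ** 2))
    var = max(sq / (d * max(n - k, 1)), 1e-12)
    counts = np.bincount(labels, minlength=k)
    counts = counts[counts > 0]
    log_lik = (np.sum(counts * np.log(counts / n))
               - 0.5 * n * d * np.log(2 * np.pi * var)
               - sq / (2 * var))
    n_params = (k - 1) + k * d + 1
    return log_lik - 0.5 * n_params * np.log(n)


def xmeans_sketch(data, init_k=1, max_k=10, min_cluster_size=4, bic_threshold=0.0):
    """Grow the number of clusters while BIC-approved splits exist."""
    data = np.asarray(data, dtype=float)
    centers, labels = _kmeans(data, data[:init_k])
    while len(centers) < max_k:
        budget = max_k - len(centers)  # how many more clusters may be added
        next_centers = []
        for j in range(len(centers)):
            members = data[labels == j]
            if budget == 0 or len(members) < min_cluster_size:
                next_centers.append(centers[j])
                continue
            # Seed two children at a far-apart pair of member points.
            a = members[np.linalg.norm(members - centers[j], axis=1).argmax()]
            b = members[np.linalg.norm(members - a, axis=1).argmax()]
            kids, kid_labels = _kmeans(members, np.stack([a, b]))
            parent_bic = _bic(members, centers[j:j + 1],
                              np.zeros(len(members), dtype=int))
            if _bic(members, kids, kid_labels) - parent_bic > bic_threshold:
                next_centers.extend(kids)
                budget -= 1
            else:
                next_centers.append(centers[j])
        if len(next_centers) == len(centers):
            break  # no split was accepted
        centers, labels = _kmeans(data, np.array(next_centers))
    return centers, labels
```

Raising bic_threshold makes splits harder to accept and so tends to give fewer clusters, while max_k caps the count regardless of what the BIC suggests.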

playnano.analysis.modules.x_means_clustering.initialize_centers(points: ndarray, k: int) → ndarray

Initialize k centers using a k-means++-like heuristic.
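For readers unfamiliar with k-means++ seeding, the sketch below shows the standard idea: the first center is picked uniformly at random, and each later center is drawn with probability proportional to its squared distance from the nearest center chosen so far. How closely initialize_centers follows this is not documented here; the function name kmeanspp_init and the seed argument are assumptions.

```python
import numpy as np


def kmeanspp_init(points, k, seed=None):
    """Pick k initial centers with k-means++-style sampling (sketch)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    n = len(points)
    centers = [points[rng.integers(n)]]  # first center: uniform at random
    for _ in range(1, k):
        chosen = np.array(centers)
        # Squared distance from every point to its nearest chosen center.
        d2 = ((points[:, None, :] - chosen[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        if total == 0:
            idx = rng.integers(n)  # every point coincides with a center
        else:
            idx = rng.choice(n, p=d2 / total)
        centers.append(points[idx])
    return np.array(centers)
```

Because points that coincide with an existing center have zero sampling probability, the seeds tend to spread across the data rather than pile up in one dense region.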