playnano.analysis.modules.x_means_clustering module
Module for X-Means clustering as part of the playNano analysis pipeline.
This module implements a version of the X-Means clustering algorithm, an extension of K-Means that estimates the optimal number of clusters using the Bayesian Information Criterion (BIC).
Based on: Pelleg, D., & Moore, A. W. (2000). X-means: Extending K-means with Efficient Estimation of the Number of Clusters. Carnegie Mellon University. http://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf
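The BIC-driven choice of cluster count described above can be illustrated with a small, self-contained sketch of a single split decision. This is not the module's implementation: the helper names (model_bic, two_means, should_split), the deterministic seeding, and the hard-assignment spherical Gaussian likelihood are all assumptions chosen for illustration, and the sign convention (higher BIC is better) may differ from the one the module uses.

```python
import numpy as np

def model_bic(points, labels, k):
    """BIC (higher is better) of a hard-assigned spherical Gaussian mixture."""
    n, d = points.shape
    sq, log_mix = 0.0, 0.0
    for j in range(k):
        members = points[labels == j]
        if len(members) == 0:
            return float("-inf")  # an empty child never justifies a split
        sq += np.sum((members - members.mean(axis=0)) ** 2)
        log_mix += len(members) * np.log(len(members) / n)
    var = max(sq / (n * d), 1e-12)  # pooled per-axis variance (MLE)
    log_lik = log_mix - 0.5 * n * d * (np.log(2 * np.pi * var) + 1.0)
    n_params = (k - 1) + k * d + 1  # mixing weights, centers, shared variance
    return float(log_lik - 0.5 * n_params * np.log(n))

def two_means(points, max_iter=300):
    """Deterministic 2-means: seed with two distant points, then Lloyd updates."""
    first = points[np.argmax(np.linalg.norm(points - points.mean(axis=0), axis=1))]
    second = points[np.argmax(np.linalg.norm(points - first, axis=1))]
    centers = np.stack([first, second])
    labels = np.zeros(len(points), dtype=int)
    for _ in range(max_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def should_split(points, bic_threshold=0.0):
    """Split only if the two-child model beats the parent by more than the threshold."""
    pts = np.asarray(points, dtype=float)
    parent = model_bic(pts, np.zeros(len(pts), dtype=int), k=1)
    children = model_bic(pts, two_means(pts), k=2)
    return bool(children - parent > bic_threshold)

# Synthetic data: one isotropic blob, and two well-separated blobs.
rng = np.random.default_rng(0)
one_blob = rng.normal(size=(200, 2))
two_blobs = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(10, 0.3, (100, 2))])
```

Calling should_split(two_blobs) and should_split(one_blob) shows the intended behaviour: the separated blobs gain enough likelihood to outweigh the extra parameters, while a single blob does not. Repeating this test on each cluster until no split is accepted or max_k is reached gives the overall X-Means loop.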
- class playnano.analysis.modules.x_means_clustering.XMeansClusteringModule
Bases: AnalysisModule
Cluster features using the X-Means algorithm over (x, y[, t]) coordinates.
This module clusters spatial (and optionally temporal) feature coordinates extracted from an AFM stack using an X-Means algorithm implemented in pure Python.
- Parameters:
coord_key (str) – Key in previous_results[detection_module] that holds the feature list.
coord_columns (Sequence[str]) – Names of feature dictionary keys to use for coordinates (e.g. centroid_x, centroid_y).
use_time (bool) – Whether to append frame timestamps as the third coordinate.
min_k (int) – Initial number of clusters (minimum).
max_k (int) – Maximum number of clusters to allow.
normalise (bool) – Whether to min-max normalize coordinate space before clustering.
time_weight (float, optional) – Multiplier for time dimension (after normalization).
- Returns:
dict – Dictionary with clustering results:
- clusters: list of {id, frames, point_indices, coords}
- cluster_centers: (K, D) ndarray in original units
- summary: {n_clusters: int, members_per_cluster: dict}
Version
-------
0.1.0
- property name: str
Name of the analysis module.
- Returns:
The string identifier for this module.
- Return type:
str
- requires = ['feature_detection', 'log_blob_detection']
- run(stack: AFMImageStack, previous_results: dict[str, Any] | None = None, *, detection_module: str = 'feature_detection', coord_key: str = 'features_per_frame', coord_columns: Sequence[str] = ('centroid_x', 'centroid_y'), use_time: bool = True, min_k: int = 1, max_k: int = 10, normalise: bool = True, time_weight: float | None = None, replicates: int = 3, max_iter: int = 300, bic_threshold: float = 0.0) → dict[str, Any]
Perform X-Means clustering on features extracted from an AFM stack.
This method extracts (x, y[, t]) coordinates from detected features, optionally normalizes and time-weights them, and applies the X-Means algorithm to automatically select the number of clusters based on the BIC score.
- Parameters:
stack (AFMImageStack) – The input image stack providing frame timing and metadata context.
previous_results (dict[str, Any], optional) – Dictionary containing outputs from previous analysis steps. Must contain the selected detection_module and coord_key.
detection_module (str) – Key identifying which previous module's output to use. Default is "feature_detection".
coord_key (str) – Key under the detection module that holds per-frame feature dicts. Default is "features_per_frame".
coord_columns (Sequence[str]) – Keys to extract from each feature for clustering coordinates. If these keys are missing, falls back to the "centroid" tuple if available. Default is ("centroid_x", "centroid_y").
use_time (bool) – If True and coord_columns only gives 2D coordinates, appends the frame timestamp as a third dimension. Default is True.
min_k (int) – Initial number of clusters to start with. Default is 1.
max_k (int) – Maximum number of clusters allowed. Default is 10.
normalise (bool) – Whether to normalize the feature coordinate axes to the [0, 1] range before clustering. Default is True.
time_weight (float or None, optional) – Multiplicative factor applied to the time axis (after normalization). Used only if time is included as a third coordinate.
replicates (int) – Number of times to run k-means internally to choose the best split. Default is 3.
max_iter (int) – Maximum number of iterations for each k-means call. Default is 300.
bic_threshold (float) – Minimum improvement in BIC required to split a cluster. Default is 0.0 (any improvement allows a split).
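The coordinate-preparation parameters above (coord_columns, use_time, normalise, time_weight) can be pictured with the following minimal sketch. The helper build_coords, the frame_times list, and the assumed layout of the per-frame feature data (a list of frames, each a list of feature dicts) are hypothetical and stand in for whatever the module does internally; the "centroid" fallback is not covered, and at least one feature is assumed to be present.

```python
import numpy as np

def build_coords(features_per_frame, frame_times,
                 coord_columns=("centroid_x", "centroid_y"),
                 use_time=True, normalise=True, time_weight=None):
    """Collect (x, y[, t]) rows, min-max scale each axis, then weight the time axis."""
    rows = []
    for frame_idx, features in enumerate(features_per_frame):
        for feat in features:
            # Raises KeyError if a requested coordinate column is absent.
            row = [float(feat[c]) for c in coord_columns]
            if use_time and len(coord_columns) == 2:
                row.append(float(frame_times[frame_idx]))
            rows.append(row)
    pts = np.array(rows, dtype=float)
    if normalise:
        lo = pts.min(axis=0)
        span = pts.max(axis=0) - lo
        span[span == 0] = 1.0  # constant axes map to 0 instead of dividing by zero
        pts = (pts - lo) / span
    if time_weight is not None and pts.shape[1] == 3:
        pts[:, 2] *= time_weight  # applied after normalisation
    return pts

# Invented example data: one feature in each of three frames.
features_per_frame = [
    [{"centroid_x": 10.0, "centroid_y": 20.0}],
    [{"centroid_x": 30.0, "centroid_y": 40.0}],
    [{"centroid_x": 50.0, "centroid_y": 60.0}],
]
frame_times = [0.0, 1.0, 2.0]
coords = build_coords(features_per_frame, frame_times, time_weight=0.5)
```

With time_weight=0.5 the time axis spans [0, 0.5] while x and y span [0, 1], so temporal separation counts half as much as spatial separation in the Euclidean distances that k-means uses.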
- Returns:
A dictionary with the following keys:
- "clusters" : list of dicts, each with:
  id : int
  frames : list of int
  point_indices : list of int
  coords : list of tuple (normalized x, y, [t])
- "cluster_centers" : ndarray of shape (k, D)
  Final cluster centers in original (denormalized) coordinates.
- "summary" : dict
  "n_clusters" : int
  "members_per_cluster" : dict mapping cluster ID to point count.
- Return type:
dict[str, Any]
- Raises:
RuntimeError – If the required detection module output is missing from previous_results.
KeyError – If the expected coordinate keys are missing from any feature dictionary.
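As a rough guide to consuming the documented return value, the sketch below walks a hand-built dictionary shaped like the one described above. The numbers are invented, the helper summarise is hypothetical, and nothing here calls playNano itself.

```python
import numpy as np

# Hand-built stand-in for the documented return value; all values are invented.
result = {
    "clusters": [
        {"id": 0, "frames": [0, 1], "point_indices": [0, 2],
         "coords": [(0.10, 0.20, 0.0), (0.15, 0.25, 0.5)]},
        {"id": 1, "frames": [0], "point_indices": [1],
         "coords": [(0.90, 0.80, 0.0)]},
    ],
    "cluster_centers": np.array([[12.0, 25.0, 0.5], [90.0, 80.0, 0.0]]),
    "summary": {"n_clusters": 2, "members_per_cluster": {0: 2, 1: 1}},
}

def summarise(result):
    """Return one line per cluster: size and the range of frames it covers."""
    lines = []
    for cluster in result["clusters"]:
        cid = cluster["id"]
        n_points = result["summary"]["members_per_cluster"][cid]
        first, last = min(cluster["frames"]), max(cluster["frames"])
        lines.append(f"cluster {cid}: {n_points} points, frames {first}-{last}")
    return lines

report = summarise(result)
```

Note that, per the description above, coords are in normalized units while cluster_centers are in original units, so the two should not be compared directly.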
- version = '0.1.0'
- playnano.analysis.modules.x_means_clustering.compute_bic(points: ndarray, center: ndarray) → float
Compute Bayesian Information Criterion for a cluster.
- Parameters:
points (np.ndarray) – Points in the cluster.
center (np.ndarray) – Cluster center (shape (1, D)).
- Returns:
BIC value.
- Return type:
float
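For orientation, here is one textbook way to score a single cluster from the same two inputs. It assumes a spherical Gaussian with maximum-likelihood variance and a convention where higher values are better; the function name bic_single_cluster is hypothetical, and the formula actually used by compute_bic may differ in its variance estimate, parameter count, or sign.

```python
import numpy as np

def bic_single_cluster(points, center):
    """BIC (higher is better) of one spherical Gaussian fitted around `center`."""
    pts = np.asarray(points, dtype=float)
    ctr = np.asarray(center, dtype=float).reshape(1, -1)  # shape (1, D) broadcasts over rows
    n, d = pts.shape
    sq = np.sum((pts - ctr) ** 2)        # total squared distance to the center
    var = max(sq / (n * d), 1e-12)       # per-axis variance, floored to avoid log(0)
    log_lik = -0.5 * n * d * (np.log(2 * np.pi * var) + 1.0)
    n_params = d + 1                     # D center coordinates plus one variance
    return float(log_lik - 0.5 * n_params * np.log(n))

# Synthetic check: the same point pattern, shrunk tenfold, should score higher.
rng = np.random.default_rng(0)
wide = rng.normal(size=(50, 2))
tight = wide * 0.1
bic_wide = bic_single_cluster(wide, wide.mean(axis=0, keepdims=True))
bic_tight = bic_single_cluster(tight, tight.mean(axis=0, keepdims=True))
```

Under this convention a compact cluster scores higher than a diffuse one with the same number of points, which is the property a split test relies on.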