playnano.analysis.modules.x_means_clustering module

Module for X-Means clustering as part of the playNano analysis pipeline.

This module implements a version of the X-Means clustering algorithm, an extension of K-Means that estimates the optimal number of clusters using the Bayesian Information Criterion (BIC).

Based on: Pelleg, D., & Moore, A. W. (2000). X-means: Extending K-means with Efficient Estimation of the Number of Clusters. Carnegie Mellon University. http://www.cs.cmu.edu/~dpelleg/download/xmeans.pdf

class playnano.analysis.modules.x_means_clustering.XMeansClusteringModule

Bases: AnalysisModule

Cluster features using the X-Means algorithm over (x, y[, t]) coordinates.

This module clusters spatial (and optionally temporal) feature coordinates extracted from an AFM stack using an X-Means algorithm implemented in pure Python.

Parameters:
  • coord_key (str) – Key in previous_results[detection_module] to find feature list.

  • coord_columns (Sequence[str]) – Names of feature dictionary keys to use for coordinates (e.g. centroid_x, centroid_y).

  • use_time (bool) – Whether to append frame timestamps as the third coordinate.

  • min_k (int) – Initial number of clusters (minimum).

  • max_k (int) – Maximum number of clusters to allow.

  • normalise (bool) – Whether to min-max normalize coordinate space before clustering.

  • time_weight (float, optional) – Multiplier for time dimension (after normalization).

Returns:

  • dict – Dictionary with clustering results:
    • clusters: list of {id, frames, point_indices, coords}
    • cluster_centers: (K, D) ndarray in original units
    • summary: {n_clusters: int, members_per_cluster: dict}

Version:

0.1.0
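The normalise and time_weight options can be pictured as below. This is a minimal sketch, assuming per-axis min-max scaling to [0, 1], that time is the third column, and that time_weight is applied after scaling; the helpers normalise_coords and denormalise_centers are hypothetical and not part of the playNano API. The second helper shows how centers could be mapped back to original units.

```python
from typing import Optional, Tuple

import numpy as np


def normalise_coords(
    coords: np.ndarray, time_weight: Optional[float] = None, time_col: int = 2
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Min-max scale each axis to [0, 1], then optionally weight the time axis."""
    coords = np.asarray(coords, dtype=float)
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # constant axes map to 0 instead of dividing by zero
    scaled = (coords - mins) / spans
    if time_weight is not None and scaled.shape[1] > time_col:
        scaled[:, time_col] *= time_weight  # weighting happens after normalisation
    return scaled, mins, spans


def denormalise_centers(
    centers: np.ndarray,
    mins: np.ndarray,
    spans: np.ndarray,
    time_weight: Optional[float] = None,
    time_col: int = 2,
) -> np.ndarray:
    """Map centers back to original units (assumes a non-zero time_weight)."""
    centers = np.array(centers, dtype=float)  # copy so the caller's array is untouched
    if time_weight is not None and centers.shape[1] > time_col:
        centers[:, time_col] /= time_weight  # undo the time weighting first
    return centers * spans + mins  # then undo the min-max scaling
```

Weighting after normalisation means time_weight is expressed relative to the [0, 1] spatial range, so a value above 1 makes temporal separation count for more than spatial separation.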

property name: str

Name of the analysis module.

Returns:

The string identifier for this module.

Return type:

str

requires = ['feature_detection', 'log_blob_detection']
run(stack: AFMImageStack, previous_results: dict[str, Any] | None = None, *, detection_module: str = 'feature_detection', coord_key: str = 'features_per_frame', coord_columns: Sequence[str] = ('centroid_x', 'centroid_y'), use_time: bool = True, min_k: int = 1, max_k: int = 10, normalise: bool = True, time_weight: float | None = None, replicates: int = 3, max_iter: int = 300, bic_threshold: float = 0.0) → dict[str, Any]

Perform X-Means clustering on features extracted from an AFM stack.

This method extracts (x, y[, t]) coordinates from detected features, optionally normalizes and time-weights them, and applies the X-Means algorithm to automatically select the number of clusters based on the BIC score.
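The extraction step can be sketched as follows. This is a minimal sketch, assuming features_per_frame is a list holding one list of feature dicts per frame and that frame timestamps are available as a plain sequence; extract_coords is a hypothetical helper, and how the real module reads timestamps from AFMImageStack is not shown. The fallback assumes the "centroid" tuple is ordered like coord_columns.

```python
from typing import Any, Dict, List, Sequence, Tuple

import numpy as np


def extract_coords(
    features_per_frame: Sequence[Sequence[Dict[str, Any]]],
    timestamps: Sequence[float],
    coord_columns: Sequence[str] = ("centroid_x", "centroid_y"),
    use_time: bool = True,
) -> Tuple[np.ndarray, List[int]]:
    """Collect one coordinate row per feature plus the frame index it came from."""
    rows: List[List[float]] = []
    frames: List[int] = []
    for frame_idx, features in enumerate(features_per_frame):
        for feat in features:
            if all(col in feat for col in coord_columns):
                point = [float(feat[col]) for col in coord_columns]
            elif "centroid" in feat:
                # Fallback: assumes the tuple is ordered like coord_columns
                point = [float(v) for v in feat["centroid"]]
            else:
                raise KeyError(
                    f"frame {frame_idx}: feature lacks {list(coord_columns)} and 'centroid'"
                )
            if use_time and len(point) == 2:
                point.append(float(timestamps[frame_idx]))  # time becomes the third axis
            rows.append(point)
            frames.append(frame_idx)
    return np.array(rows, dtype=float), frames
```

Keeping the frame index alongside each row is what later allows each cluster to report which frames its members came from.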

Parameters:
  • stack (AFMImageStack) – The input image stack providing frame timing and metadata context.

  • previous_results (dict[str, Any], optional) – Dictionary containing outputs from previous analysis steps. Must contain an entry for detection_module whose output includes coord_key.

  • detection_module (str) – Key identifying which previous module's output to use. Default is “feature_detection”.

  • coord_key (str) – Key under the detection module that holds per-frame feature dicts. Default is “features_per_frame”.

  • coord_columns (Sequence[str]) – Keys to extract from each feature for clustering coordinates. If these keys are missing, the module falls back to the “centroid” tuple when available. Default is (“centroid_x”, “centroid_y”).

  • use_time (bool) – If True and coord_columns only gives 2D coordinates, appends the frame timestamp as a third dimension. Default is True.

  • min_k (int) – Initial number of clusters to start with. Default is 1.

  • max_k (int) – Maximum number of clusters allowed. Default is 10.

  • normalise (bool) – Whether to normalize the feature coordinate axes to the [0, 1] range before clustering. Default is True.

  • time_weight (float or None, optional) – Multiplicative factor applied to the time axis (after normalization). Used only if time is included as a third coordinate.

  • replicates (int) – Number of times to run k-means internally to choose the best split. Default is 3.

  • max_iter (int) – Maximum number of iterations for each k-means call. Default is 300.

  • bic_threshold (float) – Minimum improvement in BIC required to split a cluster. Default is 0.0 (any improvement allows a split).
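The bic_threshold rule can be illustrated with a tiny helper. This sketch assumes the higher-is-better BIC convention (log-likelihood minus a complexity penalty) and a strict comparison, so a threshold of 0.0 accepts any positive improvement; should_split is a hypothetical name, not a playNano function.

```python
def should_split(parent_bic: float, children_bic: float, bic_threshold: float = 0.0) -> bool:
    """Accept a split only if the two-child model beats the parent by more than bic_threshold.

    Assumes a higher-is-better BIC (log-likelihood minus a complexity penalty).
    """
    improvement = children_bic - parent_bic
    return improvement > bic_threshold


print(should_split(-120.0, -110.0))        # True: any improvement passes at 0.0
print(should_split(-120.0, -110.0, 15.0))  # False: improvement of 10 is below 15
```

Raising bic_threshold above zero makes the algorithm more conservative, yielding fewer clusters when the evidence for a split is marginal.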

Returns:

A dictionary with the following keys:

  • “clusters” : list of dicts, each with:
    • id : int

    • frames : list of int

    • point_indices : list of int

    • coords : list of tuple (normalized x, y[, t])

  • “cluster_centers” : ndarray of shape (k, D)

    Final cluster centers in original (denormalized) coordinates.

  • “summary” : dict
    • “n_clusters” : int

    • “members_per_cluster” : dict mapping cluster ID to point count.

Return type:

dict
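The documented result layout can be assembled from per-point labels roughly as below. This is a minimal sketch, assuming labels are integer cluster IDs aligned row-for-row with the extracted points and frames; build_result is a hypothetical helper and the real module may order or type fields differently.

```python
from typing import Any, Dict, List, Sequence

import numpy as np


def build_result(
    coords_norm: np.ndarray,
    frames: Sequence[int],
    labels: Sequence[int],
    centers_orig: np.ndarray,
) -> Dict[str, Any]:
    """Group points by cluster label into the documented result layout."""
    clusters: List[Dict[str, Any]] = []
    members: Dict[int, int] = {}
    for cid in sorted({int(lab) for lab in labels}):
        # Indices of all points assigned to this cluster
        idx = [i for i, lab in enumerate(labels) if int(lab) == cid]
        clusters.append(
            {
                "id": cid,
                "frames": [int(frames[i]) for i in idx],
                "point_indices": idx,
                "coords": [tuple(float(v) for v in coords_norm[i]) for i in idx],
            }
        )
        members[cid] = len(idx)
    return {
        "clusters": clusters,
        "cluster_centers": np.asarray(centers_orig, dtype=float),
        "summary": {"n_clusters": len(clusters), "members_per_cluster": members},
    }
```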

Raises:
  • RuntimeError – If the required detection module output is missing from previous_results.

  • KeyError – If the expected coordinate keys are missing from any feature dictionary.

version = '0.1.0'
playnano.analysis.modules.x_means_clustering.compute_bic(points: ndarray, center: ndarray) → float

Compute the Bayesian Information Criterion (BIC) for a single cluster.

Parameters:
  • points (np.ndarray) – Points in the cluster.

  • center (np.ndarray) – Cluster center (shape (1, D)).

Returns:

BIC value.

Return type:

float
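One way to score a single cluster is sketched below. This is a minimal sketch, assuming the cluster is modelled as a spherical Gaussian with per-dimension variance estimated as SSE / (M (R − 1)) and a higher-is-better BIC (log-likelihood minus (p / 2) log R, with p = M + 1 free parameters); bic_single_cluster is a hypothetical name, and the playNano implementation may use a different variance estimate or constant terms.

```python
import numpy as np


def bic_single_cluster(points: np.ndarray, center: np.ndarray) -> float:
    """Higher-is-better BIC of one spherical Gaussian fitted to `points`."""
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float).reshape(1, -1)  # accept (D,) or (1, D)
    R, M = points.shape
    if R <= 1:
        return float("-inf")  # variance is undefined for a single point
    sse = float(((points - center) ** 2).sum())
    var = max(sse / (M * (R - 1)), 1e-12)  # guard against identical points
    # Log-likelihood: the quadratic term sums to M * (R - 1) / 2 with this variance
    log_lik = -R * M / 2 * np.log(2 * np.pi * var) - M * (R - 1) / 2
    n_params = M + 1  # center coordinates plus one shared variance
    return float(log_lik - n_params / 2 * np.log(R))
```

Tighter clusters receive a higher score, while the (p / 2) log R term penalises extra parameters, which is what lets X-Means reject splits that fit noise.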

playnano.analysis.modules.x_means_clustering.core_xmeans(data: ndarray, init_k: int, max_k: int, min_cluster_size: int, distance: str, replicates: int, max_iter: int, bic_threshold: float) → tuple[ndarray, ndarray]

Core X-Means loop.

Parameters correspond to those of run above, with init_k playing the role of min_k. min_cluster_size and distance are not exposed through run.
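A compact X-Means loop in the style of Pelleg and Moore is sketched below: it alternates k-means over all points ("Improve-Params") with per-cluster two-way split tests ("Improve-Structure"), keeping a split only when BIC improves by more than bic_threshold. This is a sketch under stated assumptions, not the playNano implementation: kmeans, bic and xmeans are hypothetical names, Euclidean distance is assumed (the distance and replicates options are omitted), min_cluster_size is assumed to mean the smallest child a split may produce, and the returned (labels, centers) order is an assumption.

```python
import numpy as np


def kmeans(X, centers, max_iter=300):
    """Plain Lloyd iterations from the given starting centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(max_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    return labels, centers


def bic(X, labels, centers):
    """Higher-is-better BIC for K spherical Gaussians sharing one variance."""
    R, M = X.shape
    K = len(centers)
    if R <= K:
        return -np.inf
    sse = ((X - centers[labels]) ** 2).sum()
    var = max(sse / (M * (R - K)), 1e-12)
    sizes = np.bincount(labels, minlength=K)
    sizes = sizes[sizes > 0]  # empty clusters contribute nothing to the mixing term
    log_lik = ((sizes * np.log(sizes / R)).sum()
               - R * M / 2 * np.log(2 * np.pi * var) - M * (R - K) / 2)
    n_params = (K - 1) + M * K + 1  # mixing weights + center coordinates + variance
    return log_lik - n_params / 2 * np.log(R)


def xmeans(X, init_k=1, max_k=10, min_cluster_size=2, bic_threshold=0.0,
           max_iter=300, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels, centers = kmeans(X, X[rng.choice(len(X), init_k, replace=False)], max_iter)
    while True:
        proposed, n_splits = [], 0
        for j in range(len(centers)):
            pts = X[labels == j]
            if len(pts) >= 2 * min_cluster_size and len(centers) + n_splits < max_k:
                # Improve-Structure: try splitting this cluster in two
                seeds = pts[rng.choice(len(pts), 2, replace=False)]
                child_labels, child_centers = kmeans(pts, seeds, max_iter)
                big_enough = np.bincount(child_labels, minlength=2).min() >= min_cluster_size
                parent_bic = bic(pts, np.zeros(len(pts), dtype=int), centers[j][None, :])
                if big_enough and bic(pts, child_labels, child_centers) - parent_bic > bic_threshold:
                    proposed.extend(child_centers)
                    n_splits += 1
                    continue
            proposed.append(centers[j])
        if n_splits == 0:
            return labels, centers
        # Improve-Params: re-run k-means on all points with the enlarged center set
        labels, centers = kmeans(X, np.array(proposed), max_iter)
```

Every accepted split adds one center and splits stop at max_k, so the loop always terminates.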

playnano.analysis.modules.x_means_clustering.initialize_centers(points: ndarray, k: int) → ndarray

Initialize k centers using a k-means++-like heuristic.
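Standard k-means++ seeding, which the heuristic above is described as resembling, picks the first center uniformly at random and each later center with probability proportional to its squared distance from the nearest center already chosen. The sketch below implements that standard scheme; kmeans_pp_init is a hypothetical name, and the playNano heuristic may differ in its details.

```python
from typing import Optional

import numpy as np


def kmeans_pp_init(points: np.ndarray, k: int, seed: Optional[int] = None) -> np.ndarray:
    """Choose k starting centers from `points` using k-means++ seeding."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centers = [points[rng.integers(len(points))]]  # first center: uniform pick
    for _ in range(1, k):
        # Squared distance from each point to its nearest chosen center
        d2 = ((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        total = d2.sum()
        if total == 0:
            idx = rng.integers(len(points))  # all points coincide with centers
        else:
            idx = rng.choice(len(points), p=d2 / total)  # favour far-away points
        centers.append(points[idx])
    return np.array(centers)
```

Spreading the seeds this way makes it unlikely that two initial centers land in the same dense group, which reduces poor local optima in the k-means steps that follow.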