Analysis¶
The analysis system in playNano provides a provenance-aware pipeline for running a variety of analysis modules, including built-in modules for feature detection and particle tracking, on AFM image stacks. Analysis steps produce structured results (counts, tables, tracks, summaries) that are stored on the AFM stack and recorded with full provenance for audit and reproducibility.
This page covers:

- quick CLI and Python examples
- YAML schema for pipeline files
- how results are saved and inspected
- advanced implementation details (module loading, exact provenance fields)
- extending the system with custom modules
Quick start¶
Run an inline analysis pipeline from the CLI:
playnano analyze data/processed_sample.h5 --analysis-steps "feature_detection:mask_fn=mask_threshold,threshold=1;particle_tracking:max_distance=3" --output-folder ./results --output-name tracked_particles
Or load the pipeline from a YAML/JSON file:
analysis:
  - name: feature_detection
    mask_fn: mask_threshold
    threshold: 5
  - name: particle_tracking
    max_distance: 3
playnano analyze data/processed_sample.h5 --analysis-file my_pipeline.yaml
Overview & behaviour¶
Analysis pipelines are conceptually similar to processing pipelines but operate on derived results rather than on image arrays.
- Analysis adds results into `stack.analysis` (it does not replace `stack.data`); see the sketch after this list.
- Each analysis step is executed by an `AnalysisModule` (built-in or plugin).
- The pipeline records per-step provenance (parameters, timestamps, version and stored result keys) under `stack.provenance['analysis']`.
- If `log_to` is provided to programmatic runs or the CLI, a sanitised JSON summary is written (arrays and complex objects are summarised for JSON friendliness).
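As a minimal sketch of the behaviour described above, reusing the quick-start sample file and the built-in `count_nonzero` module (assumed here to need no parameters):

from playNano.afm_stack import AFMImageStack
from playNano.analysis.pipeline import AnalysisPipeline

stack = AFMImageStack.load_afm_stack("data/processed_sample.h5", channel="height_trace")

ap = AnalysisPipeline()
ap.add("count_nonzero")  # built-in module; assumed to need no parameters

shape_before = stack.data.shape
ap.run(stack, log_to="run_summary.json")  # optional sanitised JSON log

assert stack.data.shape == shape_before       # image data is untouched
print(list(stack.analysis.keys()))            # e.g. ['step_1_count_nonzero']
print(stack.provenance["analysis"]["steps"])  # per-step provenance records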
Available analysis modules¶
See the generated list of installed modules:
- `count_nonzero` - Analysis module for counting non-zero data points in an array.
- `dbscan_clustering` - DBSCAN clustering on features over the entire stack in 3D (x, y, time).
- `feature_detection` - Module for threshold-based feature detection.
- `k_means_clustering` - K-Means clustering on features over the entire stack in 3D (x, y, time).
- `log_blob_detection` - Module for LoG blob detection.
- `particle_tracking` - Module for linking particle features across frames to build trajectories.
- `x_means_clustering` - Module for X-Means clustering as part of the playNano analysis pipeline.
CLI usage¶
The analysis pipeline can be run from the CLI or programmatically, but not currently from the GUI.
This is the general form of the analyze subcommand for CLI use:
playnano analyze <input_file> (--analysis-steps "step1:arg=val;step2:arg=val" | --analysis-file pipeline.yaml) [--channel CHANNEL] [--output-folder OUTPUT_DIR] [--output-name BASE_NAME]
Common options:

- `--analysis-steps` - semicolon-delimited inline pipeline string.
- `--analysis-file` - YAML or JSON file describing the pipeline (mutually exclusive with `--analysis-steps`).
- `--channel` - channel to read (default: `height_trace`).
- `--output-folder` / `--output-name` - control where exported results are written.
Since some analysis modules have several parameters, it is often easier to generate a YAML file using the wizard.
YAML schema¶
Top-level key must be `analysis`:
analysis:
  - name: feature_detection
    mask_fn: mask_threshold
    threshold: 4.5
  - name: particle_tracking
    max_distance: 2.5
    min_length: 5
Each entry:

- `name` (str, required) - the analysis module name.
- all other keys - module-specific parameters, passed through to the module as keyword arguments.
Validation notes¶
- The `analysis` key is required.
- Each step must include `name`.
- Unknown module names raise an error at runtime.
- Parameters are forwarded as keyword arguments and must match the module signature (a quick pre-flight check is sketched below).
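Because these errors otherwise surface only at runtime, it can be worth checking a pipeline file up front. A minimal sketch, assuming `BUILTIN_ANALYSIS_MODULES` (the registry referenced under Module loading below) is a name-keyed mapping:

import yaml
from playNano.analysis import BUILTIN_ANALYSIS_MODULES

with open("my_pipeline.yaml") as fh:
    cfg = yaml.safe_load(fh)

steps = cfg.get("analysis")
if not isinstance(steps, list):
    raise ValueError("Top-level 'analysis' key (a list of steps) is required")

for i, step in enumerate(steps, start=1):
    name = step.get("name")
    if not name:
        raise ValueError(f"Step {i} is missing the required 'name' key")
    if name not in BUILTIN_ANALYSIS_MODULES:
        # not built-in; may still resolve via a plugin entry point at runtime
        print(f"Step {i}: '{name}' is not a built-in module")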
Programmatic usage¶
Construct and run a pipeline from Python:
from playNano.afm_stack import AFMImageStack
from playNano.analysis.pipeline import AnalysisPipeline

stack = AFMImageStack.load_afm_stack("data/processed_sample.h5", channel="height_trace")

ap = AnalysisPipeline()
ap.add("feature_detection", mask_fn="mask_threshold", threshold=5)
ap.add("particle_tracking", max_distance=3.0)

# run and optionally write a sanitised JSON log
record = ap.run(stack, log_to="analysis.json")

# programmatic access
print(list(stack.analysis.keys()))
print(stack.provenance["analysis"]["steps"])
Outputs & exports¶
Results are saved on the AFM stack instance:

- `stack.analysis`: dict of stored analysis results (keys use a step-based naming scheme).
- `stack.provenance["analysis"]`: detailed provenance about the run.
- `stack.provenance["environment"]`: runtime environment metadata (OS, Python, packages).

CLI/utility functions may optionally export:

- a sanitised JSON summary (human- and machine-readable)
- an HDF5 bundle with full data + provenance
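The sanitised summary is plain JSON, so it can be inspected with the standard library. For example, reading back the `analysis.json` log written in the programmatic example above (the exact layout depends on the run):

import json

with open("analysis.json") as fh:
    summary = json.load(fh)

# the summary mirrors the provenance record; exact keys may vary
if isinstance(summary, dict):
    print(sorted(summary))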
Inspecting results¶
Common programmatic patterns:
# 1) List all stored analysis keys
print(sorted(stack.analysis.keys()))
# 2) Walk step provenance
for step in stack.provenance["analysis"]["steps"]:
print(step["index"], step["name"], step["analysis_key"], step.get("version"))
# 3) Access outputs for a named module
for rec in stack.provenance["analysis"]["results_by_name"].get("feature_detection", []):
key = rec["analysis_key"]
outputs = rec["outputs"]
# use outputs (dict/list/array) directly
Advanced / Implementation details¶
Module loading¶
- Modules are resolved first from the built-in registry (`playNano.analysis.BUILTIN_ANALYSIS_MODULES`), then via Python entry points in the `playNano.analysis` group. The first matching entry point is used (sketched after this list).
- Loaded classes are instantiated and must subclass `AnalysisModule`.
- Instantiated modules are cached on the pipeline instance to avoid repeated re-instantiation.
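A simplified sketch of that lookup order, assuming a dict-like registry and the standard `importlib.metadata` entry-point API (Python 3.10+); the real loader lives in the pipeline code:

from importlib.metadata import entry_points

from playNano.analysis import BUILTIN_ANALYSIS_MODULES

def resolve_module(name: str):
    """Sketch of the lookup order: built-in registry first, then entry points."""
    # 1) built-in registry
    if name in BUILTIN_ANALYSIS_MODULES:
        return BUILTIN_ANALYSIS_MODULES[name]
    # 2) plugins advertised in the 'playNano.analysis' entry-point group
    for ep in entry_points(group="playNano.analysis"):
        if ep.name == name:
            return ep.load()  # first matching entry point wins
    raise KeyError(f"Unknown analysis module: {name}")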
Result storage layout¶
Analysis result keys follow the pattern `step_<idx>_<module_name>`, where `idx` is 1-based and spaces in the module name are replaced by underscores. Each key in `stack.analysis` maps to the raw outputs returned by the module (a dict, array, DataFrame, etc.).
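The naming scheme means stored keys can be reconstructed (or matched) directly; for example:

def analysis_key(idx: int, module_name: str) -> str:
    """Build the storage key for step `idx` (1-based) of `module_name`."""
    return f"step_{idx}_{module_name.replace(' ', '_')}"

print(analysis_key(1, "feature_detection"))  # step_1_feature_detection
print(analysis_key(2, "particle_tracking"))  # step_2_particle_tracking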
Provenance structure¶
After a run, `stack.provenance["analysis"]` contains the following (an illustrative example follows the list):

- `steps` - ordered list of per-step records. Each record includes:
  - `index`: 1-based integer
  - `name`: module name as invoked
  - `params`: parameters passed to the module (keyword args)
  - `timestamp`: ISO-8601 UTC timestamp of execution
  - `version`: optional version string (if the module specifies it)
  - `analysis_key`: key used to store the outputs in `stack.analysis`
- `results_by_name` - mapping from module name to the list of results that module produced during the run. Each entry in the list is a dict with:
  - `analysis_key` - stored key in `stack.analysis`
  - `outputs` - the raw outputs object stored under that key
- `frame_times` - result of `stack.get_frame_times()` (if present), else `None`.
Notes¶
- The pipeline will create `stack.provenance` and `stack.analysis` if they do not exist.
- `stack.provenance["environment"]` is set if not already present (gathered via the system info util).
- When `log_to` is supplied, the pipeline writes a sanitised JSON summary using `playNano.analysis.utils.sanitize_analysis_for_logging()` (intended to produce small, JSON-friendly summaries suitable for logs).
Troubleshooting & tips¶
- If a module name fails to resolve, ensure the module is listed in the built-in registry or that a plugin exposing the correct entry point is installed.
- For large or complex outputs, prefer HDF5 export - sanitised JSON may truncate or summarise arrays.
- If analyses expect processed frames, run a processing pipeline first (see Processing).
Extending with Custom Modules¶
You can create your own analysis modules and register them as plugins. See Custom Analysis Modules for full details, including requirements, examples, and best practices.
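For a flavour of what a plugin looks like, here is a minimal, hypothetical skeleton; the exact base-class interface, method names, and entry-point wiring are documented on the Custom Analysis Modules page, so the import path, `name` attribute, and `run` signature below are assumptions for illustration:

from playNano.analysis import AnalysisModule  # actual import path may differ

class HeightCounter(AnalysisModule):
    """Hypothetical module: count pixels above a height threshold per frame."""

    name = "height_counter"  # pipeline-facing name (assumed attribute)

    def run(self, stack, threshold=0.0):  # assumed method name and signature
        counts = [int((frame > threshold).sum()) for frame in stack.data]
        return {"counts": counts}

Exposing the class under the `playNano.analysis` entry-point group in your package metadata makes it resolvable by name, as described under Module loading above.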
See also¶
- Processing - pre-processing & pipeline snapshots
- Command Line Interface (CLI) - command-line reference and examples
- GUI: Interactive Playback - interactive playback & exporting
- Custom Analysis Modules - writing and registering new analysis steps