pyXenium.mechanostress Atera/PDC tutorial#

This tutorial runs the canonical pyXenium.mechanostress workflow on the Atera WTA FFPE breast cancer Xenium export. The public ReadTheDocs page shows a saved snapshot from the full-data run; the same notebook can be rerun on PDC with the scripts in benchmarking/mechanostress_pdc/.

The biological unit here is not a contour. We reuse the S1/S5 panel from the contour tutorial to define an interpretable tumor-DCIS/stroma mechanical context: S1 invasive tumor cells plus S5 apocrine/luminal DCIS are treated as tumor, S1 invasive CAFs are treated as stromal, and the main workflow focuses on S1 CAF axis strength. A separate comparison below contrasts the available CAF/fibroblast pools (S1_CAF, S4_CAF, and S3_CXCL14_fibroblast); the Atera annotation does not contain a literal S5 CAF group.

from __future__ import annotations

import json
import os
from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd
from IPython.display import Image, Markdown, display

import pyXenium
from pyXenium.io import read_xenium
from pyXenium.mechanostress import (
    AxisStrengthConfig,
    MechanostressConfig,
    PolarityConfig,
    TumorStromaGrowthConfig,
    classify_tumor_stroma_growth,
    compute_ane_density,
    compute_cell_polarity,
    compute_distance_expression_coupling,
    estimate_cell_axes,
    run_mechanostress_cohort,
    run_mechanostress_workflow,
    summarize_axial_orientation,
    summarize_cell_polarity,
)

print("pyXenium", pyXenium.__version__)

pyXenium 0.4.1

Dataset and execution paths#

Default PDC input path:

/cfs/klemming/projects/supr/naiss2025-22-606/data/WTA_Preview_FFPE_Breast_Cancer_outs

Default local mirror used for this saved snapshot:

Y:\long\10X_datasets\Xenium\Atera\WTA_Preview_FFPE_Breast_Cancer_outs

ReadTheDocs does not execute this notebook. To rerun it on PDC, stage the repo and required small Xenium artifacts, bootstrap the conda env, then submit the Slurm notebook job shown near the end of the tutorial.

LOCAL_DATASET_ROOT = Path(r"Y:\long\10X_datasets\Xenium\Atera\WTA_Preview_FFPE_Breast_Cancer_outs")
PDC_DATASET_ROOT = Path("/cfs/klemming/projects/supr/naiss2025-22-606/data/WTA_Preview_FFPE_Breast_Cancer_outs")

DATASET_ROOT = Path(os.environ.get(
    "PYXENIUM_MECHANOSTRESS_DATASET_ROOT",
    LOCAL_DATASET_ROOT if LOCAL_DATASET_ROOT.exists() else PDC_DATASET_ROOT,
))
COHORT_ROOT = Path(os.environ.get("PYXENIUM_MECHANOSTRESS_COHORT_ROOT", DATASET_ROOT.parent))
SAMPLE_GLOB = os.environ.get("PYXENIUM_MECHANOSTRESS_SAMPLE_GLOB", DATASET_ROOT.name)
OUTPUT_ROOT = Path(os.environ.get("PYXENIUM_MECHANOSTRESS_OUTPUT_ROOT", DATASET_ROOT / "pyxenium_tutorial_outputs" / "mechanostress_atera_pdc"))
DEFAULT_STATIC_CANDIDATES = [
    Path("docs") / "_static" / "tutorials" / "mechanostress_atera_pdc",
    Path("..") / "_static" / "tutorials" / "mechanostress_atera_pdc",
]
DEFAULT_STATIC_DIR = next((path for path in DEFAULT_STATIC_CANDIDATES if path.exists()), DEFAULT_STATIC_CANDIDATES[0])
STATIC_DIR = Path(os.environ.get("PYXENIUM_MECHANOSTRESS_STATIC_DIR", DEFAULT_STATIC_DIR))
RUN_FULL_WORKFLOW = os.environ.get("PYXENIUM_MECHANOSTRESS_RUN_FULL", "0") == "1"

print("DATASET_ROOT =", DATASET_ROOT)
print("COHORT_ROOT =", COHORT_ROOT)
print("SAMPLE_GLOB =", SAMPLE_GLOB)
print("OUTPUT_ROOT =", OUTPUT_ROOT)
print("STATIC_DIR =", STATIC_DIR)
print("RUN_FULL_WORKFLOW =", RUN_FULL_WORKFLOW)

DATASET_ROOT = /cfs/klemming/projects/supr/naiss2025-22-606/data/WTA_Preview_FFPE_Breast_Cancer_outs
COHORT_ROOT = /cfs/klemming/scratch/h/hutaobo/pyxenium_mechanostress_atera_2026-04/staging
SAMPLE_GLOB = WTA_Preview_FFPE_Breast_Cancer_outs
OUTPUT_ROOT = /cfs/klemming/scratch/h/hutaobo/pyxenium_mechanostress_atera_2026-04/outputs
STATIC_DIR = /cfs/klemming/scratch/h/hutaobo/pyxenium_mechanostress_atera_2026-04/static/mechanostress_atera_pdc
RUN_FULL_WORKFLOW = True

S1/S5 annotation mapping#

The Atera cell-group table contains cell_id, group, and color. The mechanostress workflow needs a tumor-stroma annotation and coordinates, so the PDC preparation script writes a derived annotation CSV into the staging sample directory.

Mapping used here:

s_class: the S1-S5 panel from the contour tutorial.
mechanostress_class = Tumor: S1 invasive tumor variants and all S5 apocrine/luminal DCIS cells.
mechanostress_class = Stromal: S1 invasive CAFs.
mechanostress_class = Other: S2/S3/S4 non-S1/S5 context and unassigned cells.
axis_pool = S1_CAF: the main workflow fibroblast axis-strength pool.
CAF/fibroblast comparison pools: S1_CAF, S4_CAF, and S3_CXCL14_fibroblast. S5 itself has no CAF label in this annotation.

S1_S5_PANEL = {
    "S1": [
        "CAFs Invasive Associated",
        "11q13 Invasive Tumor Cells",
        "11q13 Invasive Tumor Cells (G1/S)",
        "11q13 Invasive Tumor Cells (Mitotic)",
    ],
    "S2": ["Basal-like Structured DCIS Cells", "Dendritic Cells", "B Cells", "T Lymphocytes"],
    "S3": ["Mast Cells", "Myeloid Cells", "Macrophages", "CXCL14+ Fibroblasts", "Endothelial Cells", "Pericytes"],
    "S4": ["Myoepithelial Cells", "CAFs DCIS Associated", "Plasma Cells"],
    "S5": ["Apocrine Cells", "Luminal-like Amorphous DCIS Cells"],
}
LABEL_ALIASES = {
    "CAFs Invasive Associated": "CAFs, Invasive Associated",
    "CAFs DCIS Associated": "CAFs, DCIS Associated",
}
S1_TUMOR_LABELS = {
    "11q13 Invasive Tumor Cells",
    "11q13 Invasive Tumor Cells (G1/S)",
    "11q13 Invasive Tumor Cells (Mitotic)",
}


def build_atera_mechanostress_annotation(dataset_root: Path) -> pd.DataFrame:
    groups = pd.read_csv(dataset_root / "WTA_Preview_FFPE_Breast_Cancer_cell_groups.csv")
    cells = pd.read_parquet(dataset_root / "cells.parquet")[["cell_id", "x_centroid", "y_centroid"]]
    resolved_to_s = {
        LABEL_ALIASES.get(label, label): s_class
        for s_class, labels in S1_S5_PANEL.items()
        for label in labels
    }
    frame = groups.merge(cells, on="cell_id", how="left")
    frame["cluster"] = frame["group"].astype(str)
    frame["s_class"] = frame["group"].map(resolved_to_s).fillna("Unassigned")
    frame["mechanostress_class"] = "Other"
    frame.loc[frame["group"].isin(S1_TUMOR_LABELS) | frame["s_class"].eq("S5"), "mechanostress_class"] = "Tumor"
    frame.loc[frame["group"].eq("CAFs, Invasive Associated"), "mechanostress_class"] = "Stromal"
    frame["axis_pool"] = "Other"
    frame.loc[frame["group"].eq("CAFs, Invasive Associated"), "axis_pool"] = "S1_CAF"
    frame.loc[frame["group"].eq("CAFs, DCIS Associated"), "axis_pool"] = "S4_CAF"
    frame.loc[frame["group"].eq("CXCL14+ Fibroblasts"), "axis_pool"] = "S3_CXCL14_fibroblast"
    return frame

annotation_counts_path = STATIC_DIR / "s1_s5_mechanostress_annotation_counts.csv"
if annotation_counts_path.exists():
    annotation_counts = pd.read_csv(annotation_counts_path)
elif DATASET_ROOT.exists():
    annotation_counts = (
        build_atera_mechanostress_annotation(DATASET_ROOT)
        .groupby(["s_class", "mechanostress_class", "axis_pool"], dropna=False)
        .size()
        .reset_index(name="n_cells")
        .sort_values(["s_class", "mechanostress_class", "axis_pool"])
    )
else:
    raise FileNotFoundError(
        f"No saved annotation snapshot at {annotation_counts_path} and dataset root is unavailable: {DATASET_ROOT}"
    )
annotation_counts

	s_class	mechanostress_class	axis_pool	n_cells
0	S1	Stromal	S1_CAF	4001
1	S1	Tumor	Other	68872
2	S2	Other	Other	23298
3	S3	Other	Other	21754
4	S3	Other	S3_CXCL14_fibroblast	4936
5	S4	Other	Other	9311
6	S4	Other	S4_CAF	24442
7	S5	Tumor	Other	13431
8	Unassigned	Other	Other	12

One-shot workflow#

On PDC the notebook runs run_mechanostress_cohort(...) against a staging root with one sample directory. The staging directory symlinks the immutable dataset files and contains the derived annotation CSV. This exercises the public cohort API and writes the fixed artifact set.

CANDIDATE_COUPLING_GENES = (
    "EPCAM", "KRT8", "KRT18", "KRT19", "KRT5", "KRT14",
    "ERBB2", "MKI67", "MMP11", "MUC1", "NIBAN1", "SORL1",
)

config = MechanostressConfig(
    axis=AxisStrengthConfig(
        er_threshold=2.0,
        groupby=("axis_pool", "cluster"),
        cell_query="axis_pool == 'S1_CAF'",
    ),
    tumor_stroma=TumorStromaGrowthConfig(
        annotation_col="mechanostress_class",
        tumor_label="Tumor",
        stroma_label="Stromal",
        x_col="x_centroid",
        y_col="y_centroid",
        method="delaunay_hop",
    ),
    polarity=PolarityConfig(offset_norm_threshold=0.30),
    coupling_genes=CANDIDATE_COUPLING_GENES,
    sample_id="Atera_WTA_Breast_S1S5",
)

CAF_AXIS_POOLS = ("S1_CAF", "S4_CAF", "S3_CXCL14_fibroblast")
CAF_POOL_LABELS = {
    "S1_CAF": "S1 invasive CAF",
    "S4_CAF": "S4 DCIS-associated CAF",
    "S3_CXCL14_fibroblast": "S3 CXCL14+ fibroblast",
}
CAF_COMPARISON_RADIUS_UM = 100.0


def _label_caf_pool(frame: pd.DataFrame) -> pd.DataFrame:
    labelled = frame.copy()
    labelled["caf_pool_label"] = labelled["axis_pool"].map(CAF_POOL_LABELS).fillna(labelled["axis_pool"].astype(str))
    return labelled


def _caf_sort(frame: pd.DataFrame) -> pd.DataFrame:
    if frame.empty or "axis_pool" not in frame.columns:
        return frame
    ordered = frame.copy()
    ordered["axis_pool"] = pd.Categorical(ordered["axis_pool"], categories=CAF_AXIS_POOLS, ordered=True)
    sort_cols = [column for column in ("axis_pool", "radius_um", "cluster") if column in ordered.columns]
    ordered = ordered.sort_values(sort_cols).reset_index(drop=True)
    ordered["axis_pool"] = ordered["axis_pool"].astype(str)
    return ordered


def build_caf_mechanical_comparison(result, *, er_threshold: float = 2.0) -> dict[str, pd.DataFrame]:
    axes = result.cell_axes.copy()
    axes = axes.loc[axes["axis_pool"].isin(CAF_AXIS_POOLS)].copy()
    axes["elongation_ratio"] = pd.to_numeric(axes["elongation_ratio"], errors="coerce")
    axes["er_ge_threshold"] = axes["elongation_ratio"] >= er_threshold
    axes = _label_caf_pool(axes)
    elongated = axes.loc[axes["er_ge_threshold"]].copy()

    orientation = summarize_axial_orientation(elongated, groupby=["axis_pool", "cluster"], local_k=15)
    orientation = _caf_sort(_label_caf_pool(orientation))
    axis_strength = compute_ane_density(elongated, groupby=["axis_pool", "cluster"])
    axis_strength = _caf_sort(_label_caf_pool(axis_strength))

    polarity = result.cell_polarity.copy()
    polarity = polarity.loc[polarity["axis_pool"].isin(CAF_AXIS_POOLS)].copy()
    polarity = _label_caf_pool(polarity)
    polarity_summary = summarize_cell_polarity(
        polarity,
        groupby=["axis_pool", "cluster"],
        sample_id=result.summary.get("sample_id"),
    )
    polarity_summary = _caf_sort(_label_caf_pool(polarity_summary))

    axis_counts = (
        axes.groupby(["axis_pool", "cluster"], sort=False, dropna=False)
        .agg(
            n_caf_cells=("cell_id", "nunique"),
            n_er_ge_2=("er_ge_threshold", "sum"),
            er_median=("elongation_ratio", "median"),
            er_q25=("elongation_ratio", lambda values: values.quantile(0.25)),
            er_q75=("elongation_ratio", lambda values: values.quantile(0.75)),
        )
        .reset_index()
    )
    radius_slice = axis_strength.loc[axis_strength["radius_um"].eq(CAF_COMPARISON_RADIUS_UM)].copy()
    comparison = axis_counts.merge(
        orientation[["axis_pool", "cluster", "axial_rbar", "mean_axis_degrees", "local_rbar_median"]],
        on=["axis_pool", "cluster"],
        how="left",
    ).merge(
        radius_slice[["axis_pool", "cluster", "ANE_density_median", "coh_median", "neigh_median"]],
        on=["axis_pool", "cluster"],
        how="left",
    ).merge(
        polarity_summary[["axis_pool", "cluster", "n_cells", "polarized_fraction", "offset_norm_median", "offset_distance_median_um"]],
        on=["axis_pool", "cluster"],
        how="left",
        suffixes=("", "_polarity"),
    )
    comparison = _caf_sort(_label_caf_pool(comparison))
    return {
        "comparison": comparison,
        "orientation": orientation,
        "axis_strength": axis_strength,
        "polarity_summary": polarity_summary,
        "axes": axes,
    }


def write_caf_comparison_static(result, static_dir: Path) -> dict[str, pd.DataFrame]:
    caf = build_caf_mechanical_comparison(result)
    caf["comparison"].to_csv(static_dir / "caf_mechanical_comparison_summary.csv", index=False)
    caf["orientation"].to_csv(static_dir / "caf_orientation_summary.csv", index=False)
    caf["axis_strength"].to_csv(static_dir / "caf_axis_strength_by_radius.csv", index=False)
    caf["polarity_summary"].to_csv(static_dir / "caf_polarity_summary.csv", index=False)

    axes = caf["axes"].copy()
    axes["caf_pool_label"] = pd.Categorical(
        axes["caf_pool_label"],
        categories=[CAF_POOL_LABELS[key] for key in CAF_AXIS_POOLS],
        ordered=True,
    )
    fig, ax = plt.subplots(figsize=(7, 4))
    if not axes.empty:
        axes.boxplot(column="elongation_ratio", by="caf_pool_label", grid=False, rot=20, ax=ax)
        fig.suptitle("")
    ax.set_xlabel("CAF/fibroblast pool")
    ax.set_ylabel("Nucleus elongation ratio")
    ax.set_title("CAF elongation ratio by annotation pool")
    fig.tight_layout()
    fig.savefig(static_dir / "caf_er_distribution.png", dpi=180)
    plt.close(fig)

    plt.figure(figsize=(7, 4))
    for axis_pool, group in caf["axis_strength"].groupby("axis_pool", sort=False):
        group = group.sort_values("radius_um")
        plt.plot(group["radius_um"], group["ANE_density_median"], marker="o", label=CAF_POOL_LABELS.get(axis_pool, axis_pool))
    plt.xscale("log")
    plt.xlabel("Radius (um)")
    plt.ylabel("Median ANE density")
    plt.title("CAF axial neighborhood density by radius")
    plt.legend(frameon=False, fontsize=8)
    plt.tight_layout()
    plt.savefig(static_dir / "caf_axis_strength_by_radius.png", dpi=180)
    plt.close()

    plt.figure(figsize=(7, 4))
    local = caf["comparison"].copy()
    if not local.empty:
        plt.bar(local["caf_pool_label"], local["local_rbar_median"], color=["#2F6F8F", "#4A9D67", "#D9822B"])
    plt.ylabel("Median local axial Rbar")
    plt.title("Local axial coherence across CAF pools")
    plt.xticks(rotation=20, ha="right")
    plt.tight_layout()
    plt.savefig(static_dir / "caf_local_rbar_comparison.png", dpi=180)
    plt.close()

    fig, axes_plot = plt.subplots(1, 2, figsize=(9, 4))
    polarity = caf["comparison"].copy()
    if not polarity.empty:
        axes_plot[0].bar(polarity["caf_pool_label"], polarity["offset_norm_median"], color="#8E6BBE")
        axes_plot[1].bar(polarity["caf_pool_label"], polarity["polarized_fraction"], color="#C43C39")
    axes_plot[0].set_ylabel("Median offset norm")
    axes_plot[1].set_ylabel("Polarized fraction")
    for axis in axes_plot:
        axis.tick_params(axis="x", rotation=20)
        for label in axis.get_xticklabels():
            label.set_ha("right")
    fig.suptitle("CAF polarity across annotation pools")
    fig.tight_layout()
    fig.savefig(static_dir / "caf_polarity_comparison.png", dpi=180)
    plt.close(fig)
    return caf


def _best_pool(table: pd.DataFrame, column: str) -> str | None:
    if table.empty or column not in table.columns:
        return None
    values = pd.to_numeric(table[column], errors="coerce")
    if values.notna().sum() == 0:
        return None
    return str(table.loc[values.idxmax(), "caf_pool_label"])


def write_tutorial_static(result, annotation: pd.DataFrame, static_dir: Path, candidate_genes: tuple[str, ...]) -> None:
    static_dir.mkdir(parents=True, exist_ok=True)
    counts = (
        annotation.groupby(["s_class", "mechanostress_class", "axis_pool"], dropna=False)
        .size()
        .reset_index(name="n_cells")
        .sort_values(["s_class", "mechanostress_class", "axis_pool"])
    )
    counts.to_csv(static_dir / "s1_s5_mechanostress_annotation_counts.csv", index=False)
    available = set(result.distance_expression_coupling["gene"].astype(str)) if not result.distance_expression_coupling.empty else set()
    pd.DataFrame({"gene": candidate_genes, "available": [gene in available for gene in candidate_genes]}).to_csv(
        static_dir / "coupling_gene_availability.csv", index=False
    )
    for name, table in result.table_map().items():
        if name in {"axis_strength_by_radius", "orientation_summary", "tumor_growth_summary", "distance_expression_coupling", "polarity_summary"}:
            table.to_csv(static_dir / f"{name}.csv", index=False)
    caf = write_caf_comparison_static(result, static_dir)
    summary = result.summary.copy()
    summary["available_coupling_genes"] = [gene for gene in candidate_genes if gene in available]
    summary["missing_coupling_genes"] = [gene for gene in candidate_genes if gene not in available]
    summary["caf_comparison"] = {
        "highest_er_median": _best_pool(caf["comparison"], "er_median"),
        "highest_local_rbar_median": _best_pool(caf["comparison"], "local_rbar_median"),
        "highest_ane_density_100um": _best_pool(caf["comparison"], "ANE_density_median"),
        "highest_offset_norm_median": _best_pool(caf["comparison"], "offset_norm_median"),
    }
    (static_dir / "summary.json").write_text(json.dumps(result.summary, indent=2, default=str) + "\n", encoding="utf-8")
    (static_dir / "tutorial_summary.json").write_text(json.dumps(summary, indent=2, default=str) + "\n", encoding="utf-8")

    axis = result.axis_strength_by_radius.sort_values("radius_um")
    plt.figure(figsize=(6, 4))
    if not axis.empty:
        plt.plot(axis["radius_um"], axis["ANE_density_median"], marker="o", color="#2F6F8F")
    plt.xscale("log")
    plt.xlabel("Radius (um)")
    plt.ylabel("Median ANE density")
    plt.title("S1 CAF axial neighborhood density")
    plt.tight_layout()
    plt.savefig(static_dir / "axis_strength_density.png", dpi=180)
    plt.close()

    plt.figure(figsize=(6, 4))
    if not result.tumor_growth_summary.empty:
        row = result.tumor_growth_summary.iloc[0]
        vals = pd.Series({"Infiltrative tumor": row.get("n_infiltrative", 0), "Expanding tumor": row.get("n_expanding", 0)})
        vals.plot(kind="bar", color=["#C43C39", "#4A9D67"])
    plt.ylabel("Tumor cells")
    plt.title("Tumor-stroma Delaunay-hop states")
    plt.xticks(rotation=20, ha="right")
    plt.tight_layout()
    plt.savefig(static_dir / "tumor_growth_states.png", dpi=180)
    plt.close()

    plt.figure(figsize=(6, 4))
    if not result.cell_polarity.empty:
        result.cell_polarity["offset_norm"].dropna().clip(upper=1.5).plot(kind="hist", bins=50, color="#8E6BBE")
        plt.axvline(0.30, color="black", linestyle="--", linewidth=1)
    plt.xlabel("Nucleus-to-cell offset / equivalent cell radius")
    plt.ylabel("Cells")
    plt.title("Cell polarity offset distribution")
    plt.tight_layout()
    plt.savefig(static_dir / "polarity_offset_histogram.png", dpi=180)
    plt.close()

    plt.figure(figsize=(7, 4))
    if not result.distance_expression_coupling.empty:
        top = result.distance_expression_coupling.sort_values("spearman_rho").tail(10)
        plt.barh(top["gene"], top["spearman_rho"], color="#D9822B")
        plt.axvline(0, color="black", linewidth=0.8)
    plt.xlabel("Spearman rho")
    plt.title("Tumor distance-expression coupling")
    plt.tight_layout()
    plt.savefig(static_dir / "distance_expression_coupling.png", dpi=180)
    plt.close()


if RUN_FULL_WORKFLOW:
    cohort_result = run_mechanostress_cohort(
        COHORT_ROOT,
        output_dir=OUTPUT_ROOT,
        config=config.copy_with(sample_id=None),
        sample_glob=SAMPLE_GLOB,
        annotation_glob="*_cell_clusters_with_annotation_and_coord.csv",
        prefer="h5",
        fail_fast=True,
    )
    result = next(iter(cohort_result.results.values()))
    write_tutorial_static(result, build_atera_mechanostress_annotation(DATASET_ROOT), STATIC_DIR, CANDIDATE_COUPLING_GENES)
    display(cohort_result.summary)
else:
    display(Markdown("Using saved tutorial snapshot. Set `PYXENIUM_MECHANOSTRESS_RUN_FULL=1` to rerun the full PDC workflow."))

	sample_id	n_axes	n_axis_strength_rows	n_orientation_summary_rows	n_tumor_growth_cells	n_distance_expression_coupling_rows	n_polarity_cells	tumor_growth.sample_id	tumor_growth.n_tumor	tumor_growth.n_infiltrative	tumor_growth.n_expanding	tumor_growth.n_tumor_valid	tumor_growth.infiltrative_proportion	tumor_growth.mean_infil_dist_to_stromal	tumor_growth.mean_expand_dist_to_stromal
0	WTA_Preview_FFPE_Breast_Cancer_outs	168335	12	1	170057	12	168335	WTA_Preview_FFPE_Breast_Cancer_outs	82303	9026	73277	82303	0.109668	33.619929	124.810325

Step-by-step public API#

The same workflow can be decomposed into public functions for teaching and debugging. This cell is intentionally explicit: it shows where axes, ANE density, tumor-stroma growth calls, polarity, and expression coupling enter the result.

if RUN_FULL_WORKFLOW:
    annotation = build_atera_mechanostress_annotation(DATASET_ROOT)
    slide = read_xenium(DATASET_ROOT, as_="slide", prefer="h5", include_transcripts=False, include_boundaries=True)

    axes = estimate_cell_axes(slide.shapes["nucleus_boundaries"])
    axes = axes.merge(annotation[["cell_id", "cluster", "s_class", "mechanostress_class", "axis_pool"]], on="cell_id", how="left")
    s1_caf_axes = axes.query("axis_pool == 'S1_CAF' and elongation_ratio >= 2.0").copy()
    orientation_summary = summarize_axial_orientation(s1_caf_axes, groupby=["axis_pool", "cluster"], local_k=15)
    axis_strength = compute_ane_density(s1_caf_axes, groupby=["axis_pool", "cluster"])

    growth = classify_tumor_stroma_growth(
        annotation,
        annotation_col="mechanostress_class",
        tumor_label="Tumor",
        stroma_label="Stromal",
        x_col="x_centroid",
        y_col="y_centroid",
        method="delaunay_hop",
    )
    polarity = compute_cell_polarity(cell_boundaries=slide.shapes["cell_boundaries"], nucleus_boundaries=slide.shapes["nucleus_boundaries"])
    display(orientation_summary.head())
    display(axis_strength.head())
    display(growth.head())
    display(polarity.head())
else:
    display(pd.read_csv(STATIC_DIR / "orientation_summary.csv"))
    display(pd.read_csv(STATIC_DIR / "axis_strength_by_radius.csv").head())

	axis_pool	cluster	n_axes	axial_rbar	mean_axis_degrees	kappa	rayleigh_p	local_rbar_median	local_rbar_q25	local_rbar_q75	local_rbar_n_cells
0	S1_CAF	CAFs, Invasive Associated	2493	0.073272	52.045082	0.146938	0.000002	0.448844	0.27453	0.641602	2493

	axis_pool	cluster	radius_um	coh_median	coh_q25	coh_q75	coh_n_cells	neigh_median	neigh_q25	neigh_q75	neigh_n_cells	ANE_median	ANE_q25	ANE_q75	ANE_density_median	ANE_density_q25	ANE_density_q75
0	S1_CAF	CAFs, Invasive Associated	5.0	0.984708	0.963206	0.998091	48	0.0	0.0	0.0	2493	0.984708	0.963206	0.998091	0.012538	0.012264	0.012708
1	S1_CAF	CAFs, Invasive Associated	10.0	0.961127	0.885093	0.992598	324	0.0	0.0	0.0	2493	0.966694	0.885435	0.995921	0.003077	0.002818	0.003170
2	S1_CAF	CAFs, Invasive Associated	20.0	0.951273	0.857285	0.985468	1037	0.0	0.0	1.0	2493	0.988742	0.911577	1.749390	0.000787	0.000725	0.001392
3	S1_CAF	CAFs, Invasive Associated	30.0	0.931270	0.807942	0.978386	1539	1.0	0.0	2.0	2493	0.999741	0.936074	1.990397	0.000354	0.000331	0.000704
4	S1_CAF	CAFs, Invasive Associated	50.0	0.883856	0.713222	0.959852	2066	2.0	1.0	4.0	2493	1.948850	0.990393	3.410119	0.000248	0.000126	0.000434

	cell_id	group	color	x_centroid	y_centroid	cluster	s_class	mechanostress_class	axis_pool	stroma_hop	tumor_growth_pattern	dist_to_nearest_stromal	dist_to_nearest_tumor
0	cgmkdohb-1	Endothelial Cells	#795548	988.603394	2998.515625	Endothelial Cells	S3	Other	Other	NaN	NaN	NaN	NaN
1	cgmhpeak-1	Luminal-like Amorphous DCIS Cells	#F943D2	920.002380	2998.296631	Luminal-like Amorphous DCIS Cells	S5	Tumor	Other	5.0	expanding	120.829583	15.598983
2	cgngakjk-1	Pericytes	#8D8D00	852.635254	2999.503418	Pericytes	S3	Other	Other	NaN	NaN	NaN	NaN
3	cgoclelb-1	Luminal-like Amorphous DCIS Cells	#F943D2	951.682922	2902.430908	Luminal-like Amorphous DCIS Cells	S5	Tumor	Other	4.0	expanding	128.512396	6.777092
4	cglinfcj-1	Luminal-like Amorphous DCIS Cells	#F943D2	961.724487	2932.555420	Luminal-like Amorphous DCIS Cells	S5	Tumor	Other	2.0	expanding	100.771389	13.086734

	cell_id	cell_centroid_x	cell_centroid_y	cell_area	cell_equivalent_radius_um	nucleus_centroid_x	nucleus_centroid_y	nucleus_area	nucleus_equivalent_radius_um	polarity_dx	polarity_dy	offset_distance_um	offset_norm	polarity_angle_degrees	polarized
0	aaaajgij-1	73.522306	2732.760381	78.205039	4.989332	73.530805	2731.893369	58.631144	4.320055	0.008500	-0.867012	0.867053	0.173781	270.561682	False
1	aaaandia-1	153.385380	2721.124277	109.292873	5.898220	153.053891	2722.042227	82.991347	5.139744	-0.331489	0.917949	0.975969	0.165468	109.855600	False
2	aaabalki-1	15.698925	2817.272773	52.783643	4.098970	15.214442	2816.992275	30.884517	3.135418	-0.484482	-0.280498	0.559823	0.136577	210.069315	False
3	aaabchln-1	14.746960	2799.108926	40.998806	3.612523	14.687462	2799.729375	28.243618	2.998370	-0.059498	0.620449	0.623295	0.172537	95.477623	False
4	aaaccnnp-1	58.715848	2763.223232	47.342835	3.881970	58.545854	2762.925723	38.921874	3.519832	-0.169994	-0.297510	0.342651	0.088267	240.256815	False

CAF mechanical-state comparison#

The Atera annotation does not contain a literal S5_CAF group. S5 in this panel is apocrine/luminal DCIS tumor context. For CAF biology, this section therefore compares the available annotated CAF/fibroblast pools: invasive S1_CAF, DCIS-associated S4_CAF, and S3_CXCL14_fibroblast.

The comparison uses cells with elongation_ratio >= 2.0 for axial orientation and ANE density. Polarity is summarized on all cells in each CAF/fibroblast pool.

caf_summary = pd.read_csv(STATIC_DIR / "caf_mechanical_comparison_summary.csv")
caf_axis_strength = pd.read_csv(STATIC_DIR / "caf_axis_strength_by_radius.csv")
caf_orientation = pd.read_csv(STATIC_DIR / "caf_orientation_summary.csv")
caf_polarity = pd.read_csv(STATIC_DIR / "caf_polarity_summary.csv")

metric_labels = {
    "er_median": "median elongation ratio",
    "local_rbar_median": "median local axial Rbar",
    "ANE_density_median": "median ANE density at 100 um",
    "offset_norm_median": "median polarity offset norm",
}
for column, label in metric_labels.items():
    source = caf_summary
    if column == "ANE_density_median":
        source = caf_axis_strength.loc[caf_axis_strength["radius_um"].eq(CAF_COMPARISON_RADIUS_UM)]
    values = pd.to_numeric(source[column], errors="coerce")
    if values.notna().any():
        row = source.loc[values.idxmax()]
        print(f"Highest {label}: {row['caf_pool_label']} ({values.max():.4g})")

display(caf_summary[[
    "axis_pool", "caf_pool_label", "n_caf_cells", "n_er_ge_2", "er_median",
    "local_rbar_median", "ANE_density_median", "offset_norm_median", "polarized_fraction",
]])
display(caf_axis_strength[["axis_pool", "caf_pool_label", "radius_um", "ANE_density_median", "coh_median", "neigh_median"]].head(36))

Highest median elongation ratio: S1 invasive CAF (2.356)
Highest median local axial Rbar: S4 DCIS-associated CAF (0.5993)
Highest median ANE density at 100 um: S4 DCIS-associated CAF (0.0002902)
Highest median polarity offset norm: S1 invasive CAF (0.2847)

	axis_pool	caf_pool_label	n_caf_cells	n_er_ge_2	er_median	local_rbar_median	ANE_density_median	offset_norm_median	polarized_fraction
0	S1_CAF	S1 invasive CAF	3999	2493	2.356217	0.448844	0.000136	0.284690	0.471618
1	S4_CAF	S4 DCIS-associated CAF	24376	13460	2.120415	0.599311	0.000290	0.264185	0.428659
2	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	4504	2307	2.039332	0.379728	0.000061	0.233226	0.366785

	axis_pool	caf_pool_label	radius_um	ANE_density_median	coh_median	neigh_median
0	S1_CAF	S1 invasive CAF	5.0	0.012538	0.984708	0.0
1	S1_CAF	S1 invasive CAF	10.0	0.003077	0.961127	0.0
2	S1_CAF	S1 invasive CAF	20.0	0.000787	0.951273	0.0
3	S1_CAF	S1 invasive CAF	30.0	0.000354	0.931270	1.0
4	S1_CAF	S1 invasive CAF	50.0	0.000248	0.883856	2.0
5	S1_CAF	S1 invasive CAF	75.0	0.000172	0.826181	4.0
6	S1_CAF	S1 invasive CAF	100.0	0.000136	0.773440	6.0
7	S1_CAF	S1 invasive CAF	150.0	0.000097	0.687385	11.0
8	S1_CAF	S1 invasive CAF	200.0	0.000078	0.626449	17.0
9	S1_CAF	S1 invasive CAF	300.0	0.000065	0.566703	34.0
10	S1_CAF	S1 invasive CAF	400.0	0.000060	0.544598	60.0
11	S1_CAF	S1 invasive CAF	500.0	0.000057	0.532520	90.0
12	S4_CAF	S4 DCIS-associated CAF	5.0	0.012570	0.986154	0.0
13	S4_CAF	S4 DCIS-associated CAF	10.0	0.003109	0.960382	0.0
14	S4_CAF	S4 DCIS-associated CAF	20.0	0.000791	0.921237	1.0
15	S4_CAF	S4 DCIS-associated CAF	30.0	0.000564	0.880385	1.0
16	S4_CAF	S4 DCIS-associated CAF	50.0	0.000364	0.806556	4.0
17	S4_CAF	S4 DCIS-associated CAF	75.0	0.000315	0.753477	8.0
18	S4_CAF	S4 DCIS-associated CAF	100.0	0.000290	0.720772	13.0
19	S4_CAF	S4 DCIS-associated CAF	150.0	0.000256	0.672470	28.0
20	S4_CAF	S4 DCIS-associated CAF	200.0	0.000234	0.640737	46.0
21	S4_CAF	S4 DCIS-associated CAF	300.0	0.000205	0.600152	98.0
22	S4_CAF	S4 DCIS-associated CAF	400.0	0.000189	0.582329	165.0
23	S4_CAF	S4 DCIS-associated CAF	500.0	0.000175	0.568240	245.0
24	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	5.0	0.012669	0.995013	0.0
25	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	10.0	0.003164	0.990143	0.0
26	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	20.0	0.000793	0.973463	0.0
27	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	30.0	0.000353	0.955635	0.0
28	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	50.0	0.000127	0.927153	1.0
29	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	75.0	0.000097	0.868155	1.0
30	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	100.0	0.000061	0.805991	2.0
31	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	150.0	0.000040	0.693096	4.0
32	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	200.0	0.000035	0.638428	7.0
33	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	300.0	0.000030	0.583079	15.0
34	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	400.0	0.000028	0.548139	26.0
35	S3_CXCL14_fibroblast	S3 CXCL14+ fibroblast	500.0	0.000027	0.534983	40.0

Snapshot summaries#

The saved snapshot below was produced from the full Atera data. Complete large per-cell tables remain in the PDC output root; only summary tables and compact figures are kept in the documentation tree.

summary = json.loads((STATIC_DIR / "tutorial_summary.json").read_text(encoding="utf-8"))
tumor_summary = pd.read_csv(STATIC_DIR / "tumor_growth_summary.csv")
polarity_summary = pd.read_csv(STATIC_DIR / "polarity_summary.csv")
gene_availability = pd.read_csv(STATIC_DIR / "coupling_gene_availability.csv")

print(json.dumps(summary, indent=2))
display(tumor_summary)
display(polarity_summary)
display(gene_availability)

{
  "sample_id": "WTA_Preview_FFPE_Breast_Cancer_outs",
  "n_axes": 168335,
  "n_axis_strength_rows": 12,
  "n_orientation_summary_rows": 1,
  "n_tumor_growth_cells": 170057,
  "n_distance_expression_coupling_rows": 12,
  "n_polarity_cells": 168335,
  "tumor_growth": {
    "sample_id": "WTA_Preview_FFPE_Breast_Cancer_outs",
    "n_tumor": 82303,
    "n_infiltrative": 9026,
    "n_expanding": 73277,
    "n_tumor_valid": 82303,
    "infiltrative_proportion": 0.10966793434018202,
    "mean_infil_dist_to_stromal": 33.61992883561506,
    "mean_expand_dist_to_stromal": 124.81032478936972
  },
  "available_coupling_genes": [
    "EPCAM",
    "KRT8",
    "KRT18",
    "KRT19",
    "KRT5",
    "KRT14",
    "ERBB2",
    "MKI67",
    "MMP11",
    "MUC1",
    "NIBAN1",
    "SORL1"
  ],
  "missing_coupling_genes": [],
  "caf_comparison": {
    "highest_er_median": "S1 invasive CAF",
    "highest_local_rbar_median": "S4 DCIS-associated CAF",
    "highest_ane_density_100um": "S4 DCIS-associated CAF",
    "highest_offset_norm_median": "S1 invasive CAF"
  }
}

	sample_id	n_tumor	n_infiltrative	n_expanding	n_tumor_valid	infiltrative_proportion	mean_infil_dist_to_stromal	mean_expand_dist_to_stromal
0	WTA_Preview_FFPE_Breast_Cancer_outs	82303	9026	73277	82303	0.109668	33.619929	124.810325

	sample_id	n_cells	n_polarized	polarized_fraction	offset_norm_median	offset_distance_median_um
0	WTA_Preview_FFPE_Breast_Cancer_outs	168335	66658	0.395984	0.249179	1.083575

	gene	available
0	EPCAM	True
1	KRT8	True
2	KRT18	True
3	KRT19	True
4	KRT5	True
5	KRT14	True
6	ERBB2	True
7	MKI67	True
8	MMP11	True
9	MUC1	True
10	NIBAN1	True
11	SORL1	True

Visual outputs#

S1 CAF axial neighborhood density

Tumor growth states

Cell polarity offset histogram

Tumor distance-expression coupling

CAF elongation ratio distribution

CAF axial neighborhood density by radius

CAF local axial coherence comparison

CAF polarity comparison

axis_strength = pd.read_csv(STATIC_DIR / "axis_strength_by_radius.csv")
coupling = pd.read_csv(STATIC_DIR / "distance_expression_coupling.csv")

display(axis_strength[["radius_um", "ANE_density_median", "coh_median", "neigh_median"]].head(12))
display(coupling.sort_values("spearman_rho", ascending=False).head(12))

	radius_um	ANE_density_median	coh_median	neigh_median
0	5.0	0.012538	0.984708	0.0
1	10.0	0.003077	0.961127	0.0
2	20.0	0.000787	0.951273	0.0
3	30.0	0.000354	0.931270	1.0
4	50.0	0.000248	0.883856	2.0
5	75.0	0.000172	0.826181	4.0
6	100.0	0.000136	0.773440	6.0
7	150.0	0.000097	0.687385	11.0
8	200.0	0.000078	0.626449	17.0
9	300.0	0.000065	0.566703	34.0
10	400.0	0.000060	0.544598	60.0
11	500.0	0.000057	0.532520	90.0

	gene	n_cells	n_nonzero_cells	spearman_rho	p_value	mean_expression	tumor_enriched
10	NIBAN1	82303	17686	0.288945	0.000000e+00	0.910902	False
9	MUC1	82303	29462	0.153944	0.000000e+00	1.842898	False
11	SORL1	82303	10690	0.123928	5.623827e-279	0.174684	False
3	KRT19	82303	23458	0.107029	3.248345e-208	0.423581	False
2	KRT18	82303	35448	0.029627	1.876570e-17	0.711396	False
6	ERBB2	82303	6094	0.025378	3.299462e-13	0.085185	False
1	KRT8	82303	19872	-0.002597	4.562877e-01	0.314861	False
5	KRT14	82303	608	-0.003432	3.248030e-01	0.007825	False
8	MMP11	82303	838	-0.030752	1.103466e-18	0.011227	False
4	KRT5	82303	6088	-0.051441	2.380141e-49	0.098648	False
0	EPCAM	82303	67829	-0.083403	5.935536e-127	3.102511	False
7	MKI67	82303	11208	-0.119694	2.994245e-260	0.330450	False

Biological interpretation#

In this S1/S5 mechanical context, about 11% of tumor cells are classified as infiltrative by the Delaunay-hop tumor-stroma rule. S1 CAF axis strength is summarized across radii from 5 to 500 microns, while the polarity summary keeps the continuous nucleus-to-cell offset and a thresholded polarized label. The coupling table is intentionally exploratory: it ranks how tumor-cell expression changes with nearest stromal distance and should be interpreted alongside the cell-group definitions and histology.

The CAF mechanical-state comparison is deliberately not named S5_CAF: this Atera annotation has S5 DCIS tumor cells but no S5 CAF label. Instead, it compares the available CAF/fibroblast pools and reports which pool has the highest median ER, strongest local axial coherence, highest 100 um ANE density, and strongest polarity offset in the executed notebook output.

HNSCC and Suzuki LUAD validation commands remain dataset-specific checks and are not run on this Atera tutorial dataset.

PDC rerun commands#

From the local repository:

python benchmarking/mechanostress_pdc/scripts/stage_to_pdc.py --dry-run
python benchmarking/mechanostress_pdc/scripts/stage_to_pdc.py

On PDC:

export PDC_ROOT=/cfs/klemming/scratch/h/hutaobo/pyxenium_mechanostress_atera_2026-04
export PDC_DATASET_ROOT=/cfs/klemming/projects/supr/naiss2025-22-606/data/WTA_Preview_FFPE_Breast_Cancer_outs
bash benchmarking/mechanostress_pdc/scripts/bootstrap_pdc_env.sh
bash benchmarking/mechanostress_pdc/scripts/submit_pdc_notebook.sh --dry-run
bash benchmarking/mechanostress_pdc/scripts/submit_pdc_notebook.sh

After the Slurm job completes, collect the executed notebook and compact static artifacts locally:

python benchmarking/mechanostress_pdc/scripts/collect_pdc_results.py