pyXenium.pathway Tutorial

Overview

This notebook uses the same Atera WTA breast reproducibility bundle as the cell-cell interaction walkthrough, but shifts the focus to pathway-level organization. It compares gene-topology aggregation against activity point-cloud scoring and shows how each view emphasizes different spatial programs.

Biological question

Which pathway programs most cleanly localize to macrophage, vascular, plasma-cell, and DCIS-associated compartments, and how similar are those assignments across the two pathway topology modes?

from __future__ import annotations

import json
import os
import sys
from pathlib import Path

import pandas as pd
from IPython.display import Image, Markdown, display


def find_repo_root() -> Path:
    """Walk upward from the working directory until a pyproject.toml is found."""
    for candidate in (Path.cwd(), *Path.cwd().parents):
        if (candidate / "pyproject.toml").exists():
            return candidate
    raise RuntimeError("Could not locate the pyXenium repository root.")


REPO_ROOT = find_repo_root()
SRC_ROOT = REPO_ROOT / "src"
if str(SRC_ROOT) not in sys.path:
    sys.path.insert(0, str(SRC_ROOT))

pd.set_option("display.max_columns", 20)
pd.set_option("display.max_rows", 12)
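# Set the PYXENIUM_ATERA_DATASET environment variable to override the default location.
ATERA_DATASET_PATH = Path(
    os.environ.get(
        "PYXENIUM_ATERA_DATASET",
        r"Y:\long\10X_datasets\Xenium\Atera\WTA_Preview_FFPE_Breast_Cancer_outs",
    )
)
# Join components separately so the path also resolves on non-Windows systems.
TBC_RESULTS_PATH = ATERA_DATASET_PATH / "sfplot_tbc_formal_wta" / "results"
ARTIFACT_DIR = REPO_ROOT / "manuscript" / "atera_wta_breast_topology"
# Keep False to reuse the committed bundle; set True to rerun from the raw export.
RUN_FULL_ANALYSIS = False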

ATERA_DATASET_PATH, TBC_RESULTS_PATH, ARTIFACT_DIR
(WindowsPath('Y:/long/10X_datasets/Xenium/Atera/WTA_Preview_FFPE_Breast_Cancer_outs'),
 WindowsPath('Y:/long/10X_datasets/Xenium/Atera/WTA_Preview_FFPE_Breast_Cancer_outs/sfplot_tbc_formal_wta/results'),
 WindowsPath('D:/GitHub/pyXenium/manuscript/atera_wta_breast_topology'))
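The default dataset location above is a Windows drive path, so on another machine the `PYXENIUM_ATERA_DATASET` variable has to point at a local copy before the setup cell runs. A minimal sketch of that override follows; the directory used here is a hypothetical placeholder, not a path shipped with pyXenium.

```python
import os
from pathlib import Path

# Hypothetical location; substitute the folder that holds your copy of the export.
os.environ["PYXENIUM_ATERA_DATASET"] = "/data/xenium/WTA_Preview_FFPE_Breast_Cancer_outs"

dataset_path = Path(os.environ["PYXENIUM_ATERA_DATASET"])
# Build the results folder one component at a time so the separator fits the host OS.
tbc_results_path = dataset_path / "sfplot_tbc_formal_wta" / "results"

# exists() is False for the placeholder path; it should be True for a real export.
print(dataset_path.exists(), tbc_results_path.name)
```

Setting the variable in the shell before launching Jupyter has the same effect and keeps machine-specific paths out of the notebook.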

Dataset

  • Raw study: Atera WTA FFPE breast Xenium export.

  • Versioned outputs: manuscript/atera_wta_breast_topology/.

  • Canonical API: compute_pathway_activity_matrix and pathway_topology_analysis.

Setup

The notebook reads the committed pathway CSVs and figures generated from the real Atera run, then keeps an optional rerun cell for regenerating the pathway bundle locally.
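Before loading, it can help to confirm that the committed bundle is complete. The sketch below checks for the four files this notebook reads from the artifact directory; the helper name `missing_artifacts` is hypothetical and not part of pyXenium, and the demo runs against a temporary folder rather than the real bundle.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# The four files this notebook loads from the artifact directory.
EXPECTED_ARTIFACTS = (
    "summary.json",
    "pathway_to_cell.csv",
    "pathway_activity_to_cell.csv",
    "pathway_mode_comparison.csv",
)


def missing_artifacts(artifact_dir: Path) -> list[str]:
    """Return the expected file names that are absent from artifact_dir."""
    return [name for name in EXPECTED_ARTIFACTS if not (artifact_dir / name).is_file()]


# Demo on a throwaway folder that contains only summary.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "summary.json").write_text("{}", encoding="utf-8")
    demo_missing = missing_artifacts(demo_dir)

print(demo_missing)
```

In this notebook, calling `missing_artifacts(ARTIFACT_DIR)` should return an empty list when the repository checkout includes the full bundle.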

payload = json.loads((ARTIFACT_DIR / "summary.json").read_text(encoding="utf-8"))
pathway_to_cell = pd.read_csv(ARTIFACT_DIR / "pathway_to_cell.csv", index_col=0)
pathway_activity_to_cell = pd.read_csv(ARTIFACT_DIR / "pathway_activity_to_cell.csv", index_col=0)
mode_comparison = pd.read_csv(ARTIFACT_DIR / "pathway_mode_comparison.csv")

# idxmin selects, for each pathway, the cell type with the smallest value in each view.
assignments = pd.DataFrame(
    {
        "pathway": pathway_to_cell.index.astype(str),
        "best_gene_topology_celltype": pathway_to_cell.idxmin(axis=1).astype(str).to_numpy(),
        "best_activity_point_cloud_celltype": pathway_activity_to_cell.idxmin(axis=1).astype(str).to_numpy(),
    }
)

display(assignments)
display(mode_comparison[["pathway", "retained_cell_count", "retained_quantile", "activity_mode"]])
   pathway                  best_gene_topology_celltype        best_activity_point_cloud_celltype
0  MacrophageProgram        Macrophages                        Macrophages
1  PlasmaProgram            Plasma Cells                       Plasma Cells
2  VascularProgram          Endothelial Cells                  Macrophages
3  BasalDCISProgram         Basal-like Structured DCIS Cells   Basal-like Structured DCIS Cells
4  ApocrineProgram          Apocrine Cells                     Apocrine Cells
5  LuminalAmorphousProgram  Luminal-like Amorphous DCIS Cells  CAFs, DCIS Associated

   pathway                  retained_cell_count  retained_quantile  activity_mode
0  ApocrineProgram          1896                 0.95               intrinsic
1  BasalDCISProgram         1876                 0.95               intrinsic
2  LuminalAmorphousProgram  4724                 0.95               intrinsic
3  MacrophageProgram        2164                 0.95               intrinsic
4  PlasmaProgram            1808                 0.95               intrinsic
5  VascularProgram          3186                 0.95               intrinsic
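To answer the second half of the biological question (how similar the assignments are across the two modes), the assignments table can be reduced to an agreement flag. The sketch below rebuilds the table from the values displayed above so that it runs on its own; the `modes_agree` column name is introduced here for illustration.

```python
import pandas as pd

# Values copied from the assignments table displayed above.
assignments = pd.DataFrame(
    {
        "pathway": [
            "MacrophageProgram",
            "PlasmaProgram",
            "VascularProgram",
            "BasalDCISProgram",
            "ApocrineProgram",
            "LuminalAmorphousProgram",
        ],
        "best_gene_topology_celltype": [
            "Macrophages",
            "Plasma Cells",
            "Endothelial Cells",
            "Basal-like Structured DCIS Cells",
            "Apocrine Cells",
            "Luminal-like Amorphous DCIS Cells",
        ],
        "best_activity_point_cloud_celltype": [
            "Macrophages",
            "Plasma Cells",
            "Macrophages",
            "Basal-like Structured DCIS Cells",
            "Apocrine Cells",
            "CAFs, DCIS Associated",
        ],
    }
)

# True where both topology modes pick the same cell type for a pathway.
assignments["modes_agree"] = (
    assignments["best_gene_topology_celltype"]
    == assignments["best_activity_point_cloud_celltype"]
)

n_agree = int(assignments["modes_agree"].sum())
discordant = assignments.loc[~assignments["modes_agree"], "pathway"].tolist()
print(f"{n_agree} of {len(assignments)} pathways agree; discordant: {discordant}")
```

With the committed bundle, four of the six pathways agree, and the two discordant ones are VascularProgram and LuminalAmorphousProgram.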

Core workflow

The packaged Atera workflow computes both pathway views in one pass so the cell-type distances and activity-derived point clouds can be compared directly.

from pyXenium.validation import run_atera_wta_breast_topology

study = run_atera_wta_breast_topology(
    dataset_root=str(ATERA_DATASET_PATH),
    tbc_results=str(TBC_RESULTS_PATH),
    output_dir="./atera_pathway_outputs",
    export_figures=True,
)

pathway_to_cell = study["pathway"]["pathway_to_cell"]
pathway_activity_to_cell = study["pathway"]["pathway_activity_to_cell"]

The notebook output below reuses the committed bundle so that Read the Docs builds stay fast while the displayed results still come from the real Atera run.

if RUN_FULL_ANALYSIS and ATERA_DATASET_PATH.exists():
    from pyXenium.validation import run_atera_wta_breast_topology

    study = run_atera_wta_breast_topology(
        dataset_root=str(ATERA_DATASET_PATH),
        tbc_results=str(TBC_RESULTS_PATH),
        output_dir=str(ARTIFACT_DIR),
        export_figures=True,
    )
    display(study["pathway"]["pathway_mode_comparison"].head())
else:
    display(Markdown("Set `RUN_FULL_ANALYSIS = True` to recompute the Atera pathway bundle from the local Xenium export."))

Set RUN_FULL_ANALYSIS = True to recompute the Atera pathway bundle from the local Xenium export.

Visual outputs

The first two figures are heatmaps that compare the primary gene-topology aggregate view with the activity point-cloud view; the third is the pathway hotspot overlay. The two heatmaps answer slightly different questions: one asks which cell types are closest to a pathway’s member genes in topology space, while the other asks where high-activity cells physically accumulate.

display(Image(filename=str(ARTIFACT_DIR / "figures" / "pathway_to_cell_heatmap.png")))
display(Image(filename=str(ARTIFACT_DIR / "figures" / "pathway_activity_to_cell_heatmap.png")))
display(Image(filename=str(ARTIFACT_DIR / "figures" / "pathway_hotspot_overlay.png")))
[Figures: pathway-to-cell heatmap (gene-topology view); pathway-activity-to-cell heatmap (activity point-cloud view); pathway hotspot overlay]

Biological interpretation

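The Atera pathway bundle resolves a coherent set of tissue programs. In the assignments table, both views place MacrophageProgram with Macrophages, PlasmaProgram with Plasma Cells, BasalDCISProgram with Basal-like Structured DCIS Cells, and ApocrineProgram with Apocrine Cells. The views diverge for two programs: VascularProgram is closest to Endothelial Cells in the gene-topology view but to Macrophages in the activity point-cloud view, and LuminalAmorphousProgram shifts from Luminal-like Amorphous DCIS Cells to DCIS-associated CAFs. The activity point-cloud view is particularly useful when pathway activation is concentrated in a subset of cells rather than evenly spread across a lineage.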
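One way to quantify how cleanly a pathway localizes is the gap between its closest and second-closest cell type: a large gap suggests an unambiguous assignment, a small one a near tie. The sketch below is an illustration under assumptions, not a pyXenium function: `assignment_margin` is a hypothetical helper, the distance values are made up, and it assumes that smaller values mean closer, which matches how this notebook uses `idxmin`.

```python
import numpy as np
import pandas as pd

# Made-up distances for illustration only (rows: pathways, columns: cell types).
toy_distances = pd.DataFrame(
    {
        "Macrophages": [0.10, 0.42],
        "Endothelial Cells": [0.60, 0.40],
        "Plasma Cells": [0.70, 0.80],
    },
    index=["ToyProgramA", "ToyProgramB"],
)


def assignment_margin(distances: pd.DataFrame) -> pd.DataFrame:
    """Return the closest cell type per pathway and its lead over the runner-up."""
    # Sort each row so column 0 holds the smallest distance and column 1 the next.
    ordered = np.sort(distances.to_numpy(dtype=float), axis=1)
    return pd.DataFrame(
        {
            "best_celltype": distances.idxmin(axis=1),
            "margin": ordered[:, 1] - ordered[:, 0],
        },
        index=distances.index,
    )


margins = assignment_margin(toy_distances)
print(margins)
```

Applied to `pathway_to_cell` and `pathway_activity_to_cell`, the same helper would show whether a discordant pathway is a confident disagreement or a near tie in one of the views.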

Caveats

  • Pathway definitions are curated smoke-test panels, so they are intentionally small and hypothesis-driven.

  • The best cell type in the gene-topology view and the best cell type in the activity point-cloud view do not have to match; in this bundle they differ for VascularProgram and LuminalAmorphousProgram.

  • Retained-cell thresholds in the activity view (here, a retained quantile of 0.95 for every pathway) influence hotspot shape and should be reported whenever figures are compared across studies.
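The sensitivity to the retained-cell threshold can be seen with a small experiment. The sketch below assumes that cells are retained when their activity score is at or above a chosen quantile; the exact rule pyXenium applies is not shown in this notebook, so `retained_mask` is a hypothetical stand-in and the scores are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic per-cell activity scores standing in for real pathway activity.
scores = rng.gamma(shape=2.0, scale=1.0, size=10_000)


def retained_mask(values: np.ndarray, quantile: float) -> np.ndarray:
    """Boolean mask of cells whose score is at or above the given quantile."""
    threshold = np.quantile(values, quantile)
    return values >= threshold


# Count retained cells at three candidate thresholds.
retained_counts = {q: int(retained_mask(scores, q).sum()) for q in (0.90, 0.95, 0.99)}
print(retained_counts)
```

Moving from 0.90 to 0.99 shrinks the retained set roughly tenfold in this toy example, which is why the quantile belongs in any figure legend that shows hotspots.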

Next steps

  • Revisit the cell-cell interaction (CCI) notebook if you want to connect pathway programs to explicit sender-receiver pairs.

  • Replace the default pathway panel with a custom pathway table when you have a cohort-specific hypothesis.

  • Compare intrinsic and niche-smoothed pathway activity modes when spatial spillover is biologically plausible.
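To build intuition for the last point, the sketch below shows one simple form of spatial smoothing: each cell's score is replaced by the mean over itself and its k nearest neighbours. This is an assumption made for illustration; how pyXenium defines its niche-smoothed mode is not described in this notebook, and `niche_smooth`, the coordinates, and the scores are all hypothetical.

```python
import math


def niche_smooth(coords, scores, k=2):
    """Average each cell's score with its k nearest spatial neighbours."""
    smoothed = []
    for point in coords:
        # Rank all cells by distance to this one; with distinct coordinates
        # the cell itself comes first because its distance is zero.
        order = sorted(range(len(coords)), key=lambda j: math.dist(point, coords[j]))
        neighbourhood = order[: k + 1]
        smoothed.append(sum(scores[j] for j in neighbourhood) / len(neighbourhood))
    return smoothed


# Toy layout: one active cell in a cluster of three, plus a distant pair.
toy_coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
toy_scores = [1.0, 0.0, 0.0, 0.0, 0.0]

smoothed_scores = niche_smooth(toy_coords, toy_scores, k=2)
print(smoothed_scores)
```

In the toy layout the single active cell spreads its signal to its two close neighbours while the distant pair stays at zero, which is the kind of spillover the intrinsic mode would not show.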