pyXenium.cci Tutorial

Overview

This notebook walks through the Atera WTA breast topology reproducibility bundle and focuses on how pyXenium.cci turns precomputed topology anchors plus cell-type expression support into interpretable sender-receiver hypotheses.

Biological question

Which cell populations appear to drive the strongest cell-cell communication programs across the tumor, stromal, immune, and vascular compartments of the Atera breast sample?

from __future__ import annotations

import json
import os
import sys
from pathlib import Path

import pandas as pd
from IPython.display import Image, Markdown, display


def find_repo_root() -> Path:
    for candidate in (Path.cwd(), *Path.cwd().parents):
        if (candidate / "pyproject.toml").exists():
            return candidate
    raise RuntimeError("Could not locate the pyXenium repository root.")


REPO_ROOT = find_repo_root()
SRC_ROOT = REPO_ROOT / "src"
if str(SRC_ROOT) not in sys.path:
    sys.path.insert(0, str(SRC_ROOT))

pd.set_option("display.max_columns", 20)
pd.set_option("display.max_rows", 12)

ATERA_DATASET_PATH = Path(
    os.environ.get(
        "PYXENIUM_ATERA_DATASET",
        r"Y:\long\10X_datasets\Xenium\Atera\WTA_Preview_FFPE_Breast_Cancer_outs",
    )
)
TBC_RESULTS_PATH = ATERA_DATASET_PATH / r"sfplot_tbc_formal_wta\results"
ARTIFACT_DIR = REPO_ROOT / "manuscript" / "atera_wta_breast_topology"
RUN_FULL_ANALYSIS = False

ATERA_DATASET_PATH, TBC_RESULTS_PATH, ARTIFACT_DIR

(WindowsPath('Y:/long/10X_datasets/Xenium/Atera/WTA_Preview_FFPE_Breast_Cancer_outs'),
 WindowsPath('Y:/long/10X_datasets/Xenium/Atera/WTA_Preview_FFPE_Breast_Cancer_outs/sfplot_tbc_formal_wta/results'),
 WindowsPath('D:/GitHub/pyXenium/manuscript/atera_wta_breast_topology'))
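
The setup cell reads `PYXENIUM_ATERA_DATASET` before falling back to the Windows default, so the dataset location can be redirected without editing the notebook. The sketch below mirrors that lookup; the helper name `resolve_dataset_path` and both example paths are hypothetical placeholders, not part of pyXenium.

```python
import os
from pathlib import Path


# Hypothetical helper that mirrors the environment lookup in the setup cell.
def resolve_dataset_path(default: str, env_var: str = "PYXENIUM_ATERA_DATASET") -> Path:
    """Return the path stored in env_var if it is set, otherwise the default."""
    return Path(os.environ.get(env_var, default))


# Placeholder location for illustration only; substitute your own export folder.
os.environ["PYXENIUM_ATERA_DATASET"] = "/data/xenium/atera_outs"

dataset_path = resolve_dataset_path(default="/fallback/path")
print(dataset_path.as_posix(), "exists:", dataset_path.exists())
```

Because the lookup happens when the setup cell executes, the variable has to be set before that cell runs (or in the shell that launches Jupyter).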

Dataset

Raw study: Atera WTA FFPE breast Xenium export with precomputed t_and_c and StructureMap anchors.
Versioned outputs in this repository: manuscript/atera_wta_breast_topology/.
Canonical API: cci_topology_analysis and the packaged run_atera_wta_breast_topology(...) workflow.

Setup

The notebook renders committed CSV and figure artifacts generated from a real run, and it keeps a rerun cell for the full workflow when local data are available.

payload = json.loads((ARTIFACT_DIR / "summary.json").read_text(encoding="utf-8"))
scores = pd.read_csv(ARTIFACT_DIR / "cci_sender_receiver_scores.csv")

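# Keep the best-scoring sender-receiver row for each ligand-receptor pair.
# groupby sorts by its keys, so the resulting table is ordered by ligand and
# receptor name rather than by CCI_score.
top_pairs = (
    scores.sort_values("CCI_score", ascending=False)
    .groupby(["ligand", "receptor"], as_index=False)
    .first()
    [["ligand", "receptor", "sender_celltype", "receiver_celltype", "CCI_score", "local_contact"]]
    .head(10)
)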

display(top_pairs)
display(pd.DataFrame(payload["cci_acceptance"]))

	ligand	receptor	sender_celltype	receiver_celltype	CCI_score	local_contact
0	CSF1	CSF1R	CAFs, DCIS Associated	Macrophages	0.507387	0.044743
1	CXCL12	CXCR4	CAFs, DCIS Associated	T Lymphocytes	0.633882	0.168430
2	DLL4	NOTCH3	Endothelial Cells	Pericytes	0.662811	0.135465
3	JAG1	NOTCH1	11q13 Invasive Tumor Cells	Basal-like Structured DCIS Cells	0.502909	0.073062
4	TGFB1	TGFBR2	Endothelial Cells	Endothelial Cells	0.529126	0.051665
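
The table above is ordered by ligand name rather than by score, because `groupby` sorts its output by the group keys. The sketch below reproduces the same sort-then-group pattern on a toy frame and adds one extra `sort_values` call to rank the per-pair winners by score. The numbers and the shortened cell-type labels are illustrative only and are not taken from the Atera bundle.

```python
import pandas as pd

# Illustrative values only; not taken from the Atera bundle.
toy = pd.DataFrame(
    {
        "ligand": ["DLL4", "DLL4", "CSF1", "CSF1"],
        "receptor": ["NOTCH3", "NOTCH3", "CSF1R", "CSF1R"],
        "sender_celltype": ["Endothelial Cells", "Pericytes", "Macrophages", "CAFs"],
        "CCI_score": [0.9, 0.4, 0.2, 0.6],
    }
)

# Same pattern as the notebook cell: sort by score, then keep the first
# (highest-scoring) row of every ligand-receptor group.
best_per_pair = (
    toy.sort_values("CCI_score", ascending=False)
    .groupby(["ligand", "receptor"], as_index=False)
    .first()
)

# groupby returned the groups in key order; re-sort to rank pairs by score.
ranked = best_per_pair.sort_values("CCI_score", ascending=False, ignore_index=True)

print(best_per_pair)
print(ranked)
```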

	check	ligand	receptor	observed_top_sender	pass	observed_rank	observed_top_receiver
0	CSF1-CSF1R top sender should not be Mast Cells	CSF1	CSF1R	CAFs, DCIS Associated	True	NaN	NaN
1	CXCL12-CXCR4 should keep CAFs, DCIS Associated...	CXCL12	CXCR4	NaN	True	1.0	NaN
2	DLL4-NOTCH3 top hit should be Endothelial Cell...	DLL4	NOTCH3	Endothelial Cells	True	NaN	Pericytes
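
An acceptance check such as "CSF1-CSF1R top sender should not be Mast Cells" can be approximated directly from a scores table. The sketch below is one possible way to express it; the helper `top_sender` is hypothetical and the toy scores are invented, so this is not necessarily how the packaged workflow evaluates `cci_acceptance`.

```python
import pandas as pd


# Hypothetical helper: highest-scoring sender for one ligand-receptor pair.
def top_sender(scores: pd.DataFrame, ligand: str, receptor: str) -> str:
    pair = scores[(scores["ligand"] == ligand) & (scores["receptor"] == receptor)]
    if pair.empty:
        raise ValueError(f"No rows found for {ligand}-{receptor}")
    # idxmax returns the index label of the row with the largest CCI_score.
    return pair.loc[pair["CCI_score"].idxmax(), "sender_celltype"]


# Invented scores for illustration; column names follow the scores CSV above.
toy_scores = pd.DataFrame(
    {
        "ligand": ["CSF1", "CSF1"],
        "receptor": ["CSF1R", "CSF1R"],
        "sender_celltype": ["CAFs, DCIS Associated", "Mast Cells"],
        "CCI_score": [0.5, 0.1],
    }
)

observed = top_sender(toy_scores, "CSF1", "CSF1R")
check = {
    "check": "CSF1-CSF1R top sender should not be Mast Cells",
    "observed_top_sender": observed,
    "pass": observed != "Mast Cells",
}
print(check)
```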

Core workflow

A standard rerun goes through the packaged validation entrypoint so that the same cell-cell interaction panel and topology anchors are used each time.

from pyXenium.validation import run_atera_wta_breast_topology

study = run_atera_wta_breast_topology(
    dataset_root=str(ATERA_DATASET_PATH),
    tbc_results=str(TBC_RESULTS_PATH),
    output_dir="./atera_cci_outputs",
    export_figures=True,
)

The committed notebook output below reuses the versioned manuscript bundle so the page stays lightweight on Read the Docs.

if RUN_FULL_ANALYSIS and ATERA_DATASET_PATH.exists():
    from pyXenium.validation import run_atera_wta_breast_topology

    study = run_atera_wta_breast_topology(
        dataset_root=str(ATERA_DATASET_PATH),
        tbc_results=str(TBC_RESULTS_PATH),
        output_dir=str(ARTIFACT_DIR),
        export_figures=True,
    )
    display(pd.DataFrame(study["payload"]["cci_pair_summaries"]))
else:
    display(Markdown("Set `RUN_FULL_ANALYSIS = True` to recompute the Atera cell-cell interaction bundle from the local Xenium export."))

Set RUN_FULL_ANALYSIS = True to recompute the Atera cell-cell interaction bundle from the local Xenium export.

Visual outputs

The summary heatmap collapses pairwise communication scores across sender-receiver compartments, while the hotspot overlay shows where the strongest local interaction pattern sits in tissue space.

display(Image(filename=str(ARTIFACT_DIR / "figures" / "cci_summary_heatmap.png")))
display(Image(filename=str(ARTIFACT_DIR / "figures" / "cci_hotspot_overlay.png")))

[Figure: CCI summary heatmap (figures/cci_summary_heatmap.png)]

[Figure: CCI hotspot overlay (figures/cci_hotspot_overlay.png)]

Biological interpretation

The highest-ranking pairs in the committed Atera bundle point to a mixed tissue architecture rather than a single tumor-autonomous program. Vascular signaling, stromal support, and immune-facing signals remain prominent because topology anchors reward both expression support and the spatial bridge between compartments. The hotspot map is especially useful when a strong pair might otherwise be dismissed as pseudobulk co-expression, because it supplies the local tissue evidence.

Caveats

The score is a composite; a strong hit can be driven by anchor quality, expression support, and local contact in different proportions.
This notebook uses the fixed smoke-panel pairs from the Atera reproducibility workflow, not a whole-database cell-cell interaction scan.
Precomputed topology anchors should be interpreted as study-specific spatial priors, not universal cell-type distances.
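
To make the first caveat concrete, the sketch below shows how two pairs can reach the same composite value from very different component profiles. The geometric mean used here is an assumed form chosen purely for illustration; it is not the pyXenium scoring formula, and the component values are invented.

```python
import math


def toy_composite(anchor: float, expression: float, contact: float) -> float:
    """Geometric mean of three components (assumed form, NOT the pyXenium formula)."""
    return (anchor * expression * contact) ** (1.0 / 3.0)


# Pair A: strong anchor and expression, weak local contact (invented values).
score_a = toy_composite(anchor=0.9, expression=0.9, contact=0.1)
# Pair B: strong expression, moderate anchor and contact (invented values).
score_b = toy_composite(anchor=0.3, expression=0.9, contact=0.3)

print(round(score_a, 4), round(score_b, 4))
print("same composite:", math.isclose(score_a, score_b))
```

Both products equal 0.081, so the two composites coincide even though pair A has almost no local contact. This is why the component columns deserve a look before interpreting a high rank.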

Next steps

Open the pathway notebook to compare communication programs with pathway-level topology on the same Atera sample.
Inspect cci_component_diagnostics.csv when you need to understand why a pair ranked highly.
Swap in a custom interaction_pairs table if you want to test a focused biological hypothesis beyond the default smoke panel.
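
For the last step, a focused panel can be prepared as a small table before it is handed to the workflow. The sketch below only builds and cleans such a table. The `ligand` and `receptor` column names are assumed from the scores CSV, the helper `build_interaction_pairs` is hypothetical, and the schema that `interaction_pairs` actually expects should be confirmed against the pyXenium documentation.

```python
import pandas as pd


# Hypothetical helper: tidy a list of (ligand, receptor) tuples into a table.
def build_interaction_pairs(pairs: list[tuple[str, str]]) -> pd.DataFrame:
    table = pd.DataFrame(pairs, columns=["ligand", "receptor"])
    # Strip stray whitespace so that "CXCL12 " and "CXCL12" count as one gene.
    table = table.apply(lambda col: col.str.strip())
    if (table == "").any().any():
        raise ValueError("Empty gene symbol in interaction pairs")
    # Remove repeated pairs and renumber the rows from zero.
    return table.drop_duplicates(ignore_index=True)


custom_pairs = build_interaction_pairs(
    [
        ("CXCL12", "CXCR4"),
        ("CXCL12 ", "CXCR4"),  # duplicate once whitespace is stripped
        ("DLL4", "NOTCH3"),
    ]
)
print(custom_pairs)
```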