Whole-dataset CCI benchmarking#
Overview#
This tutorial summarizes whole-dataset cell-cell interaction benchmarking on the full
Atera Xenium WTA breast sample (170,057 cells). The clean PDC full_common
runs used a shared human cell-cell interaction resource so that methods are compared
by recovered biology and method-internal rank behavior, not by raw score
magnitude.
The benchmark is separate from the basic cell-cell interaction tutorial, which focuses
on the fixed smoke/topology panel and the pyXenium.cci workflow.
Completed full common-db methods#
| Method | Full common-db rows | Highest-level signal recovered |
|---|---|---|
| | 1,319,600 | Vascular and topology-supported stromal axes, led by |
| | 1,183,456 | Reproducible non-spatial expression baseline, led by |
| | 1,304,935 | Diffusion-smoothed tumor-stroma signals, led by |
| | 744,209 | Spatial bivariate signals including |
| | 446,023 | Spatial co-expression signals dominated by tumor-intrinsic epithelial interactions such as |
| | 505,281 | Local neighborhood CCI signals dominated by tumor-intrinsic high-expression interactions such as |
Canonical Atera axis recovery#
| Canonical Atera axis | Benchmark recovery |
|---|---|
| | Strongly recovered. |
| | Strongly recovered by |
| | Recovered by multiple methods, but with method-dependent receiver compartments. |
| | Not recovered in the clean full common-db outputs; this should be interpreted as a database/expression/filtering limitation rather than proof that the macrophage axis is absent. |
| | Not recovered in the clean full common-db outputs, again suggesting panel detectability or common-resource limitations. |
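A recovery call of this kind can be made explicit as a top-N lookup in each method's ranked output. The sketch below is illustrative only: the function name `axis_recovery`, the `top_n` cutoff, the method labels, and the toy rankings are all assumptions, not the benchmark's actual code or results.

```python
def axis_recovery(ranked_pairs_by_method, axis, top_n=50):
    """Return the 1-based rank of `axis` per method, or None if outside top_n.

    ranked_pairs_by_method: dict mapping a method label to a list of
    (ligand, receptor) tuples, already sorted best-first by that method.
    axis: the (ligand, receptor) tuple to look for.
    """
    recovery = {}
    for method, ranked_pairs in ranked_pairs_by_method.items():
        top = ranked_pairs[:top_n]
        # list.index gives the first 0-based position; convert to 1-based rank
        recovery[method] = top.index(axis) + 1 if axis in top else None
    return recovery


# Toy rankings, invented for illustration (hypothetical methods and order).
toy_rankings = {
    "method_a": [("CXCL12", "CXCR4"), ("DLL4", "NOTCH3")],
    "method_b": [("LIG_X", "REC_X"), ("CXCL12", "CXCR4")],
}

recovery = axis_recovery(toy_rankings, ("CXCL12", "CXCR4"), top_n=1)
print(recovery)
```

A `None` here only says the pair is absent from the inspected top ranks. As noted above, that can reflect resource coverage or filtering rather than biological absence.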
Biological interpretation#
Overall, pyXenium gave the strongest topology-supported biological discovery
profile because it recovered the expected CXCL12-CXCR4 and DLL4-NOTCH3
axes with the most anatomically plausible sender-receiver assignments.
CellPhoneDB is the most useful reproducible non-spatial baseline, and
LARIS is a strong diffusion-aware complement.
SpatialDM and stLearn are best read as supplementary spatial co-expression
methods in this dataset because their top ranks are dominated by tumor-intrinsic
high-expression programs. LIANA+ produced biologically interesting spatial
bivariate hits, but the strongest calls require caution because several involve
the Unassigned compartment.
Caveats#
Scores are standardized within each method; raw scores are not directly comparable across methods.
This page reports clean PDC full_common outputs only, not stale A100 salvage runs or smoke-only results.
Non-recovery of a canonical axis in the common-db benchmark can reflect CCI resource coverage, ligand-receptor-resource filtering, or panel detectability rather than biological absence.
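One way to compare methods without relying on raw score magnitude is to convert each method's scores to within-method rank percentiles. The following is a minimal sketch under stated assumptions: the function name, the `(method, interaction, raw_score)` row layout, and the toy values are hypothetical, and the benchmark's own standardization may differ.

```python
from collections import defaultdict


def within_method_rank_percentile(rows):
    """Map (method, interaction) to a rank percentile in (0, 1].

    rows: iterable of (method, interaction, raw_score) tuples.
    The highest raw score within each method gets 1.0. Ties are not
    averaged; tied items keep their input order (sorted() is stable).
    """
    by_method = defaultdict(list)
    for method, interaction, score in rows:
        by_method[method].append((interaction, score))

    percentiles = {}
    for method, items in by_method.items():
        n = len(items)
        # Sort ascending so the best-scoring interaction receives rank n.
        ordered = sorted(items, key=lambda item: item[1])
        for rank, (interaction, _score) in enumerate(ordered, start=1):
            percentiles[(method, interaction)] = rank / n
    return percentiles


# Toy scores on very different scales (invented for illustration).
toy_rows = [
    ("method_a", "L1-R1", 250.0),
    ("method_a", "L2-R2", 10.0),
    ("method_b", "L1-R1", 0.9),
    ("method_b", "L2-R2", 0.1),
]
percentiles = within_method_rank_percentile(toy_rows)
```

In this toy input the two methods score on scales that differ by orders of magnitude, yet both place `L1-R1` at the top of their own ranking, which is the comparison the percentile preserves.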
Next steps#
Use pyXenium when topology-supported biological discovery is the priority.
Use CellPhoneDB as the reproducible non-spatial expression baseline.
Use LARIS, SpatialDM, stLearn, and LIANA+ as complementary views whose discoveries should be interpreted through their method-specific assumptions.
Extend the clean benchmark to the registered Atera cervical Xenium WTA dataset (atera_cervical_wta) and one public non-Xenium spatial dataset before using these results as manuscript-level cross-dataset evidence.
For reviewer-facing TopoLink-CCI results, report CCI_score as the discovery score and use cci_pvalue, cci_fdr, spatial nulls, matched-gene controls, downstream support, and bootstrap stability as orthogonal validation evidence.
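The separation between discovery score and validation evidence can be sketched as a small reporting helper that ranks by CCI_score and annotates, rather than filters, by cci_fdr. Only the column names CCI_score, cci_pvalue, and cci_fdr come from this page. The function name, the list-of-dicts layout, the `pair` key, the 0.05 threshold, and the toy rows are assumptions for illustration.

```python
def reviewer_table(rows, fdr_threshold=0.05, top_n=10):
    """Rank interactions by CCI_score and flag FDR support separately.

    rows: list of dicts with at least "CCI_score" and "cci_fdr" keys.
    Rows are never dropped for failing the FDR threshold, so the
    discovery ranking stays independent of the validation evidence.
    """
    ranked = sorted(rows, key=lambda r: r["CCI_score"], reverse=True)[:top_n]
    return [
        {
            **row,
            "discovery_rank": rank,
            "fdr_supported": row["cci_fdr"] <= fdr_threshold,
        }
        for rank, row in enumerate(ranked, start=1)
    ]


# Toy rows, invented for illustration only.
toy_results = [
    {"pair": "A-B", "CCI_score": 0.90, "cci_pvalue": 0.001, "cci_fdr": 0.01},
    {"pair": "C-D", "CCI_score": 0.95, "cci_pvalue": 0.200, "cci_fdr": 0.40},
    {"pair": "E-F", "CCI_score": 0.10, "cci_pvalue": 0.001, "cci_fdr": 0.02},
]
table = reviewer_table(toy_results, top_n=2)
for row in table:
    print(row["discovery_rank"], row["pair"], row["fdr_supported"])
```

Keeping the flag as a separate column lets a reader see a high-scoring interaction that lacks FDR support instead of having it silently removed.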