Selecting appropriate preclinical models is fundamental to translational oncology, yet a large-scale, multi-omic quantitative comparison of their similarity to primary human tumors is lacking. To address this, we integrated transcriptomic, proteomic, and genomic profiles from over 10,000 primary tumors from The Cancer Genome Atlas (TCGA) and the Clinical Proteomic Tumor Analysis Consortium (CPTAC), alongside 4,000 preclinical models. Using a robust computational framework, we revealed a clear hierarchy of transcriptomic and proteomic similarity to patient tumors: patient-derived xenografts (PDXs) > patient-derived organoids (PDOs) = PDX-derived organoids (PDXOs) > cell lines. We also found high molecular conservation (Pearson correlation coefficient = 0.96) across paired in vitro-to-in vivo platform transitions (organoid to PDX). Furthermore, genomic analysis demonstrated that whole-exome sequencing (WES) outperforms RNA sequencing (RNA-Seq) in detecting DNA variants and identified a clonal complexity hierarchy (cell lines > PDXOs > PDXs > PDOs) reflecting the impact of passaging history on intra-tumor heterogeneity. Ultimately, this study delivers a comprehensive quantitative benchmark that establishes a population-level hierarchy of molecular similarity between preclinical models and primary tumors and offers a data-driven framework for selecting models that balance biological representativeness with experimental practicality.
Zixuan Xie, Jia Xue, Binchen Mao, Hengyuan Liu, Wubin Qian, Jingjing Wang, Xiaobo Chen, Sheng Guo