NEMO: Cancer Subtyping by Multi-Omics Integration - EACR-AACR-ISCR Conference 2018

Introduction

Recent technological advances have facilitated the production of large, high-throughput biological datasets, collectively termed "omics". These include genomics, transcriptomics, proteomics and many more.

The large, diverse omics data available today can be used to better characterize human disease and to help physicians treat patients in a more personalized way. Analysis of large datasets has led to the discovery of novel cancer subtypes, and classification of tumors into these subtypes is now used in treatment decisions. However, these subtypes are usually defined using a single omic (e.g. gene expression). Using multiple omics for cancer subtyping will allow us to better understand cancer biology and to suggest more effective and precise therapy.

Methods

We have developed NEMO (NEighborhood based Multi Omic clustering), a novel algorithm for cancer subtyping through integration of several omic datasets. By using similarities between patients, the algorithm can handle diverse omics without having to model each omic separately, and can support omics with hundreds of thousands of measurements per patient.
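To make the idea of a neighborhood-based patient similarity concrete, the following is a minimal sketch for a single omic. It is an illustration under stated assumptions and not the NEMO implementation: the function name knn_similarity, the Euclidean distance, the Gaussian kernel and the choice of bandwidth are all hypothetical choices made for this example.

```python
import math

def knn_similarity(profiles, k):
    """Illustrative neighborhood-based similarity for one omic.

    profiles: dict mapping patient id -> list of numeric measurements.
    k: number of nearest neighbors (assumed to be at most len(profiles) - 1).
    Returns a dict mapping (patient_a, patient_b) -> similarity. The value is
    nonzero only when one patient is among the other's k nearest neighbors.
    """
    ids = sorted(profiles)
    # Pairwise Euclidean distances between patient profiles.
    dist = {a: {b: math.dist(profiles[a], profiles[b]) for b in ids if b != a}
            for a in ids}
    # k nearest neighbors of each patient.
    neighbors = {a: set(sorted(dist[a], key=dist[a].get)[:k]) for a in ids}
    # Local scale: mean distance from a patient to its k nearest neighbors.
    scale = {a: sum(dist[a][b] for b in neighbors[a]) / k for a in ids}

    sim = {}
    for a in ids:
        for b in ids:
            if a == b:
                continue
            if b in neighbors[a] or a in neighbors[b]:
                # Gaussian kernel with a bandwidth averaged over both patients
                # (an assumption of this sketch); fall back to 1.0 if it is 0.
                sigma = (scale[a] + scale[b]) / 2 or 1.0
                sim[(a, b)] = math.exp(-dist[a][b] ** 2 / (2 * sigma ** 2))
            else:
                sim[(a, b)] = 0.0
    return sim
```

Because only patient-to-patient distances are needed, the same routine applies to any omic regardless of how many features it measures, which is the property the paragraph above relies on.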

NEMO is fast and simple, and has the added advantage of handling missing data, i.e. it can include in the analysis patients for whom not all types of data are available.
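One way a similarity-based method could accommodate patients with partial data is to average each pair's similarity over only those omics in which both patients were measured. The sketch below illustrates that idea; it is an assumption made for illustration rather than a description of NEMO's actual procedure, and the function name integrate_similarities and the input layout are hypothetical.

```python
def integrate_similarities(per_omic):
    """Average pairwise similarities over the omics where a pair is observed.

    per_omic: list of dicts, one per omic, each mapping a
    (patient_a, patient_b) pair to a similarity value. A pair appears in an
    omic's dict only if both patients were measured in that omic.
    Returns a dict mapping each observed pair to its mean similarity.
    """
    totals, counts = {}, {}
    for sim in per_omic:
        for pair, value in sim.items():
            totals[pair] = totals.get(pair, 0.0) + value
            counts[pair] = counts.get(pair, 0) + 1
    # Divide by the number of omics that actually contributed to each pair,
    # so a missing omic neither penalizes nor excludes a patient.
    return {pair: totals[pair] / counts[pair] for pair in totals}
```

For example, if a patient has gene expression but no methylation data, the pairs involving that patient are averaged over gene expression alone, while fully measured pairs are averaged over both omics. The integrated similarities could then be passed to any similarity-based clustering method.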

Results

We have tested NEMO on data from several cancer types in The Cancer Genome Atlas (TCGA), using gene expression, DNA methylation and microRNA expression. NEMO partitions the tumors into groups that differ in survival and in other clinical parameters, even when some data are missing. On full datasets, the results match those of state-of-the-art algorithms, and on partial datasets they show an improvement over existing algorithms that handle missing data.

Conclusion

NEMO provides novel partitions of cancers into subtypes. These partitions separate patients with different prognoses, and may therefore be used to suggest more precise treatment. The algorithm's ability to handle partial datasets allows full utilization of the data available in existing large-scale datasets, and enables researchers to design more cost-effective experiments.