ILANIT 2020

Disease interpretation of non-coding RNA genes

Ruth Barshir, Simon Fishilevich, Daniel Malawsky, Michal Twik, Tsippi Iny Stein, Marilyn Safran, Doron Lancet
Molecular Genetics, Weizmann Institute of Science, Israel

Non-coding RNA (ncRNA) genes, amounting to >20% of the non-protein-coding genome, are assuming increasing importance as evidence accumulates for their involvement in disease mechanisms. A comprehensive, non-redundant, gene-centric view of human ncRNA genes and their annotations is critical for understanding their roles in health and disease and for interpreting whole genome sequencing (WGS) data.

GeneCards, the widely used human gene database, is integrating ncRNA records and annotations from EBI’s RNAcentral and its 15 primary sources, as well as from Ensembl, NCBI Entrez Gene and HGNC. Many ncRNA sources are transcript-centric, whereas WGS variant analysis requires that this information be transformed into a unique and all-encompassing compendium of ncRNA genes. Taking advantage of GeneCards’ gene-centric design, we clustered overlapping transcript entries from all sources using an algorithm based on genomic coordinates. This strategy eliminates redundancies and provides full coverage of genomic gene positions. In keeping with the GeneCards approach, each RNA gene appears with its affiliated transcripts and with annotations from all relevant sources. The unified compilation comprises more than 180,000 ncRNA gene entries, including ~100,000 Piwi-interacting RNAs.
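
The coordinate-based clustering step can be illustrated with a minimal sketch. This is not the GeneCards implementation, whose details are not given here; it assumes that two transcripts belong to the same gene cluster when their intervals overlap on the same chromosome and strand, and the `Transcript` record, the source labels and the coordinates are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    source: str   # originating database (hypothetical labels in the example)
    tx_id: str    # transcript identifier (hypothetical)
    chrom: str
    strand: str
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive


def cluster_transcripts(transcripts):
    """Group transcripts whose coordinates overlap on the same chromosome
    and strand. Strand-awareness is an assumption of this sketch."""
    by_locus = defaultdict(list)
    for tx in transcripts:
        by_locus[(tx.chrom, tx.strand)].append(tx)

    clusters = []
    for _, txs in sorted(by_locus.items()):
        # Sweep left to right, extending the current cluster while
        # the next transcript starts before the cluster ends.
        txs.sort(key=lambda t: (t.start, t.end))
        current, current_end = [txs[0]], txs[0].end
        for tx in txs[1:]:
            if tx.start <= current_end:
                current.append(tx)
                current_end = max(current_end, tx.end)
            else:
                clusters.append(current)
                current, current_end = [tx], tx.end
        clusters.append(current)
    return clusters


# Hypothetical input: two overlapping entries from different sources,
# one distant entry, and one entry on the opposite strand.
example = [
    Transcript("source_A", "txA1", "chr1", "+", 100, 500),
    Transcript("source_B", "txB1", "chr1", "+", 450, 900),
    Transcript("source_C", "txC1", "chr1", "+", 2000, 2300),
    Transcript("source_A", "txA2", "chr1", "-", 480, 700),
]

clusters = cluster_transcripts(example)
for cluster in clusters:
    span = (cluster[0].start, max(t.end for t in cluster))
    print(cluster[0].chrom, cluster[0].strand, span, [t.tx_id for t in cluster])
```

In this toy input the two overlapping entries from different sources collapse into one cluster spanning both, which is the sense in which clustering removes redundancy while retaining every affiliated transcript.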

VarElect, GeneCards’ next-generation sequencing (NGS) interpretation tool [PMID 27357693], supports disease interpretation of variants mapped to ncRNAs by two routes. First, direct ncRNA-disease associations are extracted from our knowledgebase, now augmented by novel ncRNA-disease connections. Second, our strengthened capacity to associate ncRNAs with protein-coding genes now enables interpretation of poorly characterized ncRNAs. Combining an extensive ncRNA knowledgebase with gene interpretation algorithms makes it possible to assess the disease-related significance of non-coding variants identified by WGS.