Evaluating microRNA-target prediction across organisms using machine learning
Regulation of gene expression is fundamental for proper development, homeostasis, and adaptation to the environment in all living organisms. MicroRNAs (miRNAs) are small RNA molecules that play an important role in post-transcriptional gene regulation. This class of short, non-coding RNAs can hybridize to complementary sequences on target mRNAs and repress their translation or mediate their degradation. Identifying miRNA target sites on mRNAs is a fundamental step in understanding miRNA function. Computational prediction of miRNA targets is very challenging because miRNAs are short and engage in only limited sequence complementarity with their targets.
In recent years, we have witnessed tremendous progress in the ability to unambiguously identify bona fide miRNA-target interactions by high-throughput experimental methods (e.g., CLASH). These methods recover thousands of direct (chimeric) miRNA-target interactions, providing an unprecedented opportunity to apply machine learning methods to study the characteristics of miRNA target sites and to improve miRNA target site prediction.
We collected existing datasets of chimeric miRNA-target interactions from a variety of organisms, e.g., human, mouse, cow, and worm. We extracted raw and high-level features that represent the hybrids formed by the miRNA-target pairs, and fitted several machine learning models to predict miRNA-target interactions. We evaluated the performance of the models when trained and tested on the same organism or on different ones. Our study provides important insights into the key features of miRNA-target interactions and the transferability of miRNA-targeting rules between different organisms.
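
The train-on-one-organism, test-on-another evaluation described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the study's actual pipeline: the single feature (paired positions in miRNA nucleotides 2-8, assuming an ungapped antiparallel alignment in which the site's 3' end opposes miRNA position 1), the threshold "model", the function names, the placeholder organism labels, and the tiny synthetic examples. The real work uses many features and several machine learning models.

```python
# Watson-Crick pairs plus the G:U wobble; all sequences are RNA written 5'->3'.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def seed_pairs(mirna, site):
    """Count paired positions between miRNA nucleotides 2-8 and the site.

    Assumption: ungapped antiparallel alignment where the last nucleotide
    of the site sits opposite miRNA position 1.
    """
    seed = mirna[1:8]            # miRNA positions 2..8
    opposite = site[::-1][1:8]   # site nucleotides facing those positions
    return sum((m, t) in PAIRS for m, t in zip(seed, opposite))

def accuracy(cutoff, examples):
    """Fraction of examples where (seed_pairs >= cutoff) equals the label."""
    hits = sum((seed_pairs(m, s) >= cutoff) == label for m, s, label in examples)
    return hits / len(examples)

def fit_threshold(examples):
    """Toy 'model': the smallest cutoff in 0..8 with the best training accuracy."""
    return max(range(9), key=lambda cutoff: accuracy(cutoff, examples))

def cross_organism_matrix(splits):
    """splits maps organism -> (train_examples, test_examples).

    Returns {(train_organism, test_organism): accuracy}. Models are always
    scored on held-out test examples, including the same-organism case.
    """
    results = {}
    for train_org, (train, _) in splits.items():
        cutoff = fit_threshold(train)
        for test_org, (_, test) in splits.items():
            results[(train_org, test_org)] = accuracy(cutoff, test)
    return results

# Synthetic examples for illustration only (not experimental data).
MIRNA = "UGAGGUAGUAGGUUGUAUAGUU"
splits = {
    "organism_A": (
        [(MIRNA, "CUACCUCA", True), (MIRNA, "AAAAAAAA", False)],
        [(MIRNA, "CUACCUUA", True), (MIRNA, "GGGGGGGA", False)],
    ),
    "organism_B": (
        [(MIRNA, "CUACCUUA", True), (MIRNA, "GGGGGGGA", False)],
        [(MIRNA, "CUACCUCA", True), (MIRNA, "AAAAAAAA", False)],
    ),
}

matrix = cross_organism_matrix(splits)
for (train_org, test_org), acc in sorted(matrix.items()):
    print(f"train={train_org} test={test_org} accuracy={acc:.2f}")
```

The diagonal of the resulting matrix corresponds to same-organism performance and the off-diagonal entries to transfer between organisms. Scoring every cell on held-out examples keeps the two cases comparable.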