MicroRNAs (miRNAs) are short (~22 nt), single-stranded, non-coding RNAs with a vital role in regulating cellular mRNA and protein levels. As part of a larger silencing complex, these small RNAs bind to complementary nucleotide sequences (called "target sites") in the mRNA. They mainly repress protein translation and facilitate mRNA degradation, but have also been suggested to upregulate expression in some cases. Their central role in regulation, together with their significant link to disease, makes them a high-profile target for in-silico and in-vivo research.
Due to the relative scarcity of reliable data, progress in the field has been incremental. Here, we developed a computational model implementing ~30 features from four main categories (thermodynamics, evolutionary conservation, sequence, and codon composition), in order both to predict the magnitude of miRNA repression and to better understand the underlying repression mechanism from the nucleotide composition of the mRNA and the miRNA. The model was trained and evaluated on a dataset comprising the expression of around 4,000 Homo sapiens genes following over-expression of 74 miRNAs. We demonstrate the performance of the model through, among other things, a novel analysis of miRNA target sites in different parts of the transcript. Our analysis emphasizes the importance of gene expression regulation via miRNA binding sites inside the coding region, and not only in the 3' UTR.
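To make the idea of counting target sites in different parts of a transcript concrete, the following is a minimal Python sketch and not the model described here. It assumes the widely used canonical seed definition (miRNA nucleotides 2-8, matched as a perfect 7-mer), which is not stated in the text above; the function names and the `cds_start`/`cds_end` coordinates are hypothetical, and a site is assigned to the region containing its first nucleotide.

```python
# Illustrative sketch only: counts perfect seed matches per transcript region.
# Assumption: a "target site" is a perfect match to the reverse complement of
# miRNA nucleotides 2-8 (canonical seed); the source does not specify this.

RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq):
    """Return the reverse complement of an RNA sequence (A, C, G, U only)."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(seq))


def seed_site(mirna):
    """mRNA sequence complementary to miRNA nucleotides 2-8 (1-based)."""
    return reverse_complement_rna(mirna[1:8])


def find_sites(mrna, mirna):
    """0-based start positions of all (possibly overlapping) seed matches."""
    site = seed_site(mirna)
    positions = []
    start = mrna.find(site)
    while start != -1:
        positions.append(start)
        start = mrna.find(site, start + 1)
    return positions


def count_sites_by_region(mrna, mirna, cds_start, cds_end):
    """Count seed matches in the 5' UTR, CDS and 3' UTR.

    cds_start is the 0-based index of the first CDS nucleotide and cds_end
    is the index one past the last CDS nucleotide (hypothetical convention).
    Each site is assigned to the region that contains its first nucleotide.
    """
    counts = {"5'UTR": 0, "CDS": 0, "3'UTR": 0}
    for pos in find_sites(mrna, mirna):
        if pos < cds_start:
            counts["5'UTR"] += 1
        elif pos < cds_end:
            counts["CDS"] += 1
        else:
            counts["3'UTR"] += 1
    return counts
```

Per-region counts of this kind could serve as simple sequence-level inputs alongside thermodynamic, conservation, and codon-composition features, although the actual feature definitions used by the model are not given in this passage.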