The strength of local mRNA folding can be increased or decreased in different regions of the coding sequence (CDS) through the choice of synonymous codons. This modulation of local folding strength affects the interaction with the ribosome and is thought to influence many additional aspects of gene expression.
To measure the effect of selection on mRNA structure strength, we performed a genome-wide computational study of 513 genomes across the tree of life, comparing folding strengths in native CDSs to those under a null model that preserves amino acid content, GC-content, and codon distributions. Using this method, we found that in most species selection on mRNA structure changes direction predictably along the CDS, favoring weak folding near the CDS edges and strong folding in the mid-CDS region.
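As an illustration of how such a null model can be built, the sketch below permutes codons among positions that encode the same amino acid. This is one common construction and is assumed here for illustration; it is not necessarily the exact randomization used in this study. It keeps the protein sequence, the per-CDS codon counts, and therefore the GC-content unchanged. The function names and the `energy_fn` parameter are hypothetical; `energy_fn` stands for any caller-supplied folding-energy predictor applied to a nucleotide string.

```python
import random
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered by T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a DNA coding sequence into a list of codons."""
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def translate(cds):
    """Translate a CDS into a one-letter protein string ('*' marks a stop)."""
    return "".join(CODON_TABLE[c] for c in split_codons(cds))


def shuffle_synonymous(cds, rng):
    """Permute codons among positions that encode the same amino acid.

    The protein sequence and the codon counts of the CDS are preserved;
    only the positions of synonymous codons change.
    """
    codons = split_codons(cds.upper())
    positions = defaultdict(list)
    for i, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(i)
    shuffled = list(codons)
    for idxs in positions.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def folding_bias(cds, energy_fn, n_random=20, seed=0):
    """Native folding energy minus the mean energy over randomized CDSs.

    energy_fn is an assumed, caller-supplied predictor (sequence -> float).
    """
    rng = random.Random(seed)
    native = energy_fn(cds)
    null = [energy_fn(shuffle_synonymous(cds, rng)) for _ in range(n_random)]
    return native - sum(null) / len(null)
```

With a free-energy predictor as `energy_fn`, a negative value would indicate that the native sequence folds more strongly than expected under this null model, and a positive value that it folds more weakly. To resolve position along the CDS, the same comparison could be repeated on windows of the sequence.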
Then, using a phylogenetic tree of the analyzed organisms, we performed regression analysis that controls for the expected similarity of traits among evolutionarily related species, in order to detect significant interactions with genomic and environmental traits that may be causally linked to mRNA folding strength. We found significant correlations with GC-content, codon usage bias, and additional factors, and these correlations also differ between the CDS edges and the mid-CDS region. This recurring structure shows that mRNA folding is subject to disparate requirements at different stages of gene expression, and specifically during translation.
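The idea of regressing one trait on another while accounting for shared ancestry can be sketched with Felsenstein's phylogenetically independent contrasts. This classical method is used here purely as an illustration; the text above does not specify which phylogenetic regression method was applied, and the authors may have used a different one (for example, generalized least squares). The tree encoding and the function names are assumptions made for this sketch: a leaf is `(species_name, branch_length)` and an internal node is `((left, right), branch_length)`.

```python
import math


def contrasts(node, trait):
    """Return (ancestral value, adjusted branch length, list of contrasts).

    node  : (name, length) for a leaf, ((left, right), length) otherwise
    trait : dict mapping species name -> trait value
    Assumes a strictly bifurcating tree with positive branch lengths
    below the root.
    """
    children, length = node
    if isinstance(children, str):  # leaf: the trait value is observed
        return trait[children], length, []
    left, right = children
    xl, vl, cl = contrasts(left, trait)
    xr, vr, cr = contrasts(right, trait)
    # Standardized contrast between the two daughter lineages.
    contrast = (xl - xr) / math.sqrt(vl + vr)
    # Ancestral estimate: average weighted by inverse branch length.
    value = (xl / vl + xr / vr) / (1 / vl + 1 / vr)
    # Lengthen this node's branch to reflect uncertainty in the estimate.
    adjusted = length + vl * vr / (vl + vr)
    return value, adjusted, cl + cr + [contrast]


def pic_slope(tree, x, y):
    """Slope of the regression through the origin of y contrasts on x contrasts."""
    _, _, cx = contrasts(tree, x)
    _, _, cy = contrasts(tree, y)
    return sum(a * b for a, b in zip(cx, cy)) / sum(a * a for a in cx)
```

Because each contrast compares sister lineages, a trait shared by closely related species through common descent contributes a single data point instead of many, which is the sense in which such a regression controls for phylogenetic similarity.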
These results should advance our understanding of the effect of mRNA secondary structure on gene expression and fitness in any species, and enable the development of novel modeling and engineering approaches for controlling gene expression.