Translation initiation in prokaryotes involves binding of the 16S rRNA component of the small ribosomal subunit to the ribosome-binding site of the mRNA. This site usually contains part or all of a polypurine domain known as the Shine-Dalgarno (SD) sequence, which lies upstream of the start codon and enables initiation of translation. In addition, SD sequences inside the coding region may affect translation elongation by causing ribosomes to pause. Ribosome pausing may in turn affect a variety of phenomena, including ribosomal traffic jams, protein folding, protein targeting and ribosomal abortion. The distribution of SD sequences along transcripts is therefore expected to affect the expression of both synthetic and endogenous genes in bacteria, as well as bacterial growth rates.
In this study, we developed a computational pipeline for inferring evolutionary selection for or against SD and SD-like sequences across the transcriptome. In a genome-wide analysis of Escherichia coli, we find, among other results, significant selection against SD and SD-like sequences within coding sequences and significant selection for SD and SD-like sequences at specific locations in the 5’ UTR. These rules can be used when optimizing the expression levels of synthetic and endogenous genes in bacteria for various biotechnological objectives.
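To make the idea of testing for selection against SD-like sequences concrete, the following is a minimal illustrative sketch and not the pipeline developed in this study. It assumes the canonical core motif AGGAGG, defines "SD-like" as any hexamer within a chosen number of mismatches of that motif, and uses a simple nucleotide-shuffling null model. The function names, the mismatch threshold, and the null model are all hypothetical choices made for illustration; a realistic analysis would more likely score sites by their hybridization to the anti-SD region of the 16S rRNA and use a null model that preserves the encoded protein and codon usage.

```python
import random

# Assumption: the canonical core SD motif; the study may define and score sites differently.
CANONICAL_SD = "AGGAGG"


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_sd_like(seq, motif=CANONICAL_SD, max_mismatches=1):
    """Return 0-based start positions of windows within max_mismatches of the motif."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    k = len(motif)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], motif) <= max_mismatches
    ]


def depletion_p_value(seq, n_shuffles=1000, seed=0, **kwargs):
    """Empirical p-value that shuffled sequences have as few SD-like sites as seq.

    Hypothetical null model: shuffling nucleotides preserves base composition
    only, not the amino-acid sequence or codon usage.
    """
    rng = random.Random(seed)
    observed = len(find_sd_like(seq, **kwargs))
    letters = list(seq.upper().replace("U", "T"))
    at_most = 0
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        if len(find_sd_like("".join(letters), **kwargs)) <= observed:
            at_most += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_most + 1) / (n_shuffles + 1)
```

Under these assumptions, a small p-value for a coding sequence would suggest that it contains fewer SD-like sites than expected from its base composition alone, which is the direction of the depletion reported above. Running the same test in windows of the 5’ UTR, with the inequality reversed, would probe for enrichment instead.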