Hidden Genomic Code for Protein Recognition?

In my talk I will show that a nonconsensus, statistical DNA triplet code provides the specificity for positioning a key transcription regulatory complex, the human pre-initiation complex (PIC). In particular, we reveal a highly non-random, statistical pattern of repetitive nucleotide triplets that correlates with the genome-wide binding preferences of the PIC measured by ChIP-exo. We analyze triplet enrichment and depletion near the transcription start site (TSS) and identify the triplets that have the strongest effect on nonconsensus PIC-DNA binding. In addition, using a statistical-mechanics random-binder model with no fitting parameters and with the genomic DNA sequence as its only input, we further validate that the nonconsensus nucleotide triplet code is a key signature providing PIC binding specificity in the human genome. Our results constitute a proof of concept for a new design principle of protein-DNA recognition in the human genome, which can lead to a better mechanistic understanding of transcriptional regulation.
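To make the notion of triplet enrichment and depletion concrete, the following is a minimal illustrative sketch, not the analysis used in this work. It assumes that enrichment is scored as the log2 ratio of a triplet's frequency in a window (for example, a region around a TSS) to its frequency in a background sequence. The function names `triplet_counts` and `triplet_enrichment`, the pseudocount, and the choice of background are all hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def triplet_counts(seq):
    """Count overlapping nucleotide triplets, skipping any that contain non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if set(tri) <= set("ACGT"):
            counts[tri] += 1
    return counts


def triplet_enrichment(window, background, pseudocount=1.0):
    """Return log2(frequency in window / frequency in background) for all 64 triplets.

    Assumption: a simple frequency ratio with a pseudocount; positive values
    indicate enrichment in the window, negative values indicate depletion.
    """
    all_triplets = ["".join(p) for p in product("ACGT", repeat=3)]
    w = triplet_counts(window)
    b = triplet_counts(background)
    w_total = sum(w.values()) + pseudocount * len(all_triplets)
    b_total = sum(b.values()) + pseudocount * len(all_triplets)
    return {
        t: log2(((w[t] + pseudocount) / w_total) / ((b[t] + pseudocount) / b_total))
        for t in all_triplets
    }
```

In practice the window would be a set of sequences aligned at the TSS and the background a matched genomic sample; both are left to the reader here, since the abstract does not specify them.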