DNA primase synthesizes short RNA primers that DNA polymerase extends to form Okazaki fragments (OFs) on the lagging DNA strand. Primase binds the DNA at a specific trinucleotide recognition sequence, and these binding sites mark the start sites of OFs.
Using high-throughput primase profiling (HTPP), we determined a comprehensive primase-DNA binding profile over a wide range of binding affinities. Using clustering algorithms, we found distinct populations of primase binding profiles based on features embedded in the DNA sequences. Subsequent application of classification algorithms enabled the prediction of DNA sequences that serve as better primase recognition sequences. The DNA sequences we found not only yield active binding, that is, binding that results in RNA primer synthesis, but also recruit DNA polymerase to elongate the primer into a functional Okazaki fragment.
This study is the first example of the use of machine learning (ML) algorithms to analyze protein-DNA microarray data, which are usually analyzed with rudimentary statistical techniques.
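The clustering step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the toy sequences, the choice of trinucleotide (3-mer) frequencies as features, the use of plain k-means, and the helper names `kmer_features` and `kmeans`. The study does not state which clustering or classification algorithms were applied.

```python
from itertools import product


def kmer_features(seq, k=3):
    """Return normalized k-mer frequencies of a DNA sequence (hypothetical featurization)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with non-ACGT characters
            counts[window] += 1
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return [counts[kmer] / total for kmer in kmers]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centroids = [list(c) for c in initial_centroids]  # copy, do not mutate input
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sequences, for illustration only (not data from the study).
sequences = ["GTCGTCGTCGTC", "GTCGTCGTCGTA", "AAAAAATTTTTT", "AAAATTTTAAAA"]
features = [kmer_features(s) for s in sequences]

# Deterministic initialization: seed the two clusters with the first and third sequence.
labels = kmeans(features, [features[0], features[2]])
print(labels)
```

In this toy setting the two sequences with similar trinucleotide content fall into the same cluster. A classifier trained on such cluster labels or on measured binding signals would be the analogous second step, but that step is not shown here.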