ILANIT 2023

Improved prediction of CRISPR-Cas9 on-target efficiency by epigenetics

Michal Rahimi¹, Yaron Orenstein¹,²
¹Department of Computer Science, Bar-Ilan University, Israel
²The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Israel

CRISPR-Cas9 has revolutionized the field of gene editing in recent years. By designing 20nt-long guide RNAs (gRNAs), one can target any genomic locus that is followed by NGG. However, different gRNAs have different on-target efficiencies, and measuring all potential gRNAs experimentally requires too much time and too many resources. Thus, researchers turn to computational methods to predict the on-target efficiency of a specific gRNA; most of these methods are machine-learning models trained to predict on-target efficiency from the gRNA sequence. Yet, much of the on-target efficiency depends on the epigenetics of the target site rather than on the gRNA sequence, and only a few methods have been developed to consider epigenetic information, due to the scarcity of endogenous editing data. Recently, an endogenous dataset of 1,555 gRNAs in T cells became available, providing the opportunity to fill this gap. In this study, we developed a novel method to predict on-target efficiency based on both the gRNA sequence and epigenetic information. We trained a neural network that receives as input a 60nt-long genomic sequence centered around the gRNA together with three epigenetic marks: chromatin accessibility, CTCF binding, and H3K4me3. In 5-fold cross-validation, our model achieved a Spearman correlation of 0.41, compared to 0.34 when considering only the sequence information, underscoring the importance of epigenetic information, with chromatin accessibility being the most informative mark. To conclude, we demonstrated the improvement gained by including epigenetic information in on-target efficiency prediction. In the future, we plan to extend our work to improve the prediction of on-target repair outcomes using similar epigenetic information.