Nanopore sequencing is a third-generation sequencing approach used mainly for DNA and RNA. An ionic current passes through a nanopore, and the changes in that current are measured as a biological molecule (such as DNA) passes through the pore. These changes in current can then be used to identify the molecule.
A main challenge of existing nanopore base-callers is that they rely on a full training set in order to
call a selected DNA modification. For 5mC this is straightforward, since a bacterial methyltransferase
can be used to introduce a methyl group at every CpG, thus creating a full training set for all
k-mers. This is also the reason 5mC is the only modification that is reliably called by nanopore
sequencing.
We found a difference in the average current between each modified cytosine and the unmodified cytosine in the same k-mer. This indicates that the modifications affect the ionic current passing through the pore and that this information can be used to detect them.
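The per-k-mer comparison described above can be sketched as follows. This is a minimal illustration, not the analysis code used in this work: the function names, the (k-mer, current) input layout and the toy current values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_current_by_kmer(events):
    """Group (kmer, current_pA) pairs by k-mer and return the mean current of each."""
    grouped = defaultdict(list)
    for kmer, current in events:
        grouped[kmer].append(current)
    return {kmer: mean(values) for kmer, values in grouped.items()}


def current_shift(modified_events, unmodified_events):
    """Mean current of the modified sample minus the unmodified sample, per shared k-mer."""
    modified = mean_current_by_kmer(modified_events)
    unmodified = mean_current_by_kmer(unmodified_events)
    shared = modified.keys() & unmodified.keys()
    return {kmer: modified[kmer] - unmodified[kmer] for kmer in shared}


# Toy values for illustration only (hypothetical, not measured data).
unmodified = [("ACGTA", 80.0), ("ACGTA", 82.0), ("TCGAT", 95.0)]
modified = [("ACGTA", 85.0), ("ACGTA", 87.0), ("TCGAT", 91.0), ("GCGCC", 70.0)]

shifts = current_shift(modified, unmodified)
for kmer in sorted(shifts):
    print(kmer, shifts[kmer])
```

K-mers seen in only one of the two samples (here the hypothetical "GCGCC") are skipped, since no shift can be computed for them.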
We created a simple random forest model that takes into account shifts in current, dwell time in the pore,
and deviation from the standard model in order to detect four cytosine modifications.
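As a rough sketch of how the three inputs named above might be assembled into a feature vector for one k-mer observation, consider the following. The exact feature definitions are assumptions: here the shift is taken relative to an unmodified control mean, the dwell time is the number of raw samples divided by the sampling rate, and the deviation is a z-score against the standard model's expected mean and standard deviation. The function name and all parameters are hypothetical.

```python
from statistics import mean


def kmer_features(samples_pa, sample_rate_hz, control_mean_pa, model_mean_pa, model_sd_pa):
    """Return [current shift, dwell time, deviation from standard model] for one k-mer event.

    All definitions below are illustrative assumptions, not the published feature set.
    """
    observed = mean(samples_pa)
    shift = observed - control_mean_pa                 # shift vs. unmodified control (pA)
    dwell = len(samples_pa) / sample_rate_hz           # time spent in the pore (s)
    deviation = (observed - model_mean_pa) / model_sd_pa  # z-score vs. standard model
    return [shift, dwell, deviation]


# Toy event: four raw current samples at an assumed 4 kHz sampling rate.
features = kmer_features(
    samples_pa=[84.0, 86.0, 85.0, 85.0],
    sample_rate_hz=4000.0,
    control_mean_pa=81.0,
    model_mean_pa=83.0,
    model_sd_pa=2.0,
)
print(features)
```

Vectors of this form, paired with known modification labels, could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`.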
Finally, we sequenced DNA extracted from mouse brain, which naturally contains high levels of modifications. The results show good correlation in modification detection when compared with existing methods such as Megalodon, Remora and TAB-seq.
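One common way to quantify such agreement is a Pearson correlation between per-site modification fractions reported by two methods. The sketch below assumes that comparison; the metric actually used here is not stated, and the function name and the values are hypothetical placeholders rather than results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-site modification fractions from two methods (placeholder values).
method_a = [0.10, 0.50, 0.90]
method_b = [0.20, 0.60, 1.00]

r = pearson(method_a, method_b)
print(round(r, 6))
```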