Mismatch repair deficiency (MMRd) leads to an increased rate of insertions and deletions (indels) in short repetitive regions of the genome called microsatellites (MS). To date, research on MMRd has focused on single-nucleotide variants (SNVs) and short deletions (<10 bases). The effect of MMRd on longer deletions has not been thoroughly analyzed, and its effect on minisatellites, repetitive regions with repeat units longer than 6 bases, is not fully understood. This knowledge gap is largely due to the lack of computational tools that can analyze these larger events.
We developed a computational tool that uses next-generation sequencing data to analyze long deletions in MS loci. We then applied it to a large set of whole-genome sequencing data comprising ~200 MMRd and ~100 MMR-proficient (MMRp) cases.
We found that even in minisatellite loci, MMRd tumors have an increased indel rate compared with MMRp tumors, peaking at repeat units of 8 bases. This suggests that the MMR pathway is most efficient at repairing indels of this size.
Most surprisingly, although the literature indicates that the MSH6-encoded subunit recognizes SNVs and short (1-2 base) deletions, whereas MSH3 recognizes deletions of 1-15 bases, we found that tumors with MSH6 deletions had the same pattern of indels, including an increased rate of long deletions. Therefore, the model of the MMR pathway that originated from the analysis of E. coli and yeast may describe only part of the MMR mechanism in humans.