Insights into chromosome structure, transposable elements and secondary metabolite gene clusters from the complete genome sequence of Colletotrichum higginsianum
The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease on many cruciferous plants, including the model plant Arabidopsis thaliana. Previous versions of the genome sequence, based on short-read data from 454 and Illumina sequencing, were highly fragmented, which caused errors in gene prediction and prevented the analysis of repeats and genome architecture. By combining long reads from single-molecule real-time sequencing with optical mapping, we obtained a highly contiguous assembly in which all 12 chromosomes are sequenced telomere to telomere without gaps, except for one region containing the rDNA repeats. The more accurate gene annotation based on this new assembly provided a comprehensive inventory of secondary metabolism-related genes corresponding to 77 putative biosynthetic pathways, suggesting a large capacity for chemical diversity. Similar to the conditionally dispensable chromosomes of other plant pathogenic fungi, the two mini-chromosomes differed markedly from the core genome in being repeat-rich, AT-rich and gene-poor, but they were significantly enriched in genes encoding putative secreted effector proteins. Annotation of transposable elements (TEs) revealed that certain TE families showed a statistically significant association with effector genes and secondary metabolism gene clusters and were transcriptionally active at particular stages of fungal development. The complete genome assembly also enabled us to identify chromosome segmental duplications. Four of these were associated with highly conserved subtelomeric repeats, which may provide sites for homologous recombination. Repeat-mediated segmental duplication may thus be a mechanism for generating genetic diversity in this asexual fungus.