ILANIT 2023

Analyzing genomic contexts of genes in metagenomic bacterial communities using de novo assembly

Netta Barak¹, Amir Erez², Moran Yassour¹,³
¹The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
²Racah Institute of Physics, The Hebrew University of Jerusalem, Israel
³Faculty of Medicine, The Hebrew University of Jerusalem, Israel

Advances in metagenomics, the sequencing and analysis of the entire DNA content of microbial populations, have greatly improved our ability to study microbial communities. Metagenomic sequencing reads are routinely used to characterize the taxonomic profile of a sample and to identify the genes it contains, but associating genes with their host species remains challenging because the sequenced fragments are too short to capture this information.

Here, we present GInGeR, a Genes of Interest Genomic Regions analysis tool based on de novo metagenomic assembly. GInGeR locates the genes of interest in the assembly graph, identifies their potential genomic contexts in the graph, and uses a database of reference genomes to verify these contexts and assign them to carrier species. We demonstrated GInGeR’s performance on a set of 8,479 genes found in the ZymoBIOMICS Microbial Standard mock community, with the Unified Human Gastrointestinal Genome (UHGG) collection serving as the reference genomes database.
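The abstract does not describe how contexts are extracted or verified, so the following is only a rough illustration of the general idea, not GInGeR's implementation. It assumes a toy directed assembly graph whose node sequences can be concatenated without overlaps (real assembly graphs, such as GFA output from an assembler, also carry orientations and overlaps). It enumerates upstream and downstream paths around the gene node up to a hypothetical `min_flank` length, then "verifies" each context by exact substring matching against reference genomes, a deliberate simplification of alignment. All function names and parameters are hypothetical.

```python
from itertools import product


def bounded_paths(adj, seqs, start, min_flank):
    """Enumerate simple paths leaving `start` along `adj`. A path stops once it
    holds at least `min_flank` bases or reaches a dead end (hypothetical rule)."""
    paths = []

    def walk(node, path, length):
        nxt = [n for n in adj.get(node, []) if n != start and n not in path]
        if length >= min_flank or not nxt:
            paths.append(path)
            return
        for n in nxt:
            walk(n, path + [n], length + len(seqs[n]))

    walk(start, [], 0)
    return paths


def genomic_contexts(edges, seqs, gene, min_flank):
    """Combine every upstream path with every downstream path around `gene`."""
    succ, pred = {}, {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
        pred.setdefault(v, []).append(u)
    ups = bounded_paths(pred, seqs, gene, min_flank)     # walked backwards from the gene
    downs = bounded_paths(succ, seqs, gene, min_flank)   # walked forwards from the gene
    contexts = []
    for up, down in product(ups, downs):
        nodes = list(reversed(up)) + [gene] + down
        # Toy simplification: node sequences are joined with no overlap.
        contexts.append("".join(seqs[n] for n in nodes))
    return contexts


def assign_species(contexts, references):
    """Keep contexts found verbatim in at least one reference genome,
    mapping each to the species whose genomes contain it."""
    verified = {}
    for ctx in contexts:
        hits = [sp for sp, genome in references.items() if ctx in genome]
        if hits:
            verified[ctx] = hits
    return verified


if __name__ == "__main__":
    seqs = {"gene": "ATGAAA", "a": "CCC", "b": "GGG", "c": "TTT", "d": "ACG"}
    edges = [("a", "gene"), ("d", "gene"), ("gene", "b"), ("gene", "c")]
    refs = {"Species_X": "AACCCATGAAAGGGAA", "Species_Y": "TTACGATGAAATTTCA"}
    ctxs = genomic_contexts(edges, seqs, "gene", min_flank=3)
    print(assign_species(ctxs, refs))
    # {'CCCATGAAAGGG': ['Species_X'], 'ACGATGAAATTT': ['Species_Y']}
```

In this toy graph the gene has two possible upstream and two possible downstream neighbors, giving four candidate contexts; only the two that occur in a reference genome survive, each tied to one carrier species.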

GInGeR identified the correct genomic contexts of the genes of interest with a precision of 0.966 and a recall of 0.942. It also assigned the genes to their correct carrier species with a precision of 0.988 and a recall of 0.922. In addition, GInGeR performs similarly well on shallower sequencing data, and its run time is negligible compared with that of the assembler.
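For reference, precision is TP / (TP + FP) and recall is TP / (TP + FN). The abstract does not state the exact unit being counted, so the sketch below assumes, purely for illustration, that predictions and ground truth are sets of hypothetical (gene, species) pairs; the example data are invented and unrelated to the reported results.

```python
def precision_recall(predicted, truth):
    """Precision and recall of a predicted set of items against a ground-truth set."""
    tp = len(predicted & truth)  # true positives: predictions that are correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical (gene, carrier species) assignments.
    predicted = {("geneA", "Sp1"), ("geneB", "Sp2"), ("geneC", "Sp1")}
    truth = {("geneA", "Sp1"), ("geneB", "Sp2"), ("geneC", "Sp4"), ("geneD", "Sp3")}
    print(precision_recall(predicted, truth))  # 2 of 3 predictions correct, 2 of 4 true pairs found
```

A wrong assignment (geneC to Sp1) lowers precision, while a missed pair (geneD) lowers recall, which is why the two metrics are reported separately.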

GInGeR introduces a new way of analyzing metagenomic samples and showcases the benefits of using the information available in de novo metagenomic assembly graphs to study bacterial communities.