Biomass is considered an attractive and immediate source for liquid biofuels that do not contribute net CO2 to the atmosphere. The hydrolysis of lignocellulose to soluble sugars that can be fermented to ethanol involves a costly enzymatic step. Lignocellulose hydrolysis requires the synergistic action of several enzymes termed glycoside hydrolases (GHs). In the framework of this research, metagenomic and biochemical approaches are combined to isolate and characterize novel enzymatic systems geared for lignocellulose hydrolysis and biofuel production.
Our view of the GH world is restricted to the known sequences currently available in databases. Thus, when a metagenome is analyzed, novel GHs or genes with low sequence similarity to known GHs usually slip under the radar of conventional search methods.
Our sequence-based screening is designed to identify novel GH genes in metagenomic data. We developed our search algorithm under three principles:
A. A GH-dedicated algorithm rather than a full data analysis.
B. Incorporation of the Genomic Neighborhood (GN) approach to sieve out excess data.
C. A phylogenetic-relation-based algorithm for evaluating putative genes.
The search is performed with a dedicated dataset of hidden Markov model (HMM) profiles. The GN approach searches metagenomic databases for putative GH sequences located in gene clusters related to lignocellulose degradation pathways. The phylogenetic-relation-based sensitive search aims to identify distant GH homologs with low sequence similarity on the basis of their phylogenetic relatedness. In our preliminary results, the GN search approach applied to three nodes of a human gut microbial metagenome yielded 21 putative novel GH sequences. The profile-based search identified sequences from several GH families. In a preliminary analysis of an end-sequenced metagenome from the Mediterranean Sea, we found over 500 sequences possessing at least one gene from our profile dataset and 21 sequences with cellulolytic genes such as cellulases and cellulose-binding modules. A comprehensive search algorithm consisting of several search tools is proposed.
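To make the GN idea concrete, the following is a minimal sketch of one way such a filter could work; it is not the algorithm used in this research. It assumes each predicted gene has already been labeled with an HMM profile hit (or none), and it flags unlabeled genes that sit within a few genes of profile-matched neighbors on the same contig. The `Gene` record, the function name `neighborhood_candidates`, the `window` and `min_markers` parameters, and the family labels in the usage example are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Gene:
    gene_id: str
    contig: str
    position: int           # ordinal position of the gene on its contig
    family: Optional[str]   # profile hit label (assumed precomputed), or None


def neighborhood_candidates(
    genes: Iterable[Gene],
    marker_families: Set[str],
    window: int = 3,
    min_markers: int = 2,
) -> List[str]:
    """Return IDs of genes without a profile hit that have at least
    `min_markers` marker-family genes among their `window` nearest
    neighbors on each side (counted in gene order, not base pairs)."""
    # Group genes by contig so that neighborhoods never span contigs.
    by_contig: Dict[str, List[Gene]] = {}
    for gene in genes:
        by_contig.setdefault(gene.contig, []).append(gene)

    candidates: List[str] = []
    for contig_genes in by_contig.values():
        contig_genes.sort(key=lambda g: g.position)
        for i, gene in enumerate(contig_genes):
            if gene.family is not None:
                continue  # already matched by a profile; not a novel candidate
            lo = max(0, i - window)
            hi = min(len(contig_genes), i + window + 1)
            neighbors = contig_genes[lo:i] + contig_genes[i + 1:hi]
            n_markers = sum(1 for n in neighbors if n.family in marker_families)
            if n_markers >= min_markers:
                candidates.append(gene.gene_id)
    return candidates


# Illustrative input: two contigs with made-up gene IDs and labels.
example_genes = [
    Gene("g1", "c1", 1, "GH5"),
    Gene("g2", "c1", 2, None),
    Gene("g3", "c1", 3, "CBM2"),
    Gene("g4", "c1", 4, None),
    Gene("g5", "c2", 1, None),
    Gene("g6", "c2", 2, "GH5"),
]
markers = {"GH5", "CBM2"}
strict = neighborhood_candidates(example_genes, markers, window=1, min_markers=2)
loose = neighborhood_candidates(example_genes, markers, window=1, min_markers=1)
```

In this toy input, only `g2` is flanked by two marker genes, so the strict setting keeps it alone, while the looser setting also admits `g4` and `g5`. Raising `min_markers` or narrowing `window` discards more data at the cost of missing candidates in sparse clusters.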