We present a new algorithm for assembly of full plasmid sequences from metagenomic data. Our starting point is our previously published tool Recycler (Rozov et al., Bioinformatics 2016). Recycler searches the metagenomic assembly graph for uniformly covered cyclic paths, which it predicts to be plasmids. Candidate paths are found efficiently using a shortest-path algorithm that finds, for each edge in the graph, a high-scoring, highly covered cycle through that edge.
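The per-edge cycle search can be illustrated with a minimal sketch. It is not the Recycler implementation: the node weight of 1/coverage, the toy graph, and the function names `shortest_path` and `cycle_through_edge` are all assumptions chosen for illustration. The idea shown is that the lightest cycle through an edge u -> v is that edge plus the lightest path from v back to u.

```python
import heapq

def shortest_path(graph, weight, src, dst):
    """Dijkstra over non-negative node weights; returns the node list src..dst, or None."""
    dist = {src: weight[src]}
    prev = {}
    heap = [(weight[src], src)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == dst:
            # Walk predecessors back to the source, then reverse.
            path = [u]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        for v in graph.get(u, ()):
            nd = d + weight[v]
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None

def cycle_through_edge(graph, weight, u, v):
    """Lightest cycle using edge u -> v, returned as the path v..u (the edge closes it)."""
    return shortest_path(graph, weight, v, u)

# Toy assembly graph (hypothetical): nodes are contigs, values are successors.
graph = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": ["A", "E"], "E": []}
coverage = {"A": 50.0, "B": 50.0, "C": 48.0, "D": 5.0, "E": 5.0}
# Assumed weighting: highly covered nodes are cheap to traverse.
weight = {n: 1.0 / c for n, c in coverage.items()}

print(cycle_through_edge(graph, weight, "A", "B"))  # path back from B to A
print(cycle_through_edge(graph, weight, "D", "E"))  # no path back from E to D
```

Under this assumed weighting, the search through edge A -> B prefers the return path via the highly covered node C over the poorly covered node D, and an edge with no return path yields no cycle.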
Here we improve the algorithm in three key ways: (1) We incorporate into cycle scores the prior probability that each sequence is of plasmid origin; this prior is computed using plasmid classification tools and plasmid-specific genes. (2) We search more extensively for uniformly covered cycles instead of just the most highly covered ones. (3) We construct a curated set of plasmid-specific genes that we use to identify sequences of likely plasmid origin and attempt to incorporate them into cycles. We also filter out potential plasmids that do not contain any plasmid-specific genes.
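The three improvements can be sketched in a few lines, under stated assumptions. The weighting formula in `node_weight`, the use of the coefficient of variation as the uniformity measure, the threshold `max_cv=0.5`, and all names below are hypothetical and are not taken from the tool itself.

```python
from statistics import mean, pstdev

def node_weight(coverage, plasmid_prob):
    """Assumed weighting: cheaper for high coverage and for likely-plasmid sequences."""
    return (1.0 - plasmid_prob) / coverage

def coverage_cv(coverages):
    """Coefficient of variation of coverage along a cycle (0 means perfectly uniform)."""
    m = mean(coverages)
    return pstdev(coverages) / m if m > 0 else float("inf")

def keep_cycle(cycle, coverage, has_plasmid_gene, max_cv=0.5):
    """Keep a candidate only if it is uniformly covered and carries a plasmid-specific gene."""
    uniform = coverage_cv([coverage[n] for n in cycle]) <= max_cv
    has_gene = any(has_plasmid_gene.get(n, False) for n in cycle)
    return uniform and has_gene

# Hypothetical contigs: coverage values and plasmid-gene hits.
coverage = {"A": 50.0, "B": 50.0, "C": 48.0, "D": 5.0}
gene_hits = {"B": True}

print(keep_cycle(["A", "B", "C"], coverage, gene_hits))  # uniform, has a gene hit
print(keep_cycle(["A", "D", "B"], coverage, gene_hits))  # uneven coverage
print(keep_cycle(["A", "C"], coverage, gene_hits))       # no plasmid-specific gene
```

In this sketch the plasmid prior enters the path search through the node weight, while uniformity and gene content act as filters on the candidate cycles afterwards; the actual scoring may combine these differently.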
We show improved plasmid assembly on both simulated metagenomic datasets and real human gut microbiome samples, in comparison to Recycler and to the metaplasmid version of the SPAdes assembler.