HIV infections typically begin with a single founder virus entering the bloodstream. In the following weeks, the invading virus replicates to extremely high numbers, on the order of 10^10. During this period, the virus generates the initial diversity ("quasi-species") that will allow it to evade the immune system and resist therapies. Next-generation sequencing (NGS) has revolutionized our ability to observe a population of viruses within a single host, but it is limited by the error rate of the amplification and sequencing process, roughly 2%. Because true variants below this frequency cannot be distinguished from technical errors, studies that sequenced HIV during acute infection found almost no observable diversity.
Here, we present AccuNGS, an experimental protocol and accompanying computational software for RNA and DNA sequencing that reliably identifies variants in clinical samples at frequencies as low as 0.01%. We sequence HIV from acute infection and find that nearly 40% of all sequenced positions harbor observable minor variants, present at frequencies of 0.01-0.1%. We demonstrate that the rare variants we identify are consistent with the expected properties of the virus population at acute infection. Strikingly, we find a significant enrichment for G->A variants, strong evidence for the innate antiviral activity of the host APOBEC family of DNA-editing enzymes. We hope our protocol will revolutionize the clinical NGS process.