Conclusion
Hopefully this tutorial gave you some insight into how viral amplicon sequence data are analyzed! This is the same general workflow that was used to produce over 16 million SARS-CoV-2 genome sequences around the world, which were immensely helpful for tracking the spread of COVID-19.
We happened to use the ViralConsensus pipeline produced by Niema Moshiri at UC San Diego, but in practice, you may want to see how swapping out the tools used for each of these steps changes the runtime and results of the analysis. If you’re curious about how different stages of this pipeline could be modified, check out Moshiri et al., Scientific Reports 2022.