Use LSD2 and a dates file to generate a rooted tree
In the previous step we created an unrooted phylogenetic tree through phylogenetic inference. Because we also have the collection dates of the sequences in our SARS-CoV-2 dataset, we can “root” the tree (find the most likely position of the most recent common ancestor, or MRCA) and “date” the tree (scale the branch lengths to units of time). We will use LSD2 to generate the rooted tree, with an outgroup to guide the rooting. An outgroup is a known organism that is distantly related to the species of interest; it acts as a reference when inferring a rooted tree and can make both rooting and dating more accurate. In our case, we will use a RaTG13 bat coronavirus sequence as the outgroup.
Before running LSD2, take a look at its usage instructions.
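One way to do this, sketched under the assumption that the executable is installed as lsd2 on your PATH (the name can differ between installations, and the LSD2 variable below is only a convenience introduced here), is to ask the program for its help text with -h:

```shell
# Assumption: the LSD2 executable is called "lsd2"; override LSD2 if yours differs.
LSD2="${LSD2:-lsd2}"

if command -v "$LSD2" >/dev/null 2>&1; then
    # Print the usage text listing the available options
    "$LSD2" -h
else
    echo "LSD2 executable '$LSD2' was not found on PATH." >&2
fi
```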
Now, to generate our rooted tree:
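A minimal sketch of such a command, using only the flags described in this step. The input file names (unrooted_tree.nwk, dates.txt, outgroup.txt) and the executable name lsd2 are placeholders assumed for illustration, so substitute the files from your own Step 2 output. Depending on the LSD2 version, additional options may be needed; check the usage text if the run fails.

```shell
# Placeholder names (assumptions): replace them with your own files.
LSD2="${LSD2:-lsd2}"        # LSD2 executable, assumed to be on PATH
TREE="unrooted_tree.nwk"    # unrooted tree from Step 2 (hypothetical name)
DATES="dates.txt"           # sequence collection dates (hypothetical name)
OUTGROUP="outgroup.txt"     # outgroup file (hypothetical name)
OUT="lsd2_out"              # output name given to -o

if command -v "$LSD2" >/dev/null 2>&1 \
    && [ -f "$TREE" ] && [ -f "$DATES" ] && [ -f "$OUTGROUP" ]; then
    # -g roots on the outgroup; -G then removes it from the final tree
    "$LSD2" -i "$TREE" -d "$DATES" -g "$OUTGROUP" -G -o "$OUT"
else
    echo "Skipping: LSD2 or one of the input files was not found." >&2
fi
```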
The above command incorporates the following flags:
-i specifies the input file, which is our unrooted phylogenetic tree from Step 2
-d specifies the file with sequence dates, which is essential for rooting
-g specifies the file with outgroup sequences
-G removes the outgroup from the tree (it is used for rooting but is not shown in the output tree)
-o specifies the name of our output file
The rooted tree is written to lsd2_out.nwk. Like in Step 2, we can view the first 10 lines of the Newick file at the command line.
Take a look at the lsd2_out log file. When did the MRCA exist?
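Both inspection steps can be sketched as follows, assuming the run used lsd2_out as the output name, so that the tree is in lsd2_out.nwk and the log is in lsd2_out. If your names differ, adjust the OUT variable, which is introduced here only for illustration.

```shell
OUT="lsd2_out"   # assumed output name that was passed to -o

# First 10 lines of the rooted Newick tree
if [ -f "$OUT.nwk" ]; then
    head -n 10 "$OUT.nwk"
fi

# Full log file, where the estimated MRCA date can be looked up
if [ -f "$OUT" ]; then
    cat "$OUT"
fi
```

A Newick file often holds the whole tree on a single line, so head may print the entire tree.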
WHO declared COVID-19 a pandemic on March 11, 2020. Does our MRCA date to before or after this day?
We used 10 SARS-CoV-2 sequences to generate this rooted tree. Which statement is true about this approach?