Building trees with SKA
Working with references

In addition to reference-free alignments, SKA can also produce alignments against a reference genome. As an example, let’s use a reference genome for E. coli. It is already downloaded for you, but you can also download it to your own computer from this ENA entry if you want.

To do so, we need to build a new index that includes the reference. We have provided another input file list, assemblies/ska_input_ref.tsv, that replaces one of the previous entries with the reference, so let’s run the build again with:

ska build -f assemblies/ska_input_ref.tsv -k 31 -o output/reference_index
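
If the build stops because an input file cannot be found, a quick check of the file list may help. The sketch below is only an illustration: it assumes each line of the list holds a sample name followed by one or more file paths, separated by tabs, and the helper name check_file_list is hypothetical (it is not part of SKA).

```python
import csv
import os


def check_file_list(tsv_path):
    """Return (sample, path) pairs whose sequence file cannot be found.

    Assumption: each line is a sample name followed by one or more
    file paths, all separated by tabs.
    """
    missing = []
    with open(tsv_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row:  # skip blank lines
                continue
            sample, paths = row[0], row[1:]
            for path in paths:
                if not os.path.exists(path):
                    missing.append((sample, path))
    return missing
```

Calling check_file_list("assemblies/ska_input_ref.tsv") from the working directory used in this tutorial should return an empty list when every listed file is present.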

With this index, we can align against the reference genome using ska map:

ska map -o output/reference_map.aln --ambig-mask assemblies/GCA_000005845.2.fna.gz output/reference_index.skf

Compared to ska align, ska map keeps all bases in the reference sequence and replaces the sites it cannot find in the indexed assemblies with gaps. We can create the tree again with our script:

python3 create_tree.py output/reference_map.aln

Here we can see that the reference is not closely related to any of the other entries. This makes complete sense, as the reference comes from the E. coli K-12 strain, which was isolated about a century ago on the other side of the world (Palo Alto, California, USA)!
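
Because ska map fills the sites it cannot find with gaps, the share of gaps in each sequence gives a rough idea of how much of the reference each sample covers. The following is a minimal sketch, assuming output/reference_map.aln is a FASTA-formatted alignment. The function names read_fasta and gap_fraction are hypothetical helpers written for this illustration, and only the '-' character is counted as a gap.

```python
def read_fasta(path):
    """Parse a FASTA file into a dict of {name: sequence}."""
    chunks = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: everything after '>' is used as the name
                name = line[1:].strip()
                chunks[name] = []
            elif name is not None:
                # Sequence line: may be wrapped over several lines
                chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def gap_fraction(sequence):
    """Fraction of positions in one aligned sequence that are gaps ('-')."""
    if not sequence:
        return 0.0
    return sequence.count("-") / len(sequence)


# Example use (run after ska map has written the alignment):
# for name, seq in read_fasta("output/reference_map.aln").items():
#     print(name, round(gap_fraction(seq), 3))
```

A sample with a high gap fraction shares fewer sites with the reference, which can be worth keeping in mind when reading the tree.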
