Building trees with SKA
Indexing the assemblies

To run the split k-mer alignment, we first need to build an index of the split k-mers in the assemblies. This is done with the ska build command, which takes as input a tab-separated list with one line per assembly, giving the sequence name and the path to the assembly file:

ID1	path/to/fasta_file_1.fasta.gz
ID2	path/to/fasta_file_2.fasta.gz
ID3	path/to/fasta_file_3.fasta.gz
.
.
.

This list is provided in assemblies/ska_input.tsv. Let’s first create an output folder with

mkdir output
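
For your own data, a list in the format shown above can be generated rather than typed by hand. The following is a minimal sketch, not part of SKA itself: make_ska_input is a hypothetical helper, and it assumes that every assembly sits in a single directory, ends in .fasta.gz, and that the file name without that extension is an acceptable sequence name.

```shell
# Hypothetical helper (not part of SKA): write a "name<TAB>path" line
# for every *.fasta.gz file found in a directory.
# Assumption: the file name minus ".fasta.gz" is used as the sequence name.
make_ska_input() {
    dir=$1    # directory holding the assemblies
    out=$2    # tab-separated list to write
    : > "$out"                        # start from an empty list
    for f in "$dir"/*.fasta.gz; do
        [ -e "$f" ] || continue       # the pattern matched nothing
        name=$(basename "$f" .fasta.gz)
        printf '%s\t%s\n' "$name" "$f" >> "$out"
    done
}
```

Calling make_ska_input my_assemblies my_input.tsv (both names are placeholders) writes one line per assembly found in my_assemblies. The tutorial itself continues with the provided assemblies/ska_input.tsv.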

Now, the split k-mer index is built from the input list by executing

ska build -f assemblies/ska_input.tsv -k 31 -o output/ska_index

This will take a bit of time to run. On a normal computer it would be very quick, but because we are running in a browser it takes around 3-4 minutes.
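
Since the build takes a few minutes in the browser, it can save time to check the input list before starting it. The sketch below is one possible check, under the assumption that a well-formed list has exactly two tab-separated fields per line and that every listed file exists. check_ska_input is a hypothetical helper and not an SKA command.

```shell
# Hypothetical helper (not part of SKA): sanity-check an input list.
# Returns 0 if every line has exactly two tab-separated fields and the
# file named in the second field exists, and 1 otherwise.
check_ska_input() {
    list=$1
    status=0
    lineno=0
    tab=$(printf '\t')
    # Split on tabs only, so paths containing spaces stay intact.
    # The "|| [ -n ... ]" part also handles a last line without a newline.
    while IFS=$tab read -r name path extra || [ -n "$name" ]; do
        lineno=$((lineno + 1))
        if [ -z "$name" ] || [ -z "$path" ] || [ -n "$extra" ]; then
            echo "line $lineno: expected two tab-separated fields" >&2
            status=1
        elif [ ! -f "$path" ]; then
            echo "line $lineno: file not found: $path" >&2
            status=1
        fi
    done < "$list"
    return $status
}
```

Running check_ska_input assemblies/ska_input.tsv prints one message per problem line and nothing if the list passes. Blank lines are reported as errors, which is a deliberately strict choice in this sketch.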

In this example we set k to 31, which is a commonly used choice for analysing bacterial genomes. For a more detailed analysis of the different possible values of k, have a look at the paper by Bussi, Kapon, and Reich in PLOS ONE.
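
To compare several values of k, the same command can be repeated with a different -k and a separate output name each time. The sketch below only prints the commands, so that they can be inspected before committing to several multi-minute builds. print_ska_builds and the output/ska_index_k<k> naming are hypothetical, and the k values in the usage note are arbitrary illustrations that ska may not all accept.

```shell
# Hypothetical helper: print (without running) one "ska build" command
# per k value given as an argument. Only the -f, -k and -o options
# already used in this tutorial appear in the printed commands.
print_ska_builds() {
    for k in "$@"; do
        echo "ska build -f assemblies/ska_input.tsv -k $k -o output/ska_index_k$k"
    done
}
```

For example, print_ska_builds 21 31 41 prints three commands, one per line. Piping that output to sh would run them one after another.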
