To run the split k-mer alignment, we first need to build an index of the split k-mers from the assemblies. This is done with the ska build command, which takes as input a tab-separated list of sequence names and assembly file paths:
ID1 path/to/fasta_file_1.fasta.gz
ID2 path/to/fasta_file_2.fasta.gz
ID3 path/to/fasta_file_3.fasta.gz
...

which is provided in assemblies/ska_input.tsv. Let’s first create an output folder.
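As a minimal sketch, the output folder can be created with mkdir. The folder name ska_output is a hypothetical placeholder, not necessarily the name used in this tutorial, so substitute your own.

```shell
# Hypothetical folder name (an assumption); replace with the one you want to use.
OUT_DIR="ska_output"

# -p creates the folder if it is missing and does nothing if it already exists.
mkdir -p "$OUT_DIR"
```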
Now, the split k-mer index is built from the input list by running ska build, which will take a bit of time. On a normal computer this would be very quick, but as we are running in a browser it will take around 3-4 minutes.
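The build step might look like the sketch below. It assumes the SKA2 (ska.rust) command-line options -f for the input list, -k for the k-mer size and -o for the output prefix; check ska build --help for the version you have installed. The output prefix ska_output/ska_index is a hypothetical name, and the folder it points to is assumed to exist already.

```shell
# Input list provided with the tutorial.
INPUT_LIST="assemblies/ska_input.tsv"
# Hypothetical output prefix (an assumption); its folder must already exist.
OUT_PREFIX="ska_output/ska_index"
# k-mer size used in this example.
KMER=31

# Run the build only if ska is available on PATH.
# The -f/-k/-o options are assumed from SKA2; verify with `ska build --help`.
if command -v ska >/dev/null 2>&1; then
    ska build -f "$INPUT_LIST" -k "$KMER" -o "$OUT_PREFIX"
else
    echo "ska not found on PATH" >&2
fi
```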
In this example we set k to 31, a commonly used choice for analysing bacterial genomes. For a more detailed analysis of the possible values of k, see the paper by Bussi, Kapon, and Reich in PLOS ONE.