Viral Phylogenetics
Multiple Sequence Alignment

The sequencing reads

For this tutorial, we will use SARS-CoV-2 whole-genome sequences obtained from samples collected from real people! These sequences are in the file sarscov2_sequences.fas. Let’s start by taking a peek at this file.

Take a look at the SARS-CoV-2 sequences we will be using. To exit the view, simply type q (for quit).

sarscov2_sequences.fas is in FASTA format, a widely used plain-text file format for nucleotide (and protein) sequence data.
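To get a feel for the layout before opening the real file, here is a minimal sketch using a tiny made-up file. The file name toy.fas and its records are invented for illustration and are not part of the tutorial data. In a FASTA file, each record starts with a line beginning with >, followed by one or more lines of sequence.

```shell
# Write a tiny, made-up FASTA file (toy data, not from the tutorial dataset).
cat > toy.fas <<'EOF'
>toy1
ACGTACGTAC
GTAC
>toy2
TTGACCA
EOF

# Print the first three lines without opening an interactive view.
head -n 3 toy.fas

# A pager such as less (an assumption about your environment) shows a file
# one screen at a time; press q to leave it.
# less toy.fas
```

The same commands should work on sarscov2_sequences.fas, although real genome records are far longer than these toy ones.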

What information do the characters after the > represent? Take a look at the FASTA format documentation!

Next, let’s figure out how many sequences are in sarscov2_sequences.fas. To do so, we can use the grep "<pattern>" <file> command, which searches a file for a particular text pattern and prints every line that matches. For example, grep "abc" test.txt will print all lines in test.txt that contain the string "abc". Use the grep command to determine how many sequences are in sarscov2_sequences.fas.

Hint: add "^" to the beginning of "pattern" to limit the search to lines that begin with "pattern". The wc -l command may also be helpful!
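As a rough illustration of how grep, the ^ anchor, and wc -l fit together, here is a sketch on a made-up test.txt. The file contents are invented for this example; adapting the pattern to the FASTA file is left as the exercise.

```shell
# Toy text file (contents made up for this example).
printf 'abc first\nxyz abc\nabcdef\nnothing here\n' > test.txt

# All lines containing "abc" anywhere (three lines in this toy file).
grep "abc" test.txt

# Only lines that begin with "abc", thanks to the ^ anchor (two lines).
grep "^abc" test.txt

# Pipe the matching lines into wc -l to count them instead of printing them.
grep "^abc" test.txt | wc -l
```

grep also has a -c option that prints the number of matching lines directly, so grep -c "^abc" test.txt should report the same count as the pipeline above.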

How many sequences are in sarscov2_sequences.fas?

If you’re up for an extra challenge, try to determine the length of each sequence in sarscov2_sequences.fas!
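One possible approach to the extra challenge, sketched here on a made-up file, is to let awk add up the lengths of the sequence lines that follow each header. The helper name seq_lengths and the file toy_lengths.fas are hypothetical, and this is only one of several ways to do it.

```shell
# Toy FASTA file (made-up records) with a sequence wrapped over two lines.
cat > toy_lengths.fas <<'EOF'
>toy1
ACGTACGTAC
GTAC
>toy2
TTGACCA
EOF

# seq_lengths (hypothetical helper): print "<header><TAB><length>" per record.
seq_lengths() {
  awk '
    /^>/ {                                   # a header line starts a new record
      if (name != "") print name "\t" len    # report the previous record first
      name = substr($0, 2)                   # header text without the leading ">"
      len = 0
      next
    }
    { len += length($0) }                    # sequence line: add its length
    END { if (name != "") print name "\t" len }
  ' "$1"
}

seq_lengths toy_lengths.fas
```

For the toy file this reports 14 for toy1 and 7 for toy2. Running seq_lengths sarscov2_sequences.fas should work the same way, assuming the file uses Unix line endings (a trailing carriage return would be counted as an extra character on each line).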
