1. The sequencing reads
The reads have been preloaded to your working directory. We will be analyzing paired-end sequence data, so we have two FASTQ files: reads_R1.fq (representing the “Read 1” reads of each read-pair) and reads_R2.fq (representing the “Read 2” reads of each read-pair).
Use head to take a peek at the “R1” reads file, asking for a number of lines that is a multiple of 4 (why would we pick a multiple of 4?).
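As a hint for the question above: each FASTQ record takes exactly 4 lines (a header starting with @, the bases, a + separator line, and the quality string), so a multiple of 4 always shows whole records. The sketch below assumes reads_R1.fq is uncompressed and sitting in your working directory; peek_fastq and count_reads are hypothetical helper names made up for illustration, not part of any tool.

```shell
# Hypothetical helper: show the first N records of a FASTQ file.
# $1 = FASTQ file, $2 = number of records (defaults to 2, i.e. 8 lines)
peek_fastq() {
  head -n $(( ${2:-2} * 4 )) "$1"
}

# Hypothetical helper: count records as (total lines) / 4.
# Assumes a well-formed FASTQ with exactly 4 lines per record.
count_reads() {
  echo $(( $(wc -l < "$1") / 4 ))
}

# Only run against the tutorial file if it is actually present here
if [ -f reads_R1.fq ]; then
  peek_fastq reads_R1.fq 2
  count_reads reads_R1.fq
fi
```

Running the same count on reads_R2.fq should give the same number, since every read-pair contributes one record to each file.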
2. The reference genome
In the following steps, you’ll map those reads to the SARS-CoV-2 reference genome.
We preloaded the reference genome’s FASTA file, and its location is stored in the variable $REF_FASTA. Use echo $REF_FASTA to see the location.
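If you want to look a little further than the path, here is a minimal sketch. It assumes $REF_FASTA points to an uncompressed FASTA file; fasta_headers is a hypothetical helper name used only for this example.

```shell
# Hypothetical helper: list the sequence names in a FASTA file.
# In FASTA, each sequence begins with a header line that starts with ">".
fasta_headers() {
  grep '^>' "$1"
}

# Print the path held in the variable (or a note if it is not set in this shell)
echo "REF_FASTA is: ${REF_FASTA:-<not set>}"

# Only inspect the file if the variable points at something that exists
if [ -f "${REF_FASTA:-}" ]; then
  fasta_headers "$REF_FASTA"
fi
```

The sequence name in the header is worth noting, because the first column of the primer BED file refers to sequences by name.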
3. The primers
For amplicon sequence data analysis, you’ll also need a BED file giving the positions of the primers that were used in the amplicon sequencing protocol (we’ll talk about these later in the tutorial). Its location is stored in the variable $PRIMER_BED. Use echo $PRIMER_BED to see it.
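To get a feel for what a BED file holds, the sketch below prints each primer region with its length. A BED file is tab-separated, and its first three columns are the sequence name, a 0-based start, and an end that is not included in the region, so the length is simply end minus start. bed_lengths is a hypothetical helper name, and the sketch assumes $PRIMER_BED is a plain-text BED file.

```shell
# Hypothetical helper: print name, start, end and length for each BED line.
# Skips comment, "track" and "browser" lines, which some BED files carry.
bed_lengths() {
  awk -F'\t' -v OFS='\t' '!/^(#|track|browser)/ { print $1, $2, $3, $3 - $2 }' "$1"
}

# Print the path held in the variable (or a note if it is not set in this shell)
echo "PRIMER_BED is: ${PRIMER_BED:-<not set>}"

# Only inspect the file if the variable points at something that exists
if [ -f "${PRIMER_BED:-}" ]; then
  bed_lengths "$PRIMER_BED" | head -n 5
fi
```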