Capture the flag
As we discussed earlier, the FLAG field in the BAM format encodes several key pieces of information regarding how an alignment aligned to the reference genome. We can exploit this information to isolate specific types of alignments that we want to use in our analysis.
For example, we often want to call variants solely from paired-end sequences that aligned “properly” to the reference genome.
To ask the view command to report solely “proper pairs”, we use the -f option
and ask for alignments where the second bit is true (proper pair is true):
How many properly paired alignments are there? (use the -c option)
Now, let’s ask for alignments that are NOT properly paired. To do this, we use the -F option (note the capitalization to denote “opposite”).
How many total alignments?
Does everything add up?
To get a summary of the flags in our BAM file, we can use samtools flagstats: