Wrangle FASTA and FASTQ with SeqKit
Getting Started

Calculate summary stats

In this tutorial we’ll analyze FASTA sequences of microRNA hairpins from the miRNA database, and FASTQ sequencing reads from 42basepairs.

The data is preloaded here as hairpins.fa and NA12878.fastq.

Let’s use SeqKit to calculate summary statistics for these two files:

seqkit stats hairpins.fa NA12878.fastq
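For each file, seqkit stats reports the format, the sequence type, and columns such as num_seqs, sum_len, min_len, avg_len and max_len. To make those definitions concrete, here is a rough awk sketch that recomputes them for a plain-text FASTA file. The function name fasta_stats is hypothetical and not part of SeqKit. It ignores everything SeqKit actually handles for you, including gzip input, FASTQ and speed, so treat it as an illustration of what the columns mean rather than a replacement.

```shell
# Hypothetical helper (not part of SeqKit): approximates the basic columns of
# `seqkit stats` for an uncompressed FASTA file, for illustration only.
fasta_stats() {
  awk '
    # Record the length of one finished sequence.
    function record(l) {
      n++; sum += l
      if (n == 1 || l < min) min = l
      if (l > max) max = l
    }
    # A header line starts a new record, so flush the previous one first.
    /^>/ { if (seen) record(len); seen = 1; len = 0; next }
    # Sequence lines (possibly wrapped over several lines) extend the current record.
    { gsub(/[ \t\r]/, ""); len += length($0) }
    END {
      if (seen) record(len)
      avg = (n > 0) ? sum / n : 0
      printf "num_seqs=%d sum_len=%d min_len=%d avg_len=%.1f max_len=%d\n", n, sum, min, avg, max
    }
  ' "$1"
}
```

For example, `fasta_stats hairpins.fa` should print counts matching the corresponding seqkit stats columns for that file, although SeqKit formats large numbers with thousands separators by default.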

To avoid typing each file name manually, we can combine a wildcard (*) with brace expansion ({fa,fastq}) to analyze all .fa and .fastq files in the current folder:

seqkit stats *.{fa,fastq}
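The pattern is expanded by the shell, not by SeqKit, before seqkit ever runs. Brace expansion is a bash (and zsh) feature that plain POSIX sh does not provide. To preview which files a pattern will match, you can hand it to echo instead of seqkit:

```shell
# Brace expansion first turns *.{fa,fastq} into two globs, *.fa and *.fastq;
# each glob is then replaced by the matching file names in the current folder.
echo *.{fa,fastq}
```

If hairpins.fa and NA12878.fastq are the only matching files, this prints `hairpins.fa NA12878.fastq`. Note that in bash, a glob that matches nothing is passed through literally (for example `*.fastq`), in which case SeqKit would be asked to open a file with that literal name.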

SeqKit can also calculate additional statistics, such as GC content and, for FASTQ files, the percentage of bases with a quality score of at least 30 (the Q30(%) column). To include those statistics, add the --all flag:

seqkit stats *.{fa,fastq} --all
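As a rough illustration of what the Q20(%), Q30(%) and GC(%) columns measure, the awk sketch below recomputes them for a FASTQ file. The function name fastq_qual_gc is hypothetical and not part of SeqKit. It assumes uncompressed input with four-line records, Unix line endings and Phred+33 quality encoding, and it counts N bases in the GC denominator; SeqKit's exact handling of such edge cases may differ.

```shell
# Hypothetical helper (not part of SeqKit): approximates the Q20(%), Q30(%)
# and GC(%) columns of `seqkit stats --all` for a FASTQ file.
# Assumes 4-line records and Phred+33 quality encoding.
fastq_qual_gc() {
  awk '
    BEGIN {
      # Lookup table from quality character to its ASCII code.
      for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
    }
    # Line 2 of each record is the sequence.
    NR % 4 == 2 {
      seq = toupper($0)
      gc += gsub(/[GC]/, "", seq)   # gsub returns the number of G/C bases removed
      bases += length($0)
    }
    # Line 4 of each record holds one quality character per base.
    NR % 4 == 0 {
      for (i = 1; i <= length($0); i++) {
        q = ord[substr($0, i, 1)] - 33   # Phred+33 decoding
        if (q >= 20) q20++
        if (q >= 30) q30++
        quals++
      }
    }
    END {
      if (bases == 0 || quals == 0) { print "no bases"; exit }
      printf "Q20=%.2f%% Q30=%.2f%% GC=%.2f%%\n", 100 * q20 / quals, 100 * q30 / quals, 100 * gc / bases
    }
  ' "$1"
}
```

Running `fastq_qual_gc NA12878.fastq` should give percentages close to the Q20(%), Q30(%) and GC(%) values that seqkit stats --all reports for that file.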