Wrangle FASTA and FASTQ with SeqKit

Extract and filter

The seqkit seq command extracts, filters, and reformats the records in your FASTA and FASTQ files.

For example, to extract the sequence names from a FASTA file:

seqkit seq --name hairpins.fa | head

If your FASTA is formatted such that the sequence name contains an ID followed by a space and more information, then you can extract just those IDs using --only-id:

seqkit seq --name --only-id hairpins.fa | head
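To make the difference between the two commands concrete, here is a minimal sketch on a tiny made-up file. The file name example.fa and its two records are assumptions for illustration and are not part of the tutorial data. The seqkit calls are wrapped in a check so the snippet does nothing harmful if SeqKit is not on your PATH.

```shell
# Create a tiny two-record FASTA file (made-up sequences, illustration only)
printf '>seq1 first example record\nACGTACGT\n>seq2 second example record\nGGCC\n' > example.fa

# Only run the SeqKit steps if seqkit is installed
if command -v seqkit >/dev/null 2>&1; then
    # Full header lines: ID plus the description after the space
    seqkit seq --name example.fa
    # IDs only: the text before the first whitespace in each header
    seqkit seq --name --only-id example.fa
fi
```

With this input, the first command should list both full headers, and the second should list only seq1 and seq2.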

If you are only interested in sequences of a certain size, e.g. 300 bp or longer, use --min-len to filter out shorter sequences:

seqkit seq --min-len 300 hairpins.fa | seqkit stats

You can also filter out long sequences with --max-len. For FASTQ files, you can filter reads by their average quality: --min-qual keeps reads whose average quality is at or above the given value, and --max-qual keeps reads whose average quality is below it.
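As a rough sketch of these two filters, the snippet below builds two tiny made-up inputs and applies one filter to each. The file names lengths.fa and quals.fq, the record names, and the thresholds are all assumptions chosen for illustration. The quality strings assume the common Phred+33 encoding, where 'I' stands for quality 40 and '#' for quality 2.

```shell
# Made-up inputs, illustration only
printf '>long\nACGTACGTACGT\n>short\nACGT\n' > lengths.fa
printf '@good\nACGTACGT\n+\nIIIIIIII\n@poor\nACGTACGT\n+\n########\n' > quals.fq

# Only run the SeqKit steps if seqkit is installed
if command -v seqkit >/dev/null 2>&1; then
    # Keep sequences of at most 10 bases: only "short" should remain
    seqkit seq --max-len 10 lengths.fa
    # Keep reads with an average quality of at least 20: only "good" should remain
    seqkit seq --min-qual 20 quals.fq
fi
```

The length and quality options can be combined in a single seqkit seq call when you need both filters at once.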
