Wrangle FASTA and FASTQ with SeqKit

Help 5 / 11

Getting Started

Search FASTA/FASTQ files

Say you’re studying the microRNA miR-34a and you want to extract its sequence in humans, you can use SeqKit’s grep command:

seqkit grep --pattern 'hsa-mir-34a' hairpins.fa

To load sequences in bulk, SeqKit can read sequence IDs from a text file. For example, the file ids.txt contains the IDs of miR-34a across 5 species:

cat ids.txt

To extract those sequences from hairpins.fa, use the -f flag:

seqkit grep -f ids.txt hairpins.fa

💡 This is a useful tool because the alternative ways to achieve this on the command line would be more challenging. You could try using good-old grep but that only matches the sequence name line. You could use grep -A2 to also retrieve the sequence, but that assumes all sequences have 2 lines, which they do not.

← Previous Next →

Memory usage

Import

Export

Reset your sandbox

Help