Search FASTA/FASTQ files
Say you’re studying the microRNA miR-34a and you want to extract its sequence in humans, you can use SeqKit’s grep command:
To load sequences in bulk, SeqKit can read sequence IDs from a text file. For example, the file ids.txt contains the IDs of miR-34a across 5 species:
To extract those sequences from hairpins.fa, use the -f flag:
💡 This is a useful tool because the alternative ways to achieve this on the command line would be more challenging. You could try using good-old grep but that only matches the sequence name line. You could use grep -A2 to also retrieve the sequence, but that assumes all sequences have 2 lines, which they do not.