Wrangle FASTA and FASTQ with SeqKit

Downsample data

If you’re testing a new tool or writing a new algorithm, large data files slow you down: long runtimes make it hard to iterate quickly.

This is where downsampling (or subsampling) comes in. In the previous step, we saw that hairpins.fa has ~3.1K sequences. Let’s sample 10% of that file and save it as its own FASTA file:

seqkit sample --proportion 0.1 hairpins.fa > sampled.fa
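A quick sanity check after sampling is to count the records in the output. The sketch below counts FASTA header lines (lines starting with ">") with grep. The helper name count_records and the toy files demo.fa and demo_sampled.fa are made up for illustration and are not part of the tutorial's data. Because sampling is random, the number of records returned may differ somewhat from exactly 10% of the input.

```shell
# Count FASTA records by counting header lines (those starting with ">").
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# helper from aborting a script that runs under "set -e".
count_records() {
    grep -c '^>' "$1" || true
}

# Hypothetical toy input so the sketch runs on its own: 20 short records.
i=1
while [ "$i" -le 20 ]; do
    printf '>seq%s\nACGU\n' "$i"
    i=$((i + 1))
done > demo.fa

# Number of records in the toy input.
count_records demo.fa

# With SeqKit installed, sample the toy file and count what came back.
if command -v seqkit >/dev/null 2>&1; then
    seqkit sample --proportion 0.1 demo.fa > demo_sampled.fa
    count_records demo_sampled.fa
fi
```

On the tutorial's own files, running count_records hairpins.fa and count_records sampled.fa should show roughly a tenfold difference, since 10% of ~3.1K sequences is about 310.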

To sample a fixed number of sequences instead of a fraction, use the --number flag:

seqkit sample --number 10 hairpins.fa > sampled.fa
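When you iterate on a tool, it helps if the downsampled file is the same on every run. The sketch below assumes the --rand-seed option of seqkit sample (check seqkit sample --help for your version) and compares two runs that use the same seed. The toy file demo50.fa, the output names run1.fa and run2.fa, and the seed value 42 are placeholders chosen for illustration.

```shell
# Hypothetical toy input so the sketch runs on its own: 50 short records.
i=1
while [ "$i" -le 50 ]; do
    printf '>seq%s\nACGU\n' "$i"
    i=$((i + 1))
done > demo50.fa

if command -v seqkit >/dev/null 2>&1; then
    # Same input and same seed twice: the two samples are expected to match.
    seqkit sample --number 10 --rand-seed 42 demo50.fa > run1.fa
    seqkit sample --number 10 --rand-seed 42 demo50.fa > run2.fa

    # cmp -s is silent and only reports equality through its exit status.
    if cmp -s run1.fa run2.fa; then
        echo "identical samples"
    else
        echo "samples differ"
    fi
else
    echo "seqkit not found; skipping the comparison"
fi
```

Changing the seed value should give a different subset, which is a simple way to draw several independent test files from the same input.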