Run blastp
Now we need to decide which options to use for blastp. In particular, do we want to limit the number of HSPs and target sequences reported for each query? Because we’re mostly interested in determining which proteins match others, we probably only need to keep one hit per query. But each protein’s best hit will likely be to itself! So we’d better keep the top two hits with -max_target_seqs 2, and only the best HSP per hit with -max_hsps 1.
For the output, we’ll create a tab-separated file with comment lines (-outfmt 7) called yeast_blastp_yeast_top2.txt.
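Putting these options together, the call might look something like the sketch below. The query file and database names (yeast_proteins.fasta and yeast_proteins) are hypothetical placeholders, so substitute your own, and the particular columns listed after the 7 are only an illustrative selection. The script prints the command and runs it only if blastp is installed and the query file exists.

```shell
#!/bin/sh
# Hypothetical input names: replace with your own FASTA file and BLAST database.
query="yeast_proteins.fasta"
db="yeast_proteins"
out="yeast_blastp_yeast_top2.txt"

# Format 7 = tab-separated with comment lines; the column names that follow
# are an illustrative choice, not a required set.
fmt="7 qseqid sseqid length evalue"

# Store the command in the positional parameters so each option
# stays a separate argument (the format string contains spaces).
set -- blastp \
  -query "$query" \
  -db "$db" \
  -max_target_seqs 2 \
  -max_hsps 1 \
  -outfmt "$fmt" \
  -out "$out"

# Show the command that would be run.
printf '%s\n' "$*"

# Run it only when blastp is on the PATH and the query file is present.
if command -v blastp >/dev/null 2>&1 && [ -f "$query" ]; then
  "$@"
fi
```

Keeping the format string in its own quoted variable matters here: without the quotes, the shell would split it into separate words and blastp would see only the 7 as the value of -outfmt.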
The coded column names that can follow the format number in -outfmt (qseqid, sseqid, length, etc.) are listed in the output of blastp -help.