Faster with sorted data
So far the examples presented have used the traditional algorithm in bedtools for finding intersections. It turns out, however, that bedtools is much faster when the input files are presorted and the -sorted option is used.
For example, compare the difference in speed between the two approaches when finding intersections between exons.bed and hesc.chromHmm.bed:
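The comparison can be sketched as follows. This is a minimal sketch rather than the exact commands of the original run: it assumes bash (for the `time` keyword), that both files sit in the current directory, and that both are already sorted by chromosome and then by start position. The variable names `a` and `b` and the guard around the commands are illustrative additions. Output is discarded so that only the run time is reported.

```shell
# Input files named in the text; assumed to be in the current directory
# and already sorted by chromosome, then by start position.
a=exons.bed
b=hesc.chromHmm.bed

# Guard (an illustrative addition): only run the timings when bedtools
# and both input files are actually available.
if command -v bedtools >/dev/null 2>&1 && [ -f "$a" ] && [ -f "$b" ]; then
    # Default algorithm: does not require sorted input.
    time bedtools intersect -a "$a" -b "$b" > /dev/null

    # Sweeping algorithm: requires position-sorted input, enabled with -sorted.
    time bedtools intersect -a "$a" -b "$b" -sorted > /dev/null
else
    echo "bedtools or the input files were not found; skipping timing" >&2
fi
```

The two commands differ only in the -sorted flag, so any difference in the reported times comes from the choice of algorithm.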
Note: While the run times in this example are quite small, the performance gains from using the -sorted option grow as datasets grow larger. For example, compare the run times of the sorted and unsorted approaches as a function of dataset size in the figure below.
The important thing to remember is that each dataset must be sorted by chromosome and then by start position: sort -k1,1 -k2,2n.
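To illustrate what that sort order looks like, here is a small sketch using a made-up four-line BED file. The file names `unsorted.bed` and `sorted.bed` and the intervals themselves are hypothetical and only serve the example.

```shell
# Build a small, deliberately unsorted BED file (hypothetical data).
printf 'chr2\t500\t600\nchr1\t1000\t1100\nchr1\t200\t300\nchr10\t50\t150\n' > unsorted.bed

# Sort by chromosome (column 1, as text), then by start (column 2, numerically).
sort -k1,1 -k2,2n unsorted.bed > sorted.bed

cat sorted.bed
# chr1    200     300
# chr1    1000    1100
# chr10   50      150
# chr2    500     600
```

Note that the chromosome column is compared as text, so chr10 comes before chr2. Within chr1, the `n` modifier on the second key places 200 before 1000, which a plain text comparison would not do.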
