Abstract: Counting substrings of an arbitrary length k (k-mers) is the single most time-consuming step of de novo genome sequencing. Sequencing machines generate large quantities of data $(\gt100$ s ...