Scott Vokes authored
1. Instead of storing an offset to the previous matching byte in the
   index, store the offset to the next matching byte bigram. Since the
   compression doesn't break even unless at least 3 bytes match, any
   single-byte matches in the index should just be skipped.

   Since this eliminates those false positives in a single pass, it also
   removes work that would otherwise be repeated during every walk of
   the index's offset chain, possibly several times. This greatly
   reduces the time spent in find_longest_match().

   EDIT: This prematurely discards information from the index,
   damaging the overall compression effectiveness. Some of the
   links being skipped are still necessary for other chains, so
   they cannot be eliminated this way. It may still be possible to
   use dynamic programming for this, but the strategy in this
   commit doesn't quite work.
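
   The idea in point 1 can be sketched as follows. This is a hypothetical
   illustration in Python, not heatshrink's actual C implementation: the
   index chains positions by byte bigram instead of by single byte, so
   every candidate reached during a chain walk already matches 2 bytes,
   and positions that could only yield a 1-byte match never enter a chain.
   The names `build_bigram_index` and `find_longest_match` are assumptions
   for the sketch.

   ```python
   def build_bigram_index(data):
       """Map each 2-byte bigram to the positions where it occurs.

       Hypothetical sketch: a real implementation would use a
       fixed-size table of offset chains, not a dict of lists.
       """
       index = {}
       for i in range(len(data) - 1):
           index.setdefault(data[i:i + 2], []).append(i)
       return index

   def find_longest_match(data, pos, index, min_len=3):
       """Walk only the chain for the bigram at `pos`.

       Every candidate already matches 2 bytes by construction, so
       single-byte false positives are never visited.
       """
       best_off, best_len = None, 0
       bigram = data[pos:pos + 2]
       for cand in index.get(bigram, []):
           if cand >= pos:
               break  # only look backward
           # First two bytes match by construction; compare the rest.
           length = 2
           while (pos + length < len(data)
                  and data[cand + length] == data[pos + length]):
               length += 1
           if length >= min_len and length > best_len:
               best_off, best_len = cand, length
       return best_off, best_len
   ```

   For example, in `b"abcXabcY"` a lookup at position 4 walks only the
   chain for `b"ab"` and finds the 3-byte match back at position 0.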

2. Eliminate the test of the first byte, since indexing already
   performs this comparison ahead of time.
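
   A minimal sketch of point 2, again hypothetical Python rather than the
   actual C code: with an index chained on the first byte, every candidate
   position reached through the chain is already known to match the first
   byte, so the match-length loop can start comparing at offset 1 instead
   of 0. The helper name `match_length` is an assumption.

   ```python
   def match_length(data, cand, pos):
       """Length of the match between data[cand:] and data[pos:],
       assuming data[cand] == data[pos] is guaranteed by the index."""
       length = 1  # skip the redundant first-byte test
       while (pos + length < len(data)
              and data[cand + length] == data[pos + length]):
           length += 1
       return length
   ```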
5ad06a4d