Skip to content
Snippets Groups Projects
  • Scott Vokes's avatar
    5ad06a4d
    (BUG) Improve index construction and eliminate redundant compare. · 5ad06a4d
    Scott Vokes authored
    1. Instead of storing an offset to the previous matching byte in the
       index, store the next matching byte bigram. Since the compression
       doesn't break even unless at least 3 bytes match, any single-byte
       matches in the index should just be skipped.
    
       Since this eliminates these false positives in one pass, it also
       removes work that would otherwise be done during every walk of the
       index's offset chain - possibly several times. This greatly
       reduces the time spent in find_longest_match().
    
       EDIT: This prematurely discards information from the index,
       damaging the overall compression effectiveness. Some of the
       links being skipped are still necessary for other chains, so
       they cannot be eliminated this way. It's still possible dynamic
       programming can be used for this, but the strategy in this
       commit doesn't quite work.
    
    2. Eliminate the test of the first byte, since indexing already
       incorporates this comparison ahead of time.
    5ad06a4d
    History
    (BUG) Improve index construction and eliminate redundant compare.
    Scott Vokes authored
    1. Instead of storing an offset to the previous matching byte in the
       index, store the next matching byte bigram. Since the compression
       doesn't break even unless at least 3 bytes match, any single-byte
       matches in the index should just be skipped.
    
       Since this eliminates these false positives in one pass, it also
       removes work that would otherwise be done during every walk of the
       index's offset chain - possibly several times. This greatly
       reduces the time spent in find_longest_match().
    
       EDIT: This prematurely discards information from the index,
       damaging the overall compression effectiveness. Some of the
       links being skipped are still necessary for other chains, so
       they cannot be eliminated this way. It's still possible dynamic
       programming can be used for this, but the strategy in this
       commit doesn't quite work.
    
    2. Eliminate the test of the first byte, since indexing already
       incorporates this comparison ahead of time.