(BUG) Improve index construction and eliminate redundant compare.
1. Instead of storing an offset to the previous matching byte in the index, store an offset to the next matching byte bigram. Since the compression doesn't break even unless at least 3 bytes match, single-byte matches in the index can simply be skipped. Removing these false positives in one pass at indexing time also removes work that would otherwise be repeated on every walk of the index's offset chain, possibly several times. This greatly reduces the time spent in find_longest_match().

   EDIT: This prematurely discards information from the index, which hurts overall compression effectiveness. Some of the skipped links are still needed by other chains, so they cannot be eliminated this way. Dynamic programming may still work here, but the strategy in this commit doesn't quite.

2. Eliminate the test of the first byte, since indexing already performs this comparison ahead of time.
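A minimal C sketch of both points, assuming a flat byte buffer and a single-byte chain index where index[i] holds the previous position with the same byte, or -1. Only find_longest_match() is named in this message. build_index(), build_bigram_index_in_place(), and the buffer layout are hypothetical, and the in-place bigram rewrite is one plausible reconstruction of how skipped links can break other chains, not this commit's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: index[i] = most recent earlier position j with
 * buf[j] == buf[i], or -1 if there is none. */
static void build_index(const uint8_t *buf, size_t len, int16_t *index) {
    int16_t last[256];
    assert(len <= INT16_MAX);  /* positions must fit in int16_t */
    for (int v = 0; v < 256; v++) last[v] = -1;
    for (size_t i = 0; i < len; i++) {
        index[i] = last[buf[i]];
        last[buf[i]] = (int16_t)i;
    }
}

/* Walk the chain starting at `end` and return the longest match length
 * (at most maxlen, which must not exceed len - end); its start goes in
 * *match_pos. Every chain entry already satisfies buf[pos] == buf[end],
 * so the comparison starts at offset 1: the first-byte test is skipped. */
static size_t find_longest_match(const uint8_t *buf, const int16_t *index,
                                 size_t end, size_t maxlen,
                                 int16_t *match_pos) {
    size_t best_len = 0;
    *match_pos = -1;
    if (maxlen == 0) return 0;
    for (int16_t pos = index[end]; pos >= 0; pos = index[pos]) {
        size_t len = 1;  /* first byte guaranteed equal by the index */
        while (len < maxlen && buf[pos + len] == buf[end + len]) len++;
        if (len > best_len) {
            best_len = len;
            *match_pos = pos;
        }
        if (len == maxlen) break;  /* cannot do better */
    }
    return best_len;
}

/* Hypothetical reconstruction of the rejected idea: rewrite each link in
 * place so it skips candidates whose following byte differs. Because
 * earlier links were already rewritten, following index[pos] can jump
 * over a position that this chain still needs. */
static void build_bigram_index_in_place(const uint8_t *buf, size_t len,
                                        int16_t *index) {
    build_index(buf, len, index);
    for (size_t i = 0; i + 1 < len; i++) {
        int16_t pos = index[i];
        while (pos >= 0 && buf[pos + 1] != buf[i + 1]) pos = index[pos];
        index[i] = pos;
    }
}
```

For the input "ABACABAC", the single-byte chain lets find_longest_match(buf, index, 4, 4, &pos) find the 4-byte match "ABAC" at position 0. After build_bigram_index_in_place(), index[4] is -1: the link 2 → 0 was rewritten while processing position 2 (whose next byte is 'C'), so the walk for position 4 (next byte 'B') can no longer reach position 0 and the match is lost.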