Subject: | Dictionary size limited |
The current code can only handle dictionaries up to a certain size because of memory restrictions. The reason is that it stores the offset of every record, and thus needs at least 4 bytes per line. Since dictionaries usually have short lines, but many of them, this limits the dictionary size quite a bit.
This can hopefully be fixed by storing only every Nth offset and loading the N offsets that follow it on demand, keeping X buffers of N offsets each and using them as a cache.
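
The idea could be sketched roughly as follows. This is only an illustration in Python, not the project's actual code: the class name SparseLineIndex, its methods, and the default values for N (n) and X (max_blocks) are all made up for this example, and it assumes one record per line in a seekable binary file. The cache here evicts the least recently used block, which is one possible policy among several.

```python
import io
from collections import OrderedDict


class SparseLineIndex:
    """Hypothetical sketch: keep every Nth line offset, rebuild the rest on demand."""

    def __init__(self, f, n=64, max_blocks=4):
        self.f = f                    # seekable file object opened in binary mode
        self.n = n                    # N: one stored offset per N lines
        self.max_blocks = max_blocks  # X: number of offset blocks kept in the cache
        self.anchors = []             # offsets of lines 0, N, 2N, ...
        self.cache = OrderedDict()    # block number -> offsets of the lines in that block
        self.line_count = 0
        f.seek(0)
        pos = 0
        for line in f:                # single pass; only every Nth offset is kept
            if self.line_count % n == 0:
                self.anchors.append(pos)
            pos += len(line)
            self.line_count += 1

    def _load_block(self, block):
        """Return the offsets for one block, rescanning the file on a cache miss."""
        if block in self.cache:
            self.cache.move_to_end(block)       # mark as most recently used
            return self.cache[block]
        pos = self.anchors[block]
        self.f.seek(pos)
        offsets = []
        for _ in range(self.n):
            line = self.f.readline()
            if not line:                        # last block may hold fewer than N lines
                break
            offsets.append(pos)
            pos += len(line)
        self.cache[block] = offsets
        if len(self.cache) > self.max_blocks:
            self.cache.popitem(last=False)      # evict the least recently used block
        return offsets

    def offset_of(self, lineno):
        if not 0 <= lineno < self.line_count:
            raise IndexError(lineno)
        block, within = divmod(lineno, self.n)
        return self._load_block(block)[within]

    def read_line(self, lineno):
        self.f.seek(self.offset_of(lineno))
        return self.f.readline()


if __name__ == "__main__":
    data = b"".join(b"word%d\tdefinition %d\n" % (i, i) for i in range(10))
    index = SparseLineIndex(io.BytesIO(data), n=4, max_blocks=2)
    print(index.read_line(5))
```

With this layout the permanent cost drops from one offset per line to one offset per N lines, plus at most X * N cached offsets. The price is that a cache miss has to re-read up to N lines from the file, so N and X would need tuning against typical access patterns.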