Subject: High memory usage for very large sequences
This module appears to have exponential or similar memory usage. I have been using it with
sequences of around 300000 elements with small differences, and I get memory usage of
multiple gigabytes, often too large for 32-bit systems to handle. I suspect this is down to the
algorithm. On the plus side, this definitely performs a lot faster than Algorithm::Diff for
sequences that it can handle.
I am really hoping that I am wrong, as I would have expected this issue to show up before.
The same sequences take many hours with Algorithm::Diff, so that wasn't an option either.
Since I only needed numeric comparison, I used the diff code from libmba as an alternative.
This is a better-licensed equivalent to the algorithm used by GNU diff, and seems to perform
well even for very large sequences. I am happy to extend this into another Algorithm::Diff
variation (I need practice at XS) but I thought it worth reporting as an issue.
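For reference, here is a minimal sketch of the greedy O(ND) approach described by Myers, which as far as I know is the basis of both GNU diff and the libmba diff code. This is not libmba's code: the function name `shortest_edit_distance` is made up for illustration, and it only returns the number of insertions and deletions rather than the edit script. The point is that the working state is one furthest-reaching position per diagonal, so memory grows with the number of differences D instead of with the product of the sequence lengths.

```python
def shortest_edit_distance(a, b):
    """Return the minimum number of insertions plus deletions turning a into b.

    Greedy O(ND) search: for each edit count d, track the furthest x reached
    on every diagonal k = x - y. Only the current frontier is kept, so memory
    is proportional to d, not to len(a) * len(b).
    """
    n, m = len(a), len(b)
    max_d = n + m
    # v[k] = furthest x reached so far on diagonal k.
    # The seed v[1] = 0 lets the d = 0 step start at (0, 0).
    v = {1: 0}
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]        # step down: insert an element of b
            else:
                x = v[k - 1] + 1    # step right: delete an element of a
            y = x - k
            # Follow the "snake": consume matching elements at no cost.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            v[k] = x
            if x >= n and y >= m:
                return d
    return max_d


if __name__ == "__main__":
    old = list(range(300000))
    new = old[:1000] + old[1001:]   # one element removed
    print(shortest_edit_distance(old, new))
```

Recovering the actual edit script needs more than this: either the frontier for each d is saved, or the linear-space divide-and-conquer refinement from the same paper is used. Which of those libmba does internally is not something this sketch tries to reproduce.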
The attached (large) file contains a test and a couple of large sequences I used to test this,
compressed using 7z to cut down the file sizes.
Attachment: files.7z