Hey,
after a lot of troubleshooting and debugging it seems that I was able to identify the main cause of the problem reported by me in
https://rt.cpan.org/Public/Bug/Display.html?id=105460 .
I dissected my minimal perl script even further and tried to see when exactly the problems occur and if it is somehow reproduceable/understandable.
I made these changes to the lzma_test.pl script, provided (by me) in the orginal bug report. Diff:
- $status = $lz->code ($random_input, $compressed_data);
- $status = $lz->flush ($compressed_data);
+ my $temporary_res = "";
+
+ $status = $lz->code ($random_input, $compressed_data);
+ $temporary_res .= $compressed_data;
+
+ $status = $lz->flush ($compressed_data);
+ $temporary_res .= $compressed_data;
+
+ $compressed_data = $temporary_res;
... and it surprisingly started to work even for larger inputs to the raw encoder.
Then I of course wanted to understand why we need to do so (and why apparently only for large inputs).
I had a further look at the documentation at
http://search.cpan.org/~pmqs/Compress-Raw-Lzma/lib/Compress/Raw/Lzma.pm and of course noticed immediately the "AppendOutput" option (which seemed to be related).
Furthermore, I saw that for Compress::Raw::Lzma::RawEncoder the default should be true/1 (quote: "Defaults to 1.").
I then tried to do a different patch to the original lzma_test.pl script:
- my ($lz, $status) = Compress::Raw::Lzma::RawEncoder->new;
+ my ($lz, $status) = Compress::Raw::Lzma::RawEncoder->new (AppendOutput => 1);
... and it surprisingly started to work (even without the "$temporary_res" changes above), even with very large inputs.
So my conclusion now was that the defaults somehow are not correct (or at least do not reflect what the documentation says). They conflict with each other.
While looking at
http://search.cpan.org/~pmqs/Compress-Raw-Lzma/lib/Compress/Raw/Lzma.pm I came accross some further conflicts (i.e. where the documentation does not reflect what the code does).
The funny thing is that I remember trying all possible options before submitting the bug, but it seems that I didn't test to specify only the "AppendOutput => 1" option.
Instead, as far as I remember, I tried to combine some options (like AppendOutput + ForZip etc) and therefore the output still was not as expected/correct.
Here are some other problems that I noticed while reading
http://search.cpan.org/~pmqs/Compress-Raw-Lzma/lib/Compress/Raw/Lzma.pm :
- There are no default values listed for Compress::Raw::Lzma::StreamDecoder and Compress::Raw::Lzma::RawDecoder (which in my opinion would be important too).
- For Compress::Raw::Lzma::AutoDecoder and Compress::Raw::Lzma::AloneDecoder the documentation says MemLimit "Default is unlimited.", while the code specifies a limit of 128 *1024 *1024
I will attach to this email a patch proposal. It changes the defaults within the code because in my opinion changing the documentation would not really solve the problem.
If instead you think solving the conflict by changing the documentation would be better, I highly suggest that we change the default example.
Most importantly this section:
my ($lz, $status) = new Compress::Raw::Lzma::RawEncoder [OPTS]
or die "Cannot create lzma object: $status\n";
$status = $lz->code($input, $output);
$status = $lz->flush($output);
... because we now know that this is not correct (especially with the RawEncoder), because without forcing to set AppendOutput to 1, it will most likely fail (at least with "larger" inputs).
Furthermore, if we would change only the documentation we must make it very clear that AppendOutput defaults to 0 and we need to add some warnings about not forgetting to save the temporary output to a variable if the whole output is expected to be within the "output".
Since it is very likely that the users will make the mistake to forget to set AppendOutput => 1 or to forget to save the temporary results and "concatenate"/use them, I would suggest that the code fix (see patch) is much more appropriate and safer (I have tested it of course and with the fixes the defaults are correct).
Thanks