Bug #99047 for Compress-LZ4: 60380_Bug report

Mon Sep 22 08:01:54 2014 rch [...] skynet.be - Ticket created

Subject:	60380_Bug report
Date:	Mon, 22 Sep 2014 14:01:05 +0200
To:	bug-Compress-LZ4 [...] rt.cpan.org
From:	rch <rch [...] skynet.be>

===============================================
Bruxelles
2014_09_22

To bug-Compress-LZ4@rt.cpan.org.
My Mozilla Firefox bookmarks have suddenly switched to
the .jsonlz4 format. So I am trying to get info
out of them using Compress::LZ4.

But for even a small (142 KiB) file I get the message
<<Out of memory!>>

Further details and demo code below the line.

What am I doing wrong?

Thank you in advance for your help.
Richard H
===============================================
email rch@skynet.be
===============================================

Distribution name and version:
    Compress::LZ4
    $VERSION '0.20'

Perl version:
    perl 5, version 14, subversion 2 (v5.14.2)
    built for i686-linux-gnu-thread-multi-64int

Operating System vendor and version:
    Linux Ubuntu 3.2.0-68-generic-pae
    #102-Ubuntu SMP Tue Aug 12 22:23:54 UTC 2014 i686 i686 i386
    GNU/Linux

My operating environment:
    Memory 985MiB, of which 645MiB are in use
    I've read https://rt.cpan.org/Public/Bug/Display.html?id=92825
    but it's all above my head.
    I'm only a poor botanist ...

Demo code:
use strict;
use Carp;
use File::Slurp;
use Compress::LZ4;

# 1. Slurp the encoded bookmark *.jsonlz4 file
my $infile = "/home/rch/.mozilla/firefox/1oj9bxa6.default/bookmarkbackups/bookmarks-2014-09-22_751_CmFFTpPHaKIDCdLOKrLuHQ==.jsonlz4";
my $lz4_encoded_text = read_file( $infile, binmode => ':raw' );
unless( $lz4_encoded_text ){
    croak "Ohhh Problem stage 1 read_file failed\n\tcroaked";
}

# 2. Uncompress the *.jsonlz4 bytes
my $utf8_encoded_json_text = lz4_uncompress($lz4_encoded_text);
unless($utf8_encoded_json_text){
    croak "ohh decompress failed\n\tCroaked";
}
# Never get to here; just get the message
# Out of memory!

-------- End of Original Message --------
--
===============================================
82 AVENUE WALCKIERS
B-1160 AUDERGHEM
BELGIQUE
tel +32 (0)2 660 52 23
email rch@skynet.be
===============================================

Mon Sep 22 11:04:32 2014 gray [...] cpan.org - Correspondence added

This isn't an issue with the module. Mozilla is using their own file format with a proprietary header. When Compress::LZ4 reads their header, it gets interpreted as the size of the decompressed data, and that's why you're running out of memory. The format isn't documented anywhere, but if you look at the file contents you can see it begins with "mozLz40\0". If you remove those first 8 bytes from the data before decompressing, it will work.

Also, you should be using the uncompress function, not the lz4_uncompress function. You can see the difference between the two in the documentation. It's only an implementation detail that allows lz4_uncompress to act like the uncompress function when the size argument is missing. I consider that a bug and will correct it in a future version.
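
The advice above can be sketched roughly as follows. This is a minimal illustration, not code from the module: decode_mozlz4 is a hypothetical helper name, and it assumes that whatever follows the 8-byte magic is already in the layout uncompress expects (a 4-byte little-endian length followed by the LZ4 block). The sample data is synthetic so that no real bookmark file is needed.

```perl
use strict;
use warnings;
use Compress::LZ4;    # provides compress() and uncompress()

# Magic bytes seen at the start of Firefox .jsonlz4 files (8 bytes).
my $MOZ_MAGIC = "mozLz40\0";

# decode_mozlz4 is a hypothetical helper, not part of Compress::LZ4.
sub decode_mozlz4 {
    my ($bytes) = @_;

    # Drop the Mozilla magic if it is present. The remainder is assumed
    # to be a 4-byte little-endian length followed by the LZ4 block.
    if ( substr( $bytes, 0, length $MOZ_MAGIC ) eq $MOZ_MAGIC ) {
        $bytes = substr( $bytes, length $MOZ_MAGIC );
    }
    return uncompress($bytes);
}

# Round trip on synthetic data standing in for a bookmark backup.
my $json = '{"title":"Bookmarks Menu","children":[]}' x 20;
my $fake = $MOZ_MAGIC . compress($json);
print decode_mozlz4($fake) eq $json
    ? "round trip ok\n"
    : "round trip failed\n";
```

With a real backup, the bytes returned by read_file( $infile, binmode => ':raw' ) would be passed to decode_mozlz4 in place of $fake.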

Mon Sep 22 11:04:33 2014 The RT System itself - Status changed from 'new' to 'open'

Mon Sep 22 11:04:33 2014 gray [...] cpan.org - Status changed from 'open' to 'rejected'

Mon Sep 22 11:17:38 2014 rch [...] skynet.be - Correspondence added

Subject:	Re: [rt.cpan.org #99047] 60380_Bug report
Date:	Mon, 22 Sep 2014 17:17:22 +0200
To:	bug-Compress-LZ4 [...] rt.cpan.org
From:	rch <rch [...] skynet.be>

Many thanks for your quick and most helpful reply.

With kind regards
Richard H

Tue Sep 23 07:44:54 2014 rch [...] skynet.be - Correspondence added

Subject:	Suggested patch Re: [rt.cpan.org #99047] 60380_Bug report
Date:	Tue, 23 Sep 2014 13:44:35 +0200
To:	bug-Compress-LZ4 [...] rt.cpan.org
From:	rch <rch [...] skynet.be>

In order to improve on the rather confusing message
Out of Memory!
when Compress-LZ4 receives a file that is not pure LZ4,
would it be possible to do some checks on
the incoming file - something like the
script below the line

Kind regards
Richard H
---------------------------------------------------
# this is my lz4_verifier.pl
use strict;
use Carp;
use File::Slurp;

my $infile = 'bookmarks-2014-09-22etc_real.lz4';#'bookmarks-2014-09-22etc.jsonlz4';

my $result = check_format( $infile);
print STDOUT "$infile\n\t$result";

#-----------------------------------------------------
sub check_format
#-----------------------------------------------------
{
    my $infile = shift;
    my $message;
    if(!-f $infile){
        $message = "Could not find \$infile <$infile>";
    }
    else{
        my $encoded_text = read_file( $infile, binmode => ':raw' );
        if( !$encoded_text ){
            $message = "Problem $!. Failed to read file";
        }
        else{
            my (@seen,$rest);
            (@seen[0..7], $rest) = unpack( "H2,H2,H2,H2,H2,H2,H2,H2,A*", $encoded_text);

            my @lz4 = qw(6e 85 05 00 b0 7b 22 69);
            my @jsonlz4 = qw(6d 6f 7a 4c 7a 34 30 00);

            my( $purelz4, $mozilla, $bad, );
            foreach my $i (0 .. 7 ){
                if( $seen[$i] eq $jsonlz4[$i] ){++$mozilla;}
                elsif( $seen[$i] eq $lz4[$i] ){++$purelz4;}
                else{++$bad;}
            }
            if( $purelz4==8){$message = "OK!";}
            elsif($mozilla==8){$message = "This looks like a Mozilla Firefox backup.\n\tPlease remove the first 8 bytes of the file and try again";}
            elsif($bad ==8){$message = "I dont think this is an lz4 file";}
            else{$message = "Confusion! Test of file header says $purelz4 bytes are lz4, $mozilla bytes are jsonlz4, and $bad bytes are neither one nor the other";}
        }
    }
    return($message);
}##sub check_format
#/message ends

Tue Sep 23 08:58:10 2014 gray [...] cpan.org - Correspondence added

You're making an assumption based only upon your data that will not apply to the general case. Read the COMPATIBILITY section of the documentation. The official LZ4 project is working on a container format, but I don't know what the status of that is.
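
One way to see why an eight-byte comparison may not generalise: assuming the data handled by uncompress begins with a 4-byte little-endian length of the original data, the bytes listed as @lz4 in the script would decode to a length rather than to a fixed signature. A small sketch (declared_length is a hypothetical helper name; the hex strings are the two byte sequences from the script):

```perl
use strict;
use warnings;

# Interpret the first four bytes of a buffer as an unsigned
# little-endian 32-bit integer (hypothetical helper).
sub declared_length {
    my ($bytes) = @_;
    return unpack 'V', $bytes;
}

my $plain   = pack 'H*', '6e850500b07b2269';    # @lz4 from the script
my $mozilla = pack 'H*', '6d6f7a4c7a343000';    # @jsonlz4 from the script

printf "%-8s %d\n", 'plain',   declared_length($plain);      # plain    361838
printf "%-8s %d\n", 'mozilla', declared_length($mozilla);    # mozilla  1283092333
```

Under that assumption, 361838 would simply be the size of one particular bookmarks file, so a different input would start with different bytes. The Mozilla magic decodes to 1283092333 bytes (about 1.2 GiB), which would fit the Out of memory! report on a machine with 985MiB.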

Wed Sep 24 00:43:04 2014 rch [...] skynet.be - Correspondence added

Subject:	Re: [rt.cpan.org #99047] 60380_Bug report
Date:	Wed, 24 Sep 2014 06:42:44 +0200
To:	bug-Compress-LZ4 [...] rt.cpan.org
From:	rch <rch [...] skynet.be>

Perhaps an alternative solution could be
to enhance the error message
so that it reads
"Either out of Memory
Or else some other problem
with the file header"

Richard H

Wed Sep 24 00:52:11 2014 gray [...] cpan.org - Correspondence added

On Wed Sep 24 00:43:04 2014, rch@skynet.be wrote:

> Perhaps an alternative solution could be
> to enhance the error message
> so that it reads
> "Either out of Memory
> Or else some other problem
> with the file header"

The out of memory error is coming from Perl itself. I consider this a GIGO (garbage in, garbage out) problem, so there is nothing to fix on this end. Eventually the new LZ4 container format will resolve this issue anyhow.
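
Since the allocation failure happens inside Perl, a caller who wants a friendlier message could check the input before handing it over. The following is only a sketch of that idea, not something the module provides: checked_uncompress and its default 64 MiB limit are hypothetical, and it assumes the first four bytes are a little-endian length of the original data.

```perl
use strict;
use warnings;
use Carp;
use Compress::LZ4;    # provides compress() and uncompress()

# Hypothetical wrapper: refuse input whose declared size looks implausible,
# instead of letting Perl attempt a very large allocation.
sub checked_uncompress {
    my ( $bytes, $max_size ) = @_;
    $max_size //= 64 * 1024 * 1024;    # arbitrary illustrative limit

    croak 'input is shorter than the 4-byte length header'
        if length($bytes) < 4;

    # Assumed layout: 4-byte little-endian original length, then the LZ4 block.
    my $declared = unpack 'V', $bytes;
    croak "declared size $declared exceeds limit $max_size; "
        . 'the input may carry a foreign header'
        if $declared > $max_size;

    return uncompress($bytes);
}

# A buffer starting with the Mozilla magic is rejected with a readable error.
my $ok = eval { checked_uncompress("mozLz40\0" . 'not really lz4'); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```

The limit is a judgment call for the caller; the check cannot tell a genuinely large input from a foreign header, it only turns the failure into a catchable error.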
