
This queue is for tickets about the Compress-LZ4 CPAN distribution.

Report information
The Basics
Id: 92825
Status: rejected
Priority: 0/
Queue: Compress-LZ4

People
Owner: Nobody in particular
Requestors: leonerd-cpan [...] leonerd.org.uk
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.18
Fixed in: (no value)



Subject: Segmentation fault
I'm adding CQLv2 support to my Net::Async::CassandraCQL. CQLv2 allows lz4 compression instead of snappy. Unfortunately I seem to get a segfault decompressing the server's response:

$ perl -MCompress::LZ4=decompress -E 'say decompress("\0\0\0>\360\16\0\0\0\2\0\0\0\1\0\0\0\3\0\6system\0\5peers\0\4\7\0\360\16\0\20\0\vdata_center\0\r\0\4rack\0\r\0\0\0\0")'
Segmentation fault

(this string is the bytes over the wire, as seen by strace)

-- Paul Evans
On Fri Feb 07 17:25:54 2014, PEVANS wrote:
> I'm adding CQLv2 support to my Net::Async::CassandraCQL. CQLv2 allows
> lz4 compression instead of snappy. Unfortunately I seem to get a
> segfault decompressing the server's response:
>
> $ perl -MCompress::LZ4=decompress -E 'say
> decompress("\0\0\0>\360\16\0\0\0\2\0\0\0\1\0\0\0\3\0\6system\0\5peers\0\4\7\0\360\16\0\20\0\vdata_center\0\r\0\4rack\0\r\0\0\0\0")'
> Segmentation fault
>
> (this string is the bytes over the wire, as seen by strace)
Have you read this? http://search.cpan.org/dist/Compress-LZ4/lib/Compress/LZ4.pm#COMPATIBILITY
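The COMPATIBILITY section documents that compress() prepends the uncompressed length as a 4-byte header, so its output is not interchangeable with a raw LZ4 block. A minimal sketch of what that header looks like, assuming only the documented compress() behaviour (the prefix is little-endian, as worked out further down this ticket):

  use strict;
  use warnings;
  use feature 'say';
  use Compress::LZ4 qw( compress );

  my $payload    = "hello, hello, hello";
  my $compressed = compress( $payload );

  # The first 4 bytes of compress()'s output carry the uncompressed
  # length (32-bit little-endian); the LZ4 block follows.
  my ( $orig_len ) = unpack "V", $compressed;
  say $orig_len;           # 19
  say length $payload;     # 19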
On Fri Feb 07 17:41:58 2014, GRAY wrote:
> Have you read this?
> http://search.cpan.org/dist/Compress-LZ4/lib/Compress/LZ4.pm#COMPATIBILITY
Yes; it turns out that Cassandra already uses that very same framing format:

  As of this version 2 of the protocol, the following compressions are available:
    - lz4 (https://code.google.com/p/lz4/). In that, note that the 4 first bytes
      of the body will be the uncompressed length (followed by the compressed
      bytes).

(from https://github.com/apache/cassandra/blob/cassandra-2.0/doc/native_protocol_v2.spec)

-- Paul Evans
On Fri Feb 07 17:41:58 2014, GRAY wrote:
> Have you read this?
> http://search.cpan.org/dist/Compress-LZ4/lib/Compress/LZ4.pm#COMPATIBILITY
Ahah... A further, more careful reading reveals that compress()/decompress() use a 32-bit little-endian size, whereas Cassandra is using big-endian. I'll instead wrap lz4_compress/lz4_decompress with pack/unpack on "N a*".

That seems to do it.

However, it would be useful if decompress() didn't segfault here, but instead gave a more useful message, such as being unable to allocate memory. The typical perl "Out of memory!" message, perhaps?

-- Paul Evans
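For reference, a minimal sketch of the kind of wrapper described above. It assumes the header-less lz4_compress()/lz4_decompress() functions from Compress::LZ4, with lz4_decompress() taking the expected uncompressed length as its second argument; the helper names here are made up for illustration:

  use strict;
  use warnings;
  use Compress::LZ4 qw( lz4_compress lz4_decompress );

  # Cassandra framing: 32-bit big-endian uncompressed length ("N"),
  # followed by the raw LZ4 block ("a*").
  sub cassandra_lz4_compress {
      my ( $data ) = @_;
      return pack "N a*", length $data, lz4_compress( $data );
  }

  sub cassandra_lz4_decompress {
      my ( $frame ) = @_;
      my ( $len, $block ) = unpack "N a*", $frame;
      return lz4_decompress( $block, $len );
  }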
On Fri Feb 07 17:44:51 2014, PEVANS wrote:
> On Fri Feb 07 17:41:58 2014, GRAY wrote:
> > Have you read this?
> > http://search.cpan.org/dist/Compress-LZ4/lib/Compress/LZ4.pm#COMPATIBILITY
>
> Yes; it turns out that Cassandra already uses that very same framing
> format:
>
>   As of this version 2 of the protocol, the following compressions are
>   available:
>     - lz4 (https://code.google.com/p/lz4/). In that, note that the 4 first bytes
>       of the body will be the uncompressed length (followed by the compressed
>       bytes).
>
> (from https://github.com/apache/cassandra/blob/cassandra-2.0/doc/native_protocol_v2.spec)
# big endian
perl -E 'say unpack "N", "\0\0\0>\360\16\0\0\0\2\0\0\0\1\0\0\0\3\0\6system\0\5peers\0\4\7\0\360\16\0\20\0\vdata_center\0\r\0\4rack\0\r\0\0\0\0"'
62

# little endian
perl -E 'say unpack "V", "\0\0\0>\360\16\0\0\0\2\0\0\0\1\0\0\0\3\0\6system\0\5peers\0\4\7\0\360\16\0\20\0\vdata_center\0\r\0\4rack\0\r\0\0\0\0"'
1040187392

You can see the length in your data is encoded in big-endian, whereas everybody else uses little-endian to encode the length. So not a bug here.
On Fri Feb 07 18:18:38 2014, GRAY wrote:
> So not a bug here.
The fact that a perl module SEGVs is always a bug, especially in this case where it may be applied to untrusted data received from a network socket (surely a common case).

A SEGV cannot be trapped by eval {}, in the way that a normal Perl_croak() from XS could be. This potential to SEGV makes this module currently unsuitable for anything Internet-facing.

-- Paul Evans
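A minimal sketch of the failure mode being described: a croak() raised from the XS layer can be trapped by eval {}, but a segfault cannot. (The input below is just a stand-in string; this is not a statement of the module's documented behaviour for bad input.)

  use strict;
  use warnings;
  use Compress::LZ4 qw( decompress );

  # Pretend this arrived from an untrusted network peer.
  my $frame = "definitely not a valid frame";

  my $plain = eval { decompress( $frame ) };
  if ( !defined $plain ) {
      # Reached if the XS code croak()s (eval traps it) or returns undef...
      warn "could not decompress frame: $@";
  }
  # ...but if the XS code segfaults, the whole process dies before this
  # point, and no amount of eval {} can recover it.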
On Fri Feb 07 18:14:47 2014, PEVANS wrote:
> On Fri Feb 07 17:41:58 2014, GRAY wrote:
> > Have you read this?
> > http://search.cpan.org/dist/Compress-LZ4/lib/Compress/LZ4.pm#COMPATIBILITY
>
> Ahah... A further, more careful reading reveals that
> compress()/decompress() use a 32-bit little-endian size, whereas
> Cassandra is using big-endian. I'll instead wrap
> lz4_compress/lz4_decompress with pack/unpack on "N a*".
>
> That seems to do it.
>
> However, it would be useful if decompress() didn't segfault here, but
> instead gave a more useful message, such as being unable to allocate
> memory. The typical perl "Out of memory!" message, perhaps?
I don't think the segfault is because it couldn't allocate memory. I just tested after switching to LZ4_decompress_safe from LZ4_decompress_fast: it was able to allocate the memory fine, but the decompression function got confused by the garbage data and used up all my CPU until I eventually killed it.
On Fri Feb 07 18:25:17 2014, PEVANS wrote:
> On Fri Feb 07 18:18:38 2014, GRAY wrote:
> > So not a bug here.
>
> The fact that a perl module SEGVs is always a bug, especially in this
> case where it may be applied to untrusted data received from a network
> socket (surely a common case).
>
> A SEGV cannot be trapped by eval {}, in the way that a normal
> Perl_croak() from XS could be. This potential to SEGV makes this
> module currently unsuitable for anything Internet-facing.
Good point, I'll release a new version later tonight using the _safe function instead of the _fast one.