
This queue is for tickets about the IO-Compress CPAN distribution.

Report information
The Basics
Id: 103295
Status: resolved
Priority: 0/
Queue: IO-Compress

People
Owner: Nobody in particular
Requestors: jessenbredeson [...] icloud.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: IO::Compress Feature request
Date: Fri, 03 Apr 2015 12:44:52 -0700
To: bug-IO-Compress [...] rt.cpan.org
From: Jessen Bredeson <jessenbredeson [...] icloud.com>
Good day developer!

While trying to use IO::Uncompress::Gunzip to read bgzip’d files (https://github.com/samtools/htslib.git) my scripts fail in a way that would suggest the block decompressed is not being handled correctly (partial lines are written). This tends to happen on large files. Performing a few simple tests, it looks as though zcat and gunzip are able to correctly uncompress bgzip'd files without issue.

Could we get bgzip support in IO::Compress::Gzip and IO::Uncompress::Gunzip?

Thank you so much!

Jessen Bredeson
jessenbredeson@gmail.com
Hey Jessen,

I'd like to help but I'm not 100% clear what the problem is.

Could you share some code with me that triggers the issue along with a sample bzip2 file?

Paul
Anything more on this issue, Jessen?
Subject: Re: [rt.cpan.org #103295] IO::Compress Feature request
Date: Mon, 04 May 2015 12:08:20 -0700
To: bug-IO-Compress [...] rt.cpan.org
From: Jessen Bredeson <jessenbredeson [...] icloud.com>
Hey Paul,

I’d like to clarify. The issue is not with reading bzip2 files, but with reading bgzip (block-compressed gzip) files. Attached are a test script and a test dataset (bgzip-compressed) that demonstrate my issue.

For more info on bgzip, please see http://manpages.ubuntu.com/manpages/raring/man1/bgzip.1.html for the man page, and http://sourceforge.net/projects/samtools/files/tabix/ for the source code tarball.

Jessen Bredeson
jessenbredeson@gmail.com

Message body is not shown because sender requested not to inline it.

Download test.vcf.gz
application/x-gzip 11.7m

Message body not shown because it is not plain text.

Sending the previous mail has failed. Please contact your admin, they can find more details in the logs.
Subject: Re: [rt.cpan.org #103295] IO::Compress Feature request
Date: Mon, 04 May 2015 12:17:08 -0700
To: bug-IO-Compress [...] rt.cpan.org
From: Jessen Bredeson <jessenbredeson [...] icloud.com>
Hey Paul,

I’d like to clarify. The issue is not with reading bzip2 files, but with reading bgzip (block-compressed gzip) files. Attached are a test script and a test dataset (bgzip-compressed) that demonstrate my issue.

For more info on bgzip, please see http://manpages.ubuntu.com/manpages/raring/man1/bgzip.1.html for the man page, and http://sourceforge.net/projects/samtools/files/tabix/ for the source code tarball.

Jessen Bredeson
jessenbredeson@gmail.com

Message body is not shown because sender requested not to inline it.

Download test.vcf.gz
application/x-gzip 40.8k

Message body not shown because it is not plain text.

Aaaah sorry, I completely misread your original posting.

The good news is a simple modification to your script will fix your issue. Change the constructor line from this

    my $io = IO::Uncompress::Gunzip->new(shift(@ARGV));

to this

    my $io = IO::Uncompress::Gunzip->new(shift(@ARGV), MultiStream => 1);

A bgzip file consists of a series of valid gzip files concatenated together. Unlike the command line gunzip program, the default action for my module is to stop uncompressing once it hits the end of the first gzip data stream. The MultiStream option is designed exactly for this use-case where there are multiple concatenated data streams. It makes it keep on going until it hits the end of the file (or it encounters data that isn't a valid gzip file).

Ping me if you have any problems.

cheers
Paul
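For anyone who wants to see the difference without a real bgzip file, here is a minimal sketch. It assumes that two ordinary gzip streams concatenated in memory are a fair stand-in for bgzip's layout (real bgzip blocks also carry an extra header field, which is not reproduced here). The helper name read_lines and the sample strings are made up for illustration; only the IO::Compress::Gzip and IO::Uncompress::Gunzip calls come from the modules themselves.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a buffer holding two complete gzip streams back to back.
# This only imitates the concatenated layout of a bgzip file.
my $concatenated = '';
for my $chunk ("first line\n", "second line\n") {
    my $member = '';
    gzip(\$chunk => \$member) or die "gzip failed: $GzipError";
    $concatenated .= $member;
}

# Hypothetical helper: decompress an in-memory buffer and return its lines.
# Any extra options (such as MultiStream) are passed to the constructor.
sub read_lines {
    my ($buffer, %opts) = @_;
    my $in = IO::Uncompress::Gunzip->new(\$buffer, %opts)
        or die "gunzip failed: $GunzipError";
    my @lines;
    while (defined(my $line = $in->getline())) {
        push @lines, $line;
    }
    $in->close();
    return @lines;
}

# Default behaviour: reading stops at the end of the first gzip stream.
my @default = read_lines($concatenated);

# With MultiStream, reading continues through every concatenated stream.
my @multi = read_lines($concatenated, MultiStream => 1);

printf "default: %d line(s), MultiStream: %d line(s)\n",
    scalar(@default), scalar(@multi);
```

If the two counts differ, the default read stopped after the first stream, which would match the partial output described in the original report.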
Subject: Re: [rt.cpan.org #103295] IO::Compress Feature request
Date: Mon, 04 May 2015 13:29:05 -0700
To: "bug-IO-Compress [...] rt.cpan.org" <bug-IO-Compress [...] rt.cpan.org>
From: Jessen Bredeson <jessenbredeson [...] icloud.com>
Thanks Paul, that's immensely helpful!

Jessen Bredeson
(Sent from my mobile device)