
This queue is for tickets about the Compress-Bzip2 CPAN distribution.

Report information
The Basics
Id: 126269
Status: new
Priority: 0/
Queue: Compress-Bzip2

People
Owner: Nobody in particular
Requestors: standley [...] biken.osaka-u.ac.jp
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: truncated lines in bzreadline?
Date: Tue, 14 Aug 2018 17:08:24 +0900
To: bug-Compress-Bzip2 [...] rt.cpan.org
From: Daron Standley <standley [...] biken.osaka-u.ac.jp>
Hi,

I have been playing around with Perl for a few hours and I am very impressed with the speed of reading a huge bz2-compressed file. Just to give some numbers, here is the time required to read a space-delimited bz2 file with 1000 lines, each 557780 characters long (278890 integers (0-9) separated by white space):

    python pd.read_csv(file, compression='bz2', header=0): 14 min
    python subprocess('bunzip2 -c ' + file): 7 min
    perl open('bunzip2 -c $file |'): 66 sec!!

So I next started trying to use the Bzip2 module. However, I noticed that the bzreadline function was returning only 4096 characters per line for these files. For example, I get the following when using bunzip2:

    my $cmd="bunzip2 -c $fbz2 |";
    open(FBZ,$cmd);
    while(<FBZ>){
      my @line = split(/\s+/);
      printf("len %d\n",scalar(@line));
    }
    close(FBZ);

    len 278890
    len 278890
    .
    .
    .

But when I use bzreadline as follows:

    my $bz = bzopen($fbz, "rb") or die "Cannot open $fbz: $bzerrno\n" ;
    while ($bz->bzreadline($_) > 0 ) {
      my @line = split(/\s+/);
      printf("len %d\n",scalar(@line));
    }
    $bz->bzclose() ;

I get

    len 2048
    len 2048
    .
    .

I am guessing there is a buffer I can set somewhere, but I couldn't figure this out by myself. If you have any clues I would be grateful.

Thanks a lot
DMS
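
For comparison, here is a minimal sketch of the same per-line field count done in-process with a different module, IO::Uncompress::Bunzip2 (bundled with Perl since 5.10), rather than with Compress::Bzip2. It does not explain or fix the 4096-character behaviour of bzreadline; it only shows one way to read whole lines without spawning bunzip2. The helper name count_fields_per_line is hypothetical, and no timing has been measured for this approach.

```perl
use strict;
use warnings;
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Hypothetical helper: return an array ref holding the number of
# whitespace-separated fields on each line of a bzip2-compressed file.
sub count_fields_per_line {
    my ($path) = @_;

    my $z = IO::Uncompress::Bunzip2->new($path)
        or die "Cannot open $path: $Bunzip2Error\n";

    my @counts;
    # getline() returns one line at a time (split on $/) and undef at EOF.
    while (defined(my $line = $z->getline())) {
        # split ' ' ignores leading whitespace and the trailing newline,
        # so this can differ from split(/\s+/) on lines that start with
        # a space.
        my @fields = split ' ', $line;
        push @counts, scalar @fields;
    }
    $z->close();

    return \@counts;
}

# Example use, mirroring the loops in the report:
#   printf("len %d\n", $_) for @{ count_fields_per_line($fbz2) };
```

If the counts printed this way match the ones from the bunzip2 pipe, that would at least confirm the truncation is specific to bzreadline and not to the file itself.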