Subject: byteConsumed() method of LibXML Reader wraps around at 2 gigs of input XML
Bug #56843 was opened against MediaWiki::DumpFile, and I've traced it back to the LibXML
reader that the module uses. Specifically, the value returned by byteConsumed() wraps
around to a negative number after 2 gigabytes of input XML. Here is an example program:
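#!/usr/bin/env perl

use strict;
use warnings;
use XML::LibXML::Reader;

# The path to the XML dump is passed as the first command-line argument.
my $reader = XML::LibXML::Reader->new(location => shift(@ARGV));

while (1) {
    # byteConsumed() should only ever grow; a negative value means it wrapped.
    if ($reader->byteConsumed < 0) {
        die "wrapped to " . $reader->byteConsumed;
    }
    # nextElement() returns 1 when a <page> element is found, 0 at end of input, -1 on error.
    last unless $reader->nextElement('page') == 1;
    print $reader->byteConsumed, "\n";
}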
Run against a dump larger than 2 GB, it outputs:
2147463683
2147472892
2147473405
-2147478169
wrapped to -2147478169 at ./test.pl line 13.
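The negative reading is consistent with the true offset being truncated to a signed 32-bit integer: -2147478169 + 2^32 = 2147489127, which is just past 2^31 = 2147483648 and 15722 bytes after the previous reading of 2147473405. Until the root cause is fixed, a caller could undo the wrap on its own side. The sketch below is a hypothetical workaround, not part of XML::LibXML (make_unwrapper is an invented helper name); it assumes the wrap is exactly modulo 2^32 and that byteConsumed() is sampled at least once per 4 GiB of input, so that no wrap goes unnoticed.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical workaround: assumes byteConsumed() reports the true offset
# truncated to a signed 32-bit integer, sampled at least once per 4 GiB.
my $TWO_32 = 2**32;

# Returns a closure that turns successive raw readings into a
# monotonically increasing offset.
sub make_unwrapper {
    my ($base, $prev) = (0, 0);
    return sub {
        my ($raw) = @_;
        # Reinterpret the signed 32-bit reading as unsigned 32-bit.
        my $u = $raw < 0 ? $raw + $TWO_32 : $raw;
        # If the unsigned value went backwards, the counter passed a 4 GiB boundary.
        $base += $TWO_32 if $u < $prev;
        $prev = $u;
        return $base + $u;
    };
}

# Feed in the readings from the output above.
my $unwrap = make_unwrapper();
for my $raw (2147463683, 2147472892, 2147473405, -2147478169) {
    print $unwrap->($raw), "\n";
}
```

With the four readings above, the last line printed is 2147489127 instead of -2147478169. This only papers over the symptom in the caller; the reader itself still reports the wrapped value.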
I ran into the same issue with Parse::MediaWikiDump, which used XML::Parser: it wrapped at
the same point, but only on a 32-bit perl, so switching to a 64-bit perl was a valid
workaround in that case. Here I'm already using a 64-bit perl and the wrap still happens.
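When comparing the 32-bit and 64-bit cases, it may help to record exactly how wide the integers of the perl in use are. The core Config module exposes this; a minimal sketch that prints ivsize (bytes in Perl's native integer), longsize (bytes in a C long for this build) and use64bitint:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Config;

# ivsize: bytes in Perl's native integer (IV); 8 on a perl with 64-bit integers.
# longsize: bytes in the C compiler's "long" type for this build.
# use64bitint: "define" when perl was configured with 64-bit integers.
printf "ivsize=%d longsize=%d use64bitint=%s\n",
    $Config{ivsize}, $Config{longsize}, $Config{use64bitint} // 'undef';
```

Including this line in a report makes it unambiguous which kind of perl the wrap was observed on.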
Thanks for the great software! LibXML is fantastic. :-)
Cheers,
Tyler