This queue is for tickets about the XML-Parser CPAN distribution.

Report information
The Basics
Id: 50781
Status: resolved
Priority: 0/
Queue: XML-Parser

People
Owner: Nobody in particular
Requestors: triddle [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.36
Fixed in: (no value)



Subject: current_byte method returns overflowed data when parsing very large XML files
When parsing a very large XML file (somewhere over 2 gigabytes), the current_byte method of XML::Parser::Expat returns negative values. Attached to this ticket is a simple program that parses an XML file and prints the current byte value every second. When the bug shows up, the output looks like this:

2134412390
2137250345
2140088951
2142891080
2145707930
-2146463171
-2143671866
-2140868846
-2138058386
-2135266961
-2132475521

This is a particular problem because the Wikipedia dump files are extremely large.
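The last positive value above (2145707930) sits just under 2147483647 (2^31 - 1), and the first negative one follows it, which is consistent with the offset being truncated to a signed 32-bit integer. As a rough caller-side workaround, the true offset can be reconstructed from the wrapped values. The helper below is hypothetical (make_byte_unwrapper is not part of XML::Parser) and assumes that the reported value is the low 32 bits of the real offset and that offsets only ever increase between calls:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::Parser. Returns a closure that turns
# wrapped signed 32-bit offsets back into an increasing byte count.
# Assumption: the raw value is the low 32 bits of the true offset.
sub make_byte_unwrapper {
    my $two32    = 2**32;   # 4294967296, exact as a double on 32-bit perls
    my $wraps    = 0;       # number of completed 4 GiB cycles seen so far
    my $last_low = 0;       # previous offset modulo 2**32

    return sub {
        my ($raw) = @_;
        # Map the signed value back into the range 0 .. 2**32 - 1.
        my $low = $raw < 0 ? $raw + $two32 : $raw;
        # A drop in the unsigned value means another 4 GiB boundary passed.
        $wraps++ if $low < $last_low;
        $last_low = $low;
        return $wraps * $two32 + $low;
    };
}

my $unwrap = make_byte_unwrapper();
for my $raw (2145707930, -2146463171, -2143671866) {
    printf "%d -> %.0f\n", $raw, $unwrap->($raw);
}
```

With the sample values from this report, -2146463171 maps to 2148504125 and -2143671866 maps to 2151295430. In the attached script, the Char handler could pass $e->current_byte through such a closure before storing it in $LAST_BYTE.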
Subject: xml-parser-expat-overflow.pl
#!/usr/bin/env perl

use strict;
use warnings;

use XML::Parser;

our $LAST_BYTE = 0;

$SIG{ALRM} = \&print_byte;

my $xml = XML::Parser->new( Handlers => { Char => \&char } );

alarm(1);

$xml->parsefile(shift(@ARGV));

sub char {
    my ($e) = @_;
    $LAST_BYTE = $e->current_byte;
}

sub print_byte {
    print "$LAST_BYTE\n";
    alarm(1);
}
I've confirmed this is not an issue when using a 64-bit version of perl.
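To see which case applies to a given installation, the integer width of the running perl can be read from the core Config module. This is only a sketch: iv_bits is a hypothetical helper name, and a 64-bit integer width is taken as a hint based on the observation above, not a guarantee about how the underlying Expat library was built.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: width in bits of this perl's native integer (IV).
sub iv_bits {
    return 8 * $Config{ivsize};
}

printf "perl integer width: %d bits\n", iv_bits();
print "current_byte may wrap on files larger than 2 GiB\n" if iv_bits() < 64;
```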
Ticket migrated to github as https://github.com/toddr/XML-Parser/issues/48