Subject: Weird values from $pages->current_byte()
I've tried the drop-in replacement for
Parse::MediaWikiDump. In my program, I print a kind of progress
report: after every 5000 processed lines, the percentage of the
total progress so far is displayed:
if (!(++$count % 5000)) {
message("$processed_iso: " . $pages->current_byte() * $percent . "%\n", 1);
}
where $percent is
my $percent = 100 / $pages->size();
Normally, I get an output like
rus: 2.92736135448719%
rus: 4.76936761045247%
rus: 6.34968195870317%
rus: 7.39906647470754%
rus: 8.67535165465465%
rus: 9.96669300660353%
...
until somewhere in the 80-90% range, but since switching to the replacement, I get (full paste):
rus: 2.92736135448719%
rus: 4.76936761045247%
rus: 6.34968195870317%
rus: 7.39906647470754%
rus: 8.67535165465465%
rus: 9.96669300660353%
rus: 11.5383141558069%
rus: 12.9918106949166%
rus: 14.2415775333654%
rus: 15.5382405643828%
rus: 16.8191182953711%
rus: 17.400673266953%
rus: 18.1541610149901%
rus: 19.0827814735421%
rus: 20.3977862963271%
rus: 21.7221259859246%
rus: 23.1200299353174%
rus: 24.4145362372213%
rus: 25.6426070219686%
rus: 26.9235146474011%
rus: 27.8204560239691%
rus: 29.316505714986%
rus: 30.5565876895689%
rus: 31.1838623230623%
rus: 31.7616345651786%
rus: 32.7580490902927%
rus: 33.8641585187747%
rus: 35.0382072349052%
rus: 36.1364014609105%
rus: 37.4376173592158%
rus: 38.6678651147159%
rus: 39.8824663170689%
rus: 41.1398620499523%
rus: -41.4477783735194%
rus: -40.3617165531779%
rus: -39.1789125675399%
rus: -38.0397321226139%
rus: -36.8653571221788%
rus: -35.5748742263416%
rus: -34.5499344538474%
rus: -34.0245297505949%
rus: -33.1066716039366%
rus: -32.0365527071021%
rus: -31.1068637708056%
rus: -29.9182803681428%
rus: -29.1744408033133%
rus: -28.3387617035213%
rus: -27.1526152148208%
rus: -25.9275296428521%
rus: -24.6540318809368%
rus: -23.4576795304163%
rus: -22.1732273560404%
rus: -20.8804021437284%
rus: -19.5962855359504%
rus: -18.3459499425392%
rus: -17.3420667985523%
rus: -16.3653910113241%
rus: -15.3328930146273%
rus: -14.4566699698825%
rus: -13.5474778969148%
rus: -12.5259328891336%
rus: -11.4827912417906%
rus: -10.5284432697161%
rus: -9.52768496291735%
rus: -8.58304663999153%
rus: -7.57122617982953%
rus: -6.57308409515593%
rus: -5.77433925931634%
rus: -4.75446264622612%
rus: -3.83490587892223%
rus: -2.7434907105121%
rus: -1.80863875215777%
rus: -0.785296392112861%
rus: 0.146929106292996%
rus: 1.11648605719519%
rus: 2.027814520134%
rus: 2.97007417743716%
rus: 3.46145223656532%
rus: 4.34219786414162%
rus: 5.30746231997887%
rus: 6.20728701860753%
rus: 6.9644882299226%
rus: 8.05135000452814%
rus: 9.00982539895428%
rus: 9.90386174927619%
rus: 10.7690126173411%
rus: 11.6271145977892%
rus: 12.5404598144208%
rus: 13.3628079031948%
rus: 14.2008484105644%
rus: 15.0716734898314%
rus: 15.997948219301%
(finished)
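Observe the change in sign just after 41%. The file in question is the XML
dump of the Russian wiki (20100331), which is around 4.8 GB in
size. Probably a signed 32-bit overflow? 41% of 4.8 GB is roughly 2 GB,
i.e. close to 2**31 bytes, and the values only become positive again
after another 2 GB or so.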
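
As a stopgap on my side, here is a minimal sketch that undoes such a wraparound. It assumes the guess is right, i.e. that current_byte() returns the true offset squeezed into a signed 32-bit integer; I have not confirmed that in the module's source. make_unwrapper is a hypothetical helper of my own, not part of either module, and the sample offsets are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): returns a closure that
# turns a wrapped signed 32-bit byte offset back into the real offset.
# Assumptions: the real offset never decreases, and it advances by less
# than 2**32 bytes between two consecutive calls.
sub make_unwrapper {
    my $wraps = 0;    # number of complete 2**32 wraparounds seen so far
    my $last  = 0;    # previous offset, viewed as unsigned 32-bit
    return sub {
        my ($raw) = @_;
        # Reinterpret a negative signed value as unsigned 32-bit.
        my $u32 = $raw < 0 ? $raw + 2**32 : $raw;
        # A drop in the unsigned value means we passed a 2**32 boundary.
        $wraps++ if $u32 < $last;
        $last = $u32;
        return $u32 + $wraps * 2**32;
    };
}

# Made-up example: a 5,000,000,000-byte file read in 1,000,000,000-byte
# steps, with the offsets as a signed 32-bit counter would report them.
my $unwrap = make_unwrapper();
my $size   = 5_000_000_000;
for my $raw (1_000_000_000, 2_000_000_000, -1_294_967_296,
             -294_967_296, 705_032_704) {
    printf "%.2f%%\n", 100 * $unwrap->($raw) / $size;
}
```

In my program this would mean passing $pages->current_byte() through the closure before multiplying by $percent. It only papers over the symptom and needs a perl with 64-bit integers or floating-point numbers that hold these values exactly; the real fix belongs in whatever computes the offset.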