Subject: PullParser gives different answer for same input as file and scalar ref
Hello,
There appears to be a bug in HTML::PullParser that sometimes causes wrong results to be returned when the input is passed as a scalar ref, even though the results are correct when the same data is read from a file. Here's a test to illustrate it.
The file in question is here:
http://mark.stosberg.com/dfv/wm.html
Below is a test script I was playing with. It uses HTML::TokeParser, which is built on HTML::PullParser. When given the data as a file, it correctly finds 4 images on the page. When given the same data as a scalar ref, only the first image is found.
I poked around the code a bit to see if I could patch it.
I was wondering why the difference between a file and a scalar ref is not isolated to "new". Why not read the file into a string there and treat both kinds of input the same from then on? Perhaps that would be inefficient for especially large files. OTOH, maybe it would be possible to tie the scalar ref to an in-memory file handle. Either solution would prevent this kind of bug, which appears with one kind of input but not the other.
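To make the in-memory file handle idea concrete, here is a rough sketch of how it could look from the caller's side. It assumes Perl 5.8 or later, where open() accepts a reference to a scalar, and it relies on the parser constructor accepting an already-open file handle. The helper name img_sources and the sample HTML are made up for illustration. This is only a workaround sketch, not a patch to the module.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not part of any module): return the src
# attribute of every <img> start tag found in an HTML string.
sub img_sources {
    my ($html) = @_;    # private copy of the caller's string

    # Open a read-only in-memory file handle on the copy (Perl 5.8+),
    # so the parser takes the same code path as it does for a file.
    open(my $fh, '<', \$html)
        or die "cannot open in-memory handle: $!";

    my $p = HTML::TokeParser->new($fh)
        or die "cannot create parser";

    my @src;
    while (my $tag = $p->get_tag('img')) {
        # For a start tag, element 1 is the attribute hash ref.
        push @src, $tag->[1]{src};
    }
    close $fh;
    return @src;
}

# Made-up sample input for illustration.
my $html = '<p><img src="a.png"> text <img src="b.png"></p>';
print "$_\n" for img_sources($html);
```

Because the handle is opened on a copy of the string, anything that later changes the original scalar cannot affect what the parser reads.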
Thanks for this module. I find the interface more comprehensible than the standard HTML::Parser interface. :)
Mark
####
#!/usr/bin/perl
use strict;
use WWW::Mechanize;
use Test::More qw/no_plan/;

my $wm = WWW::Mechanize->new();
$wm->get('http://mark.stosberg.com/dfv');
ok(find_broken_images($wm), 'checking for broken images');

# Fetch every <img> on the page and warn about any that fail to load.
# Returns 1 if all images loaded, 0 otherwise.
sub find_broken_images {
    my $self = shift;
    require HTML::TokeParser;

    # Scalar-ref form: only the first image is found.
    #my $p = HTML::TokeParser->new(\$self->{content});
    # File form: all 4 images are found.
    my $p = HTML::TokeParser->new('/home/mark/www/dfv/wm.html');

    my $ok = 1;
    while (my $token = $p->get_tag("img")) {
        my $url = $token->[1]{src};
        print "$url\n";
        my $r = $self->get($url);
        unless ($r->is_success) {
            warn "img is broken: $url\n";
            $ok = 0;
        }
    }
    return $ok;
}