
This queue is for tickets about the HTML-LinkExtractor CPAN distribution.

Report information
The Basics
Id: 5470
Status: resolved
Worked: 2 min
Priority: 0/
Queue: HTML-LinkExtractor

People
Owner: Nobody in particular
Requestors: wouter [...] teepe.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.09
Fixed in: (no value)



Subject: does not detect <IMG> links within <A> .. </A> environment
Trying to parse

    <a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a>

the LinkExtractor only gives me the URL http://www.foo.com, and not http://www.bar.com.

It does not seem to detect any link that occurs within a _TEXT occurrence.

I use LinkExtractor 0.09 with Perl 5.6.0.
Subject: you are mistaken , re: does not detect <IMG> links within <A> .. </A> environment
You are mistaken. Observe:

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>cat bug.5470.t
use HTML::LinkExtractor;
my $LX = HTML::LinkExtractor::->new( @ARGV ? ( undef,undef,1) : () );
my $input = q{ <a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a> };
$LX->parse(\ $input );
my $links = $LX->links;
use Data::Dumper;
local $Data::Dumper::Indent = 1;
warn Dumper( $links ),$/;
__END__

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>perl bug.5470.t
$VAR1 = [
  {
    '_TEXT' => '<a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a>',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>perl bug.5470.t 1
$VAR1 = [
  {
    '_TEXT' => '[IMG]',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];
From: wouter [...] teepe.com
Sorry, I have not been detailed enough. It does detect the img tags, but somehow it does not seem to report them to the callback function. Observe:

==========
#!/usr/bin/perl -w
use HTML::LinkExtractor;
my $LX1 = HTML::LinkExtractor::->new( ( \&html_link_extractor_cb,undef,1) );
my $LX2 = HTML::LinkExtractor::->new( ( undef,undef,1) );
my $input = q{ <a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a> };
print "with callback\n";
$LX1->parse(\ $input );
print "without callback\n";
$LX2->parse(\ $input );
use Data::Dumper;
my $links = $LX2->links;
local $Data::Dumper::Indent = 1;
warn Dumper( $links ),$/;

sub html_link_extractor_cb {
    my( $X, $link ) = @_;
    local $Data::Dumper::Indent = 1;
    warn Dumper( $link ),$/;
}
===========

gives this output:

===========
with callback
$VAR1 = {
  '_TEXT' => '[IMG]',
  'href' => 'http://www.foo.com',
  'tag' => 'a'
};

without callback
$VAR1 = [
  {
    '_TEXT' => '[IMG]',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];
==================