Bug #5470 for HTML-LinkExtractor: does not detect <IMG> links within <A> .. </A> environment

Fri Feb 27 05:20:00 2004 Guest - Ticket created

Subject:

does not detect <IMG> links within <A> .. </A> environment

Trying to parse

<a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a>

the LinkExtractor only gives me the URL http://www.foo.com, and not http://www.bar.com. It does not seem to detect any link which occurs within a _TEXT occurrence.

I use LinkExtractor 0.09 with perl 5.6.0.

Fri Feb 27 06:42:28 2004 PODMASTER [...] cpan.org - Correspondence added 2 min

Subject:

you are mistaken, re: does not detect <IMG> links within <A> .. </A> environment

You are mistaken. Observe:

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>cat bug.5470.t
use HTML::LinkExtractor;
my $LX = HTML::LinkExtractor::->new( @ARGV ? ( undef,undef,1) : () );
my $input = q{ <a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a> };
$LX->parse(\ $input );
my $links = $LX->links;
use Data::Dumper;
local $Data::Dumper::Indent = 1;
warn Dumper( $links ),$/;
__END__

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>perl bug.5470.t
$VAR1 = [
  {
    '_TEXT' => '<a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a>',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];

C:\dev\HTML-LinkExtractor\HTML-LinkExtractor-0.09>perl bug.5470.t 1
$VAR1 = [
  {
    '_TEXT' => '[IMG]',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];

Fri Feb 27 06:42:29 2004 PODMASTER [...] cpan.org - Ticket deleted

Fri Feb 27 07:33:31 2004 Guest - Correspondence added

From:

wouter [...] teepe.com

Sorry, I have not been detailed enough. It does detect the img tags, but somehow it does not seem to report them to the callback function. Observe:

==========
#!/usr/bin/perl -w
use HTML::LinkExtractor;
my $LX1 = HTML::LinkExtractor::->new( ( \&html_link_extractor_cb,undef,1) );
my $LX2 = HTML::LinkExtractor::->new( ( undef,undef,1) );
my $input = q{ <a href="http://www.foo.com"><img src="http://www.bar.com/img.gif"></a> };
print "with callback\n";
$LX1->parse(\ $input );
print "without callback\n";
$LX2->parse(\ $input );
use Data::Dumper;
my $links = $LX2->links;
local $Data::Dumper::Indent = 1;
warn Dumper( $links ),$/;
sub html_link_extractor_cb {
  my( $X, $link ) = @_;
  local $Data::Dumper::Indent = 1;
  warn Dumper( $link ),$/;
}
===========

gives this output:

===========
with callback
$VAR1 = {
  '_TEXT' => '[IMG]',
  'href' => 'http://www.foo.com',
  'tag' => 'a'
};

without callback
$VAR1 = [
  {
    '_TEXT' => '[IMG]',
    'href' => 'http://www.foo.com',
    'tag' => 'a'
  },
  {
    'src' => 'http://www.bar.com/img.gif',
    'tag' => 'img'
  }
];
==================

Fri Feb 27 07:33:32 2004 The RT System itself - Status changed from 'dead' to 'open'

Fri Feb 27 17:03:37 2004 PODMASTER [...] cpan.org - Status changed from 'open' to 'resolved'