Subject: incorrect and invalid behaviour of extract_links()
Using WWW-Mechanize-0.44, I have found an issue with extract_links().
Currently the behaviour does not follow the documentation. The documentation claims that element [1] of a frame/iframe link will be set to the text enclosed by the tags, when in fact it is set to the value of the name attribute. Even if this were the desired behaviour, it causes a problem. When the name attribute of the frame/iframe tag is not set, element [1] of the link is returned as undefined. Consequently, when find_link() is called with "text => 'foo'", an error is produced because Perl attempts to compare 'foo' with the undefined value.
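To illustrate the failure mode without needing Mechanize itself, here is a minimal sketch. It assumes the text match boils down to a string comparison with eq, and the [url, text] records are hand-written stand-ins rather than real extract_links() output. Under "use warnings" the comparison against the undefined slot raises an "uninitialized value" warning, which the sketch captures so it can be counted.

```perl
use strict;
use warnings;

# Hypothetical link records shaped like [url, text]. The second one mimics
# a frame/iframe without a name attribute, so its text slot is undef.
my @links = (
    [ 'uri1', 'A' ],
    [ 'uri2', undef ],
);

# Capture warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Assumed stand-in for the text comparison done when searching for a link.
my @matches = grep { $_->[1] eq 'A' } @links;

printf "matches: %d, warnings: %d\n", scalar @matches, scalar @warnings;
print $warnings[0] if @warnings;
```

Whether this shows up as a warning or as a fatal error depends on how warnings are configured in the calling script.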
If using the name is the behaviour you want, then you could check within extract_links() to see if the name attribute is undefined, and return an empty string instead. Then update the documentation.
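That fallback could look roughly like the sketch below. The helper frame_link_text() is hypothetical and not part of WWW::Mechanize; it only assumes a token shaped like the one used in extract_links(), where element [0] is the tag name and element [1] is the attribute hash. The ternary is used rather than a newer operator so that it also runs on perl 5.6.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the link text for a frame/iframe token of the
# form [ $tag, \%attr ], falling back to '' when there is no name attribute.
sub frame_link_text {
    my ($token) = @_;
    my $name = $token->[1]{name};
    return defined $name ? $name : '';
}

# Hand-written example tokens, one with a name attribute and one without.
my $named   = [ 'iframe', { src => 'uri2', name => 'nav' } ];
my $unnamed = [ 'iframe', { src => 'uri2' } ];

print "named:   '", frame_link_text($named),   "'\n";
print "unnamed: '", frame_link_text($unnamed), "'\n";
```

With this in place the text slot is always a defined string, so a later string comparison no longer touches an undefined value.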
However, I believe that Mechanize should work as the documentation describes. So, the correct fix is to always use get_trimmed_text() (or similar). I have attached a patch against WWW-Mechanize-0.44 to do this. This fixes the undefined value problem, but it changes the behaviour of extract_links() slightly. Consider the following example:
<a href="uri1">A</a>
<iframe src="uri2"><a href="uri3"><img alt="B" src="uri4"></a></iframe>
<a href="uri5">C</a>
With Mechanize 0.44, this will produce the following links (url, text):
1, A
2, *undefined*
3, B
4, C
With my patched version, we get:
1, A
2, B
4, C
This is because the process of getting the text within the frame/iframe skips over the tags inside it and so they never get added to the list of links. I suppose you could get the raw HTML/text from inside it and do a recursive call on that to search for links, before doing get_trimmed_text() on it. However, I think my patched version is the correct behaviour. My reasoning for this is that by taking the properties of the frame/iframe and making them visible, we are acting as a user agent that understands frames/iframes. Hence, we should ignore content inside them. If this change is integrated, then I'd suggest perhaps adding a note to the changelog to make people aware of this change in behaviour.
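The skipping effect can be shown with a small simulation. This is only a sketch: the token list is hand-written to mirror the example HTML, and text_until() is a hypothetical stand-in for reading text up to a closing tag, not the real HTML::TokeParser code (whose handling of iframe content may also differ between versions). It prints the actual URLs rather than the link numbers used in the lists above.

```perl
use strict;
use warnings;

# Hypothetical token stream standing in for a real HTML parser.
# Each token is [ type, tag, payload ]: 'S' start tag (payload = attributes),
# 'E' end tag, 'T' text (payload = the text itself).
my @tokens = (
    [ 'S', 'a',      { href => 'uri1' } ],
    [ 'T', undef,    'A' ],
    [ 'E', 'a' ],
    [ 'S', 'iframe', { src => 'uri2' } ],
    [ 'S', 'a',      { href => 'uri3' } ],
    [ 'S', 'img',    { alt => 'B', src => 'uri4' } ],
    [ 'E', 'a' ],
    [ 'E', 'iframe' ],
    [ 'S', 'a',      { href => 'uri5' } ],
    [ 'T', undef,    'C' ],
    [ 'E', 'a' ],
);

# Consume tokens up to the end tag named $end, collecting text and img alt
# values on the way. Start tags seen here are swallowed, never reported.
sub text_until {
    my ($queue, $end) = @_;
    my $text = '';
    while (my $t = shift @$queue) {
        last if $t->[0] eq 'E' && $t->[1] eq $end;
        if ($t->[0] eq 'T') {
            $text .= $t->[2];
        }
        elsif ($t->[0] eq 'S' && $t->[1] eq 'img') {
            $text .= defined $t->[2]{alt} ? $t->[2]{alt} : '';
        }
    }
    return $text;
}

# Main loop: every a/frame/iframe start tag becomes a link, and its text is
# read by consuming the stream up to the matching end tag.
my @links;
while (my $t = shift @tokens) {
    next unless $t->[0] eq 'S' && $t->[1] =~ /^(?:a|i?frame)$/;
    my $url = $t->[1] eq 'a' ? $t->[2]{href} : $t->[2]{src};
    push @links, [ $url, text_until(\@tokens, $t->[1]) ];
}

print join(', ', @$_), "\n" for @links;
```

Because the inner a tag is consumed while collecting the iframe's text, the main loop never sees it, so uri3 does not appear as a link of its own.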
Patch tested using perl version v5.6.1 built for sun4-solaris-thread-multi with patch "ActivePerl Build 631" applied, on a SUNW,Ultra-250 running Solaris 8.
820c820
< my $text = $tag_is_a ? $p->get_trimmed_text("/a") : $token->[1]{name};
---
> my $text = $p->get_trimmed_text("/" . $token->[0]);