Subject: Partial tokens from get_tag() cause breakage
Date: Fri, 13 Jun 2008 10:46:07 -0500
To: bug-XML-Tokeparser [...] rt.cpan.org
From: Pedro DeRose <pderose [...] cs.wisc.edu>
The various helper methods for accessing a token (such as tag(),
attr(), and the is_* methods) don't work properly with tokens
returned by get_tag(). As a simple example:
    use strict;
    use warnings;
    use XML::TokeParser;
    my $p = XML::TokeParser->new(\"<xml></xml>");
    my $t = $p->get_tag();
    print $t->tag(), "\n";
This outputs:
    HASH(0x82511b0)
For comparison, the same code using get_token() instead of get_tag() outputs:
    xml
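This bug is caused by the documented behavior that a token returned by
get_tag() "does not include an event type code; its first element is
the element name, prefixed by a '/' if the token is for an end tag."
The accessors expect the type code in the first element, so on a
partial token they read the wrong field; that is why tag() returns a
hash reference instead of the element name.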
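To make the index shift concrete, here is a minimal, self-contained Perl sketch. It does not load XML::TokeParser; it assumes token layouts inferred from this report (a full start token of the form ['S', $name, \%attr, \@attrseq, $raw] and a partial one of the form [$name, \%attr, \@attrseq, $raw]) and uses a hypothetical tag_of() that, as tag() appears to do, reads element 1.

```perl
use strict;
use warnings;

# Hypothetical accessor standing in for tag(): it assumes a full token,
# where element 0 is the event type code and element 1 is the name.
sub tag_of {
    my ($token) = @_;
    return $token->[1];
}

# Assumed layouts (inferred from this report, not from the module source):
#   full start token:    ['S', $name, \%attr, \@attrseq, $raw]
#   partial start token: [$name, \%attr, \@attrseq, $raw]
my $full    = [ 'S', 'xml', {}, [], '<xml>' ];
my $partial = [ 'xml', {}, [], '<xml>' ];

print tag_of($full), "\n";       # prints "xml"
print tag_of($partial), "\n";    # prints a HASH(0x...) string, not the name
```

The same accessor gives the right answer for one shape and a hash reference for the other, which matches the HASH(0x82511b0) output above.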
Looking at the code, there are a couple of other places where these
partial tokens cause breakage. For instance, is_start_tag() accepts an
element name as an argument and checks whether the token is a start tag
for that element. However, although it accounts for partial tokens when
checking the token type, it does not do so when checking the name.
I could be wrong, but partial tokens seem only to complicate the code
with special cases without adding much functionality. Given this, I
think the best solution is to return full tokens from get_tag(). Since
this was a relatively simple change, I've made it and am attaching the
modified code. Of course, I don't know the code very well, so I may
have missed something important.
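Returning full tokens from get_tag() amounts to putting the type code back in front of each partial token. As a rough sketch under the same assumed layouts (this is not the attached patch, which is not reproduced here), a hypothetical full_token_from_partial() could rebuild a full token from a partial one. A real change inside the module would also need to bless the result into the module's token class so that methods such as tag() are available; this sketch returns a plain array reference.

```perl
use strict;
use warnings;

# Assumed layouts (inferred from this report):
#   partial start: [$name, \%attr, \@attrseq, $raw] -> ['S', $name, \%attr, \@attrseq, $raw]
#   partial end:   ['/' . $name, $raw]              -> ['E', $name, $raw]
# Hypothetical helper; the result is a plain (unblessed) array ref.
sub full_token_from_partial {
    my ($partial) = @_;
    my ($first, @rest) = @$partial;
    # An end tag's name is prefixed with '/'; strip it and use the 'E' code.
    return [ 'E', substr($first, 1), @rest ] if $first =~ m{^/};
    # get_tag() only returns start or end tags, so anything else is a start tag.
    return [ 'S', $first, @rest ];
}

my $start = full_token_from_partial([ 'xml', {}, [], '<xml>' ]);
my $end   = full_token_from_partial([ '/xml', '</xml>' ]);
print "$start->[0] $start->[1]\n";    # prints "S xml"
print "$end->[0] $end->[1]\n";        # prints "E xml"
```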