This queue is for tickets about the XML-TokeParser CPAN distribution.

Report information
The Basics
Id: 36727
Status: new
Priority: 0
Queue: XML-TokeParser

People
Owner: Nobody in particular
Requestors: pderose [...] cs.wisc.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Partial tokens from get_tag() cause breakage
Date: Fri, 13 Jun 2008 10:46:07 -0500
To: bug-XML-Tokeparser [...] rt.cpan.org
From: Pedro DeRose <pderose [...] cs.wisc.edu>
The various helper methods for accessing a token --- such as tag(), attr(), and the is_* methods --- don't work properly with tokens returned by get_tag. As a simple example:

    use XML::TokeParser;
    $p = XML::TokeParser->new(\"<xml></xml>");
    $t = $p -> get_tag();
    print $t -> tag();

outputs

    HASH(0x82511b0)

For comparison, the same code using get_token() instead of get_tag outputs

    xml

This bug is caused by the documented behavior that a return token from get_tag() "does not include an event type code; its first element is the element name, prefixed by a '/' if the token is for an end tag."

Looking at the code, there are a couple other places these partial tokens cause breakage. For instance, the is_start_tag accepts an element name as an argument, and checks if the tag is a start tag of the given element. However, though it accounts for partial tokens when checking the token type, it does not when checking the name.

Though I could be wrong, it seems that partial tokens only complicate code with special cases, and don't add much functionality. Given this, I would think the best solution would be to return full tokens from get_tag. Since this was a relatively simple change, I've made it, and am attaching the modified code. Of course, I don't know the code very well, so I may have missed something important.
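
The HASH(...) output described above can be reproduced without the module. The following is only a sketch: it assumes a full start-tag token is laid out as [event code, element name, attribute hash, attribute order, raw text], and `tag_at_index_1` is a hypothetical stand-in for a helper such as tag(), not the module's actual code.

```perl
use strict;
use warnings;

# Full start-tag token, in the layout get_token() is assumed to return:
# [ event code, element name, attribute hash, attribute order, raw text ]
my @full = ('S', 'xml', {}, [], '<xml>');

# Partial token: the same data with the leading event code dropped, as the
# quoted get_tag() documentation describes.
my @partial = @full[1 .. $#full];

# Hypothetical stand-in for a helper such as tag(): it assumes the element
# name always sits at index 1 of the token.
sub tag_at_index_1 {
    my ($token) = @_;
    return $token->[1];
}

print tag_at_index_1(\@full), "\n";          # the element name
print ref(tag_at_index_1(\@partial)), "\n";  # a reference type, not a name
```

With the event code missing, every field shifts down by one position, so index 1 holds the attribute hash instead of the element name. Printing that reference directly is what yields a string of the form HASH(0x...).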

Message body is not shown because it is too large.

Subject: Re: [rt.cpan.org #36727] AutoReply: Partial tokens from get_tag() cause breakage
Date: Fri, 13 Jun 2008 10:52:38 -0500
To: bug-XML-TokeParser [...] rt.cpan.org
From: Pedro DeRose <pderose [...] cs.wisc.edu>
Actually, I forgot to account for arguments that start with "/" when I fixed get_tag. I've attached the corrected code. Would have been pithier with a ternary operator, but since you didn't use it anywhere else in the code, I figured you didn't like them. :)
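
The attached patch is not shown on this page, so the following is only a guess at the kind of check the follow-up describes: matching a full token against a get_tag() argument that may start with "/". `token_matches` is a hypothetical helper, and the token layouts (["S", name, ...] for start tags, ["E", name, raw text] for end tags) are assumptions rather than a copy of the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper, not the attached patch: decide whether a full token
# ([ event code, element name, ... ]) matches a get_tag()-style argument,
# where a leading '/' in the argument means "end tag".
sub token_matches {
    my ($token, $wanted) = @_;
    my ($type, $name) = @{$token}[0, 1];

    return 0 unless $type eq 'S' || $type eq 'E';  # only tag tokens qualify
    return 1 unless defined $wanted;               # no argument: any tag

    my $wanted_type = 'S';
    if ($wanted =~ s{^/}{}) {                      # strip the end-tag marker
        $wanted_type = 'E';
    }
    return ($type eq $wanted_type && $name eq $wanted) ? 1 : 0;
}

my $start = ['S', 'xml', {}, [], '<xml>'];
my $end   = ['E', 'xml', '</xml>'];

print token_matches($start, 'xml'),  "\n";  # start tag requested and found
print token_matches($end,   '/xml'), "\n";  # end tag requested and found
print token_matches($end,   'xml'),  "\n";  # start tag requested, end tag given
```

Keeping the event code in the token means the "/" prefix only has to be interpreted once, at the point where the argument is compared, instead of being baked into the element name.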

Message body is not shown because it is too large.