This queue is for tickets about the HTML-Tree CPAN distribution.

Report information
The Basics
Id: 23439
Status: resolved
Priority: 0/
Queue: HTML-Tree

People
Owner: Nobody in particular
Requestors: nick [...] aevum.de
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



Subject: as_XML can output invalid attribute names
I use HTML::Tree successfully to convert tag soup to XHTML. There's only one little problem with the as_XML method: it doesn't check whether attribute names conform to the XML spec, so it can output invalid XML. As a result, I have to walk through the tree manually and remove all invalid attributes. It would be nice if this could be done by HTML::Tree. See here for valid XML attribute names: http://www.w3.org/TR/xml/#NT-Name
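To make the rule concrete, here is a minimal sketch of a check against the Name production from the XML 1.0 spec linked above. The helper name is_valid_xml_name is hypothetical and is not part of HTML::Tree; the character ranges are transcribed from the spec and should be verified against it before relying on them.

```perl
use strict;
use warnings;

# Character classes transcribed from the XML 1.0 "Name" production.
# Kept as single-quoted strings so the \x{...} escapes are interpreted
# by the regex compiler once they are interpolated into qr// below.
my $NAME_START = ':A-Z_a-z'
    . '\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{2FF}'
    . '\x{370}-\x{37D}\x{37F}-\x{1FFF}\x{200C}-\x{200D}'
    . '\x{2070}-\x{218F}\x{2C00}-\x{2FEF}\x{3001}-\x{D7FF}'
    . '\x{F900}-\x{FDCF}\x{FDF0}-\x{FFFD}\x{10000}-\x{EFFFF}';
my $NAME_REST = $NAME_START
    . '\-.0-9\x{B7}\x{300}-\x{36F}\x{203F}-\x{2040}';

# One start character followed by any number of name characters.
my $XML_NAME = qr/\A[$NAME_START][$NAME_REST]*\z/;

# Hypothetical helper: true if $name is a syntactically valid XML name.
sub is_valid_xml_name {
    my ($name) = @_;
    return defined $name && $name =~ $XML_NAME ? 1 : 0;
}

for my $name ('src', 'xml:lang', 'inval!d', '1abc') {
    printf "%-10s %s\n", $name,
        is_valid_xml_name($name) ? 'valid' : 'invalid';
}
```

Note that XML 1.0 itself allows a colon anywhere in a name; namespace-aware consumers may apply stricter rules than this check does.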
From: PETEK [...] cpan.org
On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> It doesn't check whether attribute names conform to the XML spec, so it can output invalid XML.
Could you please provide a sample? I see the spec for attribute names, but seeing an example of broken behavior would help more. HTML::Tree also makes no guarantee at the current time that the XML is valid, but I'll be glad to fix what I can. I am also loath to implement functionality that causes people to lose data, so any fix will have to incorporate warnings or workarounds to make sure that doesn't happen.
From: nick [...] aevum.de
On Fri Nov 17 13:26:06 2006, PETEK wrote:
> On Fri Nov 17 11:21:05 2006, nick.aevum.de wrote:
> > It doesn't check whether attribute names conform to the XML spec, so it can output invalid XML.
>
> Could you please provide a sample? I see the spec for attribute names, but seeing an example of broken behavior would help more.
For example, if you have invalid characters in an attribute name:

    use HTML::Tree;
    my $tree = HTML::TreeBuilder->new_from_content('<img inval!d="asd">');
    print $tree->as_XML();

This produces invalid XML by simply copying the attribute name:

    <img inval!d="asd" />
> HTML::Tree also makes no guarantee at the current time that the XML is valid, but I'll be glad to fix what I can.
>
> I am also loath to implement functionality that causes people to lose data, so any fix will have to incorporate warnings or workarounds to make sure that doesn't happen.
Ideally there should be an option to either croak when an invalid attribute name occurs or simply remove the attribute. I think the least desirable outcome is producing invalid XML, because that will most likely cause problems later on.
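A minimal sketch of what such an option could look like as a user-side workaround, run before calling as_XML. The helper sanitize_attr_names and its 'croak'/'remove' policy argument are hypothetical and not part of HTML::Tree, and the name check is a simplified ASCII-only approximation of the XML Name rule, so valid non-ASCII names would be rejected.

```perl
use strict;
use warnings;
use Carp qw(croak);
use HTML::TreeBuilder;

# Assumption: ASCII-only simplification of the XML Name production.
my $XML_NAME = qr/\A[:A-Za-z_][:A-Za-z_0-9.\-]*\z/;

# Hypothetical helper. $policy is 'croak' (die on the first invalid
# attribute name) or 'remove' (delete it and continue).
# Returns the list of attribute names that were removed.
sub sanitize_attr_names {
    my ($tree, $policy) = @_;
    my @removed;

    # look_down with an always-true criterion visits the element
    # itself and all of its descendants.
    for my $elem ($tree->look_down(sub { 1 })) {
        for my $name ($elem->all_external_attr_names) {
            # Skip the '/' pseudo-attribute that may be recorded
            # for tags written in <br /> style.
            next if $name eq '/';
            next if $name =~ $XML_NAME;

            croak sprintf("<%s> has invalid attribute name '%s'",
                $elem->tag, $name)
                if $policy eq 'croak';

            $elem->attr($name, undef);    # undef deletes the attribute
            push @removed, $name;
        }
    }
    return @removed;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<img inval!d="asd" src="a.png">');
my @removed = sanitize_attr_names($tree, 'remove');
warn "removed attribute '$_'\n" for @removed;
print $tree->as_XML;
$tree->delete;
```

Emitting a warning for each removed attribute keeps the data loss visible, which addresses the concern raised earlier in this thread about silently dropping content.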
A patch to fix this has been applied to the new repo at http://github.com/jfearn/HTML-Tree. The patch will cause the parser to die with an appropriate message when it hits an invalid attribute name.
Subject: 4.0 released
Hi, HTML::Tree version 4.0 has been released, which includes a fix for this issue. Cheers, Jeff.