
This queue is for tickets about the Class-XPath CPAN distribution.

Report information
The Basics
Id: 7322
Status: open
Priority: 0/
Queue: Class-XPath

People
Owner: Nobody in particular
Requestors: cpan [...] timaoutloud.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



The current version of Class::XPath (1.4) does not properly handle quoted / characters in matches. Also, its allowable character set is not as inclusive as it should be. The following patch seems to address these issues.

8a9,10
> use Text::ParseWords;
>
12c14
< our $NAME = qr/[\w:]+/;
---
> our $NAME = qr/[[:alpha:]_][\w:\-\.]*/;
122c124
< my @parts = split('/', $xpath);
---
> my @parts = quotewords("/",1,$xpath);
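A minimal Perl sketch of what the two changes do, using a made-up element name and a made-up XPath (neither comes from Class::XPath itself). It relies only on the core Text::ParseWords module; quotewords($delim, $keep, @lines) with $keep set to 1 leaves the quote characters in the returned pieces, presumably so the rest of Class::XPath still sees them.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# The 1.4 pattern and the one proposed in the patch.
my $old_name = qr/[\w:]+/;
my $new_name = qr/[[:alpha:]_][\w:\-\.]*/;

# Hypothetical element name containing '-' and '.'.
my $elem = 'my-elem.v2';
my ($old_match) = $elem =~ /^($old_name)/;   # stops before the '-'
my ($new_match) = $elem =~ /^($new_name)/;   # takes the whole name

# Hypothetical XPath whose quoted attribute value contains a '/'.
my $xpath = '/page/para[@title="a/b"]';

# split() also cuts inside the quoted value ...
my @split_parts = split('/', $xpath);

# ... while quotewords() treats "a/b" as one quoted token and,
# because $keep is 1, keeps the double quotes in the piece.
my @quoted_parts = quotewords('/', 1, $xpath);

print "old \$NAME:  $old_match\n";
print "new \$NAME:  $new_match\n";
print 'split:      ', join(' | ', @split_parts), "\n";
print 'quotewords: ', join(' | ', @quoted_parts), "\n";
```

With split the predicate is broken into 'para[@title="a' and 'b"]'; with quotewords it stays as a single step.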
> < our $NAME = qr/[\w:]+/;
> ---
> > our $NAME = qr/[[:alpha:]_][\w:\-\.]*/;
I don't think this is quite right. perlre says \w is "alphanumeric plus '_'", and http://www.w3.org/TR/REC-xml/#NT-NameChar says a NameChar is a letter or digit, '.', '-', '_', ':', a CombiningChar, or an Extender. Including CombiningChar seems to mean adding \X. Including the Unicode Extender class seems to mean \p{Extender}. So a better value for $NAME here would be:

our $NAME = qr{[-._:[:alnum:](?:\PM\pM*)\p{Extender}\p{Ideographic}]*};

(I just noticed that Ideographic is included in the XML spec. I'm not a Unicode guru and don't know whether it is already covered by [:alnum:], so I tacked it on the end.)

No wonder they say you shouldn't write XML parsers using regular expressions.
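A hedged Perl sketch of one more problem with the proposed pattern: inside a bracketed character class, (?:\PM\pM*) is not a group, so the class ends up containing the literal characters '(', '?', ':', '*', ')' plus every non-mark character (\PM), and the trailing * also lets it match an empty name. The $name pattern below is an illustration only. It assumes Perl's \p{L}, \p{Nd} and \p{M} are acceptable stand-ins for the spec's Letter, Digit and CombiningChar, and it follows the older Letter/Digit/CombiningChar/Extender wording quoted here rather than the explicit code point ranges of the XML 1.0 Fifth Edition.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# The proposed pattern, verbatim. Inside [...] the "(?:\PM\pM*)" is just
# more class members, and \PM alone admits almost every character.
my $proposed = qr{[-._:[:alnum:](?:\PM\pM*)\p{Extender}\p{Ideographic}]*};

# Illustrative alternative shaped like  Name ::= (Letter | '_' | ':') (NameChar)*
# ASSUMPTION: \p{L}, \p{Nd}, \p{M} approximate Letter, Digit, CombiningChar.
my $name = qr{[\p{L}\p{Ideographic}_:][\p{L}\p{Nd}\p{M}\p{Extender}\p{Ideographic}._:-]*};

# Hypothetical candidates; the third is 'e' + combining acute + 't' + e-acute.
for my $candidate ('xml:lang', 'my-elem.v2', "e\x{301}t\x{e9}", 'a/b', '9lives', '') {
    my $p = $candidate =~ /\A$proposed\z/ ? 'yes' : 'no';
    my $n = $candidate =~ /\A$name\z/     ? 'yes' : 'no';
    printf "%-14s proposed: %-3s  name: %s\n", "'$candidate'", $p, $n;
}
```

The proposed pattern accepts 'a/b' and the empty string as whole names; the sketch rejects both, along with names that start with a digit.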
On Thu Feb 08 15:54:34 2007, MAHEX wrote:
> Including CombiningChar seems to mean adding \X.
...
> our $NAME = qr{[-._:[:alnum:](?:\PM\pM*)\p{Extender}\p{Ideographic}]*};
In case it isn't clear what happened to \X: perl complained when I put it in, so I substituted (?:\PM\pM*), which perlre describes as its equivalent.
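\X matches a sequence of code points rather than a single character, so it cannot be a member of a bracketed class, which is presumably what perl complained about. The same applies to the substitute: (?:\PM\pM*) only behaves as a combining sequence outside brackets. A minimal Perl sketch with a made-up string; note that later perls define \X as an extended grapheme cluster, which is broader than (?:\PM\pM*), although the two agree on this input.

```perl
use strict;
use warnings;

# Hypothetical input: 'e' + U+0301 COMBINING ACUTE ACCENT + 'x'.
my $seq = "e\x{301}x";

# Outside a class, (?:\PM\pM*) takes one non-mark character plus any
# following marks, so the accent stays attached to its base letter.
my @sequences = $seq =~ /(\PM\pM*)/g;

# \X, also used outside a class, yields the same two units here.
my @clusters = $seq =~ /(\X)/g;

# Inside [...] the same text is only a set of single characters, so
# every code point, including the lone accent, matches on its own.
my @singles = $seq =~ /([(?:\PM\pM*)])/g;

printf "sequences: %d, clusters: %d, singles: %d\n",
    scalar @sequences, scalar @clusters, scalar @singles;
```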