
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 34873
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: MARKOV [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.66
Fixed in: (no value)



Subject: patterns and charsets
For XML::Compile, I would like to improve the schema pattern implementation.

XML has slightly different character classes from the Unicode tables (although I have not yet been able to find clear documentation about the translation). XML::RegExp does define the character sets, but using it this way is extremely expensive.

libxml2 does (very probably) implement the correct classes. What I need is a simple "match string against schema pattern". I couldn't find it in the XML::LibXML docs. Is there any access to the library which I can use?
Subject: Re: [rt.cpan.org #34873] patterns and charsets
Date: Sat, 12 Apr 2008 13:06:07 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Christian Glahn <christian.glahn [...] lo-f.at>
Hi Mark,

In prior XML::LibXML development I did not consider the character ranges a problem, as the translation from XML to internal data structures is performed entirely by the library. I just checked the libxml2 web site, and there is a helper library that performs per-character checking.

However, I do not completely understand what exactly you are looking for. Is it character checking or Schema checking? Maybe you could provide an example of your ideas.

You may want to have a look at the character validation API and explore whether this part does what you want.

http://xmlsoft.org/html/libxml-chvalid.html

Another option is the Schema API, which has three parts:

http://xmlsoft.org/html/libxml-xmlschemas.html
http://xmlsoft.org/html/libxml-xmlschemastypes.html
http://xmlsoft.org/html/libxml-schemasInternals.html

Finally, there is a pattern API, which might help.

http://xmlsoft.org/html/libxml-pattern.html

I don't think that drafting an interface to the character validation API should be too much of a problem. The other APIs require a closer look.

Christian

On Fri, 2008-04-11 at 08:34 -0400, Mark Overmeer via RT wrote:
> Fri Apr 11 08:34:39 2008: Request 34873 was acted upon.
> Transaction: Ticket created by MARKOV
> Queue: XML-LibXML
> Subject: patterns and charsets
> Broken in: 1.66
> Severity: Wishlist
> Owner: Nobody
> Requestors: MARKOV@cpan.org
> Status: new
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=34873 >
>
>
> For XML::Compile, I would like to improve the schema pattern implementation.
>
> XML has slightly different character-classes from the Unicode table
> (although I haven't been able to find a clear documents about the
> translation yet) XML::RegExp does define the character-sets, but it is
> extremely expensive this way.
>
> libxml2 does (very probably) implement the correct classes. What I need
> is a simple "match string against schema pattern". I couldn't find it
> in the XML::LibXML docs. Is there any access to the lib which I can use?
Subject: Re: [rt.cpan.org #34873] patterns and charsets
Date: Sat, 12 Apr 2008 15:06:35 +0200
To: Christian Glahn via RT <bug-XML-LibXML [...] rt.cpan.org>
From: NLnet webmaster <webmaster [...] nlnet.nl>
* Christian Glahn via RT (bug-XML-LibXML@rt.cpan.org) [080412 11:06]:

XML::Compile does (optional) validation on the fly, while parts of the final message are under construction. At that time, I would like to be able to check the validity of schema simpleType values which have a restriction facet "pattern". I tried to translate the pattern into a regex, but XML does not use Unicode classes to define what it accepts. XML::RegExp does define the codes explicitly, but using those will be far too slow. So, I would really like to be able to validate against libxml2.
> I don't think that drafting an interface to the character validation API
> should be too much of a problem. For the other APIs it requires a closer
> look.
Judging by the name, it looks like xmlPatterncompile and xmlPatternMatch would do the job, although I do not know whether character check patterns are the same as XPath patterns. Something along the lines of:

    use XML::LibXML::Common qw/compile_pattern pattern_matches/;
    my $pat = compile_pattern XSSEL => '\c';
    if(pattern_matches $pat, $token) {...}

However, I do expect the patterns to show up in the Schema Facet check page...

http://xmlsoft.org/html/libxml-xmlschemastypes.html

Today, I have no time to read the C source to find where the schema implementation checks patterns...

We do not need interfaces to the other facets, because they are simple to express as regexes.
-- 
Regards,
MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc MARKOV Solutions
Mark@Overmeer.net solutions@overmeer.net
http://Mark.Overmeer.net http://solutions.overmeer.net
Please have a look at XML::LibXML::Pattern in the SVN (or wait for 1.67 which should appear on CPAN within a week). It may (or may not) do what you want. -- petr
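For context, a minimal sketch of what XML::LibXML::Pattern offers, assuming an installed release that ships the module: it compiles an XPath-like path and tests nodes against it. The sample document and the helper name pattern_hits are made up for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up sample document, for illustration only.
my $doc = XML::LibXML->load_xml(
    string => '<list><item>a</item><other/></list>',
);

# Compile a path pattern once; it is matched against nodes, not strings.
my $pattern = XML::LibXML::Pattern->new('/list/item');

# Hypothetical helper: names of the root element's children that match.
sub pattern_hits {
    return map  { $_->nodeName }
           grep { $pattern->matchesNode($_) }
           $doc->documentElement->childNodes;
}

print join(', ', pattern_hits()), "\n";
```

The pattern is applied to nodes of a parsed tree, so on its own it says nothing about whether a string value satisfies a schema "pattern" facet.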
Subject: Re: [rt.cpan.org #34873] patterns and charsets
Date: Mon, 3 Nov 2008 14:19:56 +0100
To: Petr Pajas via RT <bug-XML-LibXML [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Petr Pajas via RT (bug-XML-LibXML@rt.cpan.org) [081102 23:53]:
> <URL: http://rt.cpan.org/Ticket/Display.html?id=34873 >
>
> Please have a look at XML::LibXML::Pattern in the SVN (or wait for 1.67
> which should appear on CPAN within a week). It may (or may not) do what
> you want.
I have checked it out, and it seems it will suffice. Thanks!

(Glad to see that there are new developments, after many months of silence)
-- 
Regards,
MarkOv
It is a pity, but the names are confusing. The pattern facets have nothing to do with XPath patterns. In libxml2, they are handled by the automata

http://xmlsoft.org/html/libxml-xmlautomata.html

implemented in ~/xmlregexp.c

Because the name classes do not match the character classes of Perl regexps, I really would like access to these routines for performance reasons.
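To make the class mismatch concrete, here is a small sketch restricted to ASCII: the XSD class \c (XML name characters) accepts '-', '.' and ':', which Perl's \w does not. The real \c also covers many non-ASCII letters, digits, combining characters and extenders, which this sketch deliberately ignores. The variable and function names are illustrative only.

```perl
use strict;
use warnings;

# ASCII-only approximation of the XSD class \c (XML NameChar).
# Assumption: non-ASCII name characters are out of scope for this sketch.
my $xsd_c_ascii = qr/[A-Za-z0-9._:-]/;

# Perl's \w, restricted to ASCII with /a (needs Perl 5.14 or later).
my $perl_w_ascii = qr/\w/a;

# Return the printable ASCII characters on which the two classes disagree.
sub class_differences {
    my @diff;
    for my $code (0x20 .. 0x7E) {
        my $ch   = chr $code;
        my $in_c = $ch =~ $xsd_c_ascii  ? 1 : 0;
        my $in_w = $ch =~ $perl_w_ascii ? 1 : 0;
        push @diff, $ch if $in_c != $in_w;
    }
    return @diff;
}

print join(' ', class_differences()), "\n";
```

Even in the ASCII range a literal translation of \c to \w gives wrong answers, and a faithful translation needs the full XML character tables, which is where the cost of a pure-Perl approach comes from.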
On Mon, 24 Nov 2008 15:29:00, MARKOV wrote:
> It is a pity, but the names are confusing. The pattern facets have
> nothing to do with XPath patterns. In LibXML2, they are handle by
> the automata
> http://xmlsoft.org/html/libxml-xmlautomata.html
> implemented in
> ~/xmlregexp.c
>
> Because the name-classes do not match the perl-regexps, I really would
> like access to these routines for performance reasons.
OK, I'm not going to create an interface to all functions in xmlautomata. Please be more specific: which method exactly do you want? (Or submit a patch.)

-- Petr
Subject: Re: [rt.cpan.org #34873] patterns and charsets
Date: Sat, 24 Jan 2009 21:38:56 +0100
To: Petr Pajas via RT <bug-XML-LibXML [...] rt.cpan.org>
From: Mark Overmeer <mark [...] overmeer.net>
* Petr Pajas via RT (bug-XML-LibXML@rt.cpan.org) [090123 19:13]:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=34873 >
>
> On Mon, 24 Nov 2008 15:29:00, MARKOV wrote:
> > It is a pity, but the names are confusing. The pattern facets have
> > nothing to do with XPath patterns. In LibXML2, they are handle by
> > the automata
> > http://xmlsoft.org/html/libxml-xmlautomata.html
> > implemented in
> > ~/xmlregexp.c
> >
> > Because the name-classes do not match the perl-regexps, I really
> > would like access to these routines for performance reasons.
>
> ok, I'm not going to create interface to all functions in xmlautomata.
> Please be more specific, what method exactly do you want (or submit a
> patch).
Probably, we only need

    my $r = xmlAutomataCompile $pattern;

because, as far as I understand it, it behaves like a regex after compilation. But I must say, I have not studied libxml2 to a sufficient extent. I hope you do not challenge me to do it :-(
-- 
Regards,
MarkOv
Hi,

I've added a new module, XML::LibXML::RegExp, based on xmlregexp.c. It provides a constructor, which compiles a regexp into an automaton, and two methods:

    $bool = $re->match($string)      # xmlRegexpExec

and

    $bool = $re->isDeterministic();  # libxml2 provides this method and I guess
                                     # it can be useful for identifying
                                     # deterministic content models; I don't
                                     # know of any other application

libxml2 seems to be using xmlRegexpCompile and xmlRegexpExec for testing XSD pattern facets, therefore I hope this will close this request. Please test with your module. Also, you are welcome to contribute test cases.

-- Petr

On Sat, 24 Jan 2009 20:31:44, Mark@Overmeer.net wrote:
> * Petr Pajas via RT (bug-XML-LibXML@rt.cpan.org) [090123 19:13]:
> > <URL: https://rt.cpan.org/Ticket/Display.html?id=34873 >
> >
> > On Mon, 24 Nov 2008 15:29:00, MARKOV wrote:
> > > It is a pity, but the names are confusing. The pattern facets have
> > > nothing to do with XPath patterns. In LibXML2, they are handle by
> > > the automata
> > > http://xmlsoft.org/html/libxml-xmlautomata.html
> > > implemented in
> > > ~/xmlregexp.c
> > >
> > > Because the name-classes do not match the perl-regexps, I really
> > > would like access to these routines for performance reasons.
> >
> > ok, I'm not going to create interface to all functions in xmlautomata.
> > Please be more specific, what method exactly do you want (or submit a
> > patch).
>
> Probably, we only need
> my $r = xmlAutomataCompile $pattern;
>
> Because as far as I understand it, it behaves like a regex after
> compilation. But I must say, I have not studied libxml2 in sufficient
> extend. I hope you do not challenge me to do it :-(
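As a usage sketch for the module announced in this ticket, assuming an installed XML::LibXML release that includes XML::LibXML::RegExp: the published module documentation names the matching method matches, whereas the announcement above writes match, so adjust the call if your version differs. The helper xsd_pattern_ok and the sample tokens are hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;   # assumption: a release that includes XML::LibXML::RegExp

# Hypothetical helper: true if $value satisfies the XSD pattern facet $pattern.
sub xsd_pattern_ok {
    my ($pattern, $value) = @_;

    # The expression is compiled by libxml2, so XSD classes such as \i and \c
    # are interpreted by the library rather than by Perl's regex engine.
    my $re = XML::LibXML::RegExp->new($pattern);

    return $re->matches($value) ? 1 : 0;
}

# Sample tokens checked against "a name start character, then name characters".
for my $token ('xs:element', '1abc') {
    printf "%-12s %s\n", $token,
        xsd_pattern_ok('\i\c*', $token) ? 'valid' : 'invalid';
}
```

In real use the compiled object would be cached per facet rather than rebuilt on every call; the helper recompiles only to keep the sketch short.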