
This queue is for tickets about the Regexp-Common CPAN distribution.

Report information
The Basics
Id: 52309
Status: resolved
Priority: 0/
Queue: Regexp-Common

People
Owner: Nobody in particular
Requestors: SAMV [...] cpan.org
Cc: sam [...] vilain.net
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.122
Fixed in: (no value)



CC: sam [...] vilain.net
Subject: URI rules rejects many example URIs from rfc3986, and hangs on news: example
Silly XML Schema tends to use URNs, like "urn:oasis:names:specification:docbook:dtd:xml:4.1.2". In principle, 'scheme:opaque' matches an awful lot of rubbish that probably isn't a valid URI, but "urn" is probably worth adding.

I checked a few of the ones in section 1.1.2 of RFC3986, and also found that the example 'ldap' and 'mailto' URIs also failed to validate, and the 'news' one never returned!

To reproduce:

perl -MRegexp::Common=URI -le '(print("$_:".(m{$RE{URI}}?"ok":"fail"))) for
qw(ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2
news:comp.infosystems.www.servers.unix
)'
On Tue Dec 01 01:53:44 2009, SAMV wrote:
> Silly XML Schema tends to use URNs, like
> "urn:oasis:names:specification:docbook:dtd:xml:4.1.2". In principle,
> 'scheme:opaque' matches an awful lot of rubbish that probably isn't a
> valid URI, but "urn" is probably worth adding.
>
> I checked a few of the ones in section 1.1.2 of RFC3986, and also found
> that the example 'ldap' and 'mailto' URIs also failed to validate, and
> the 'news' one never returned!
>
> To reproduce:
>
> perl -MRegexp::Common=URI -le '(print("$_:".(m{$RE{URI}}?"ok":"fail"))) for
> qw(ftp://ftp.is.co.za/rfc/rfc1808.txt
> http://www.ietf.org/rfc/rfc2396.txt
> ldap://[2001:db8::7]/c=GB?objectClass?one
> mailto:John.Doe@example.com
> tel:+1-816-555-1212
> telnet://192.0.2.16:80/
> urn:oasis:names:specification:docbook:dtd:xml:4.1.2
> news:comp.infosystems.www.servers.unix
> )'

The 'ldap', 'mailto' and 'urn' schemes aren't supported. It's unlikely they ever will be in Regexp::Common, but they may be in its next generation (Regexp::Common510, not yet released).
The pattern 'hung' because the regexp engine took a long time to determine that a certain branch was unsuccessful. The pattern has been rewritten to fail faster (at the expense of matching somewhat slower). This should be available in the next release (planned for this year).
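
For the unsupported schemes, one possible stopgap is to split the URI into components and check the scheme yourself. The sketch below uses the generic regular expression from RFC 3986, Appendix B. It is not what Regexp::Common does, and it validates nothing: as the report notes, 'scheme:opaque' accepts a lot of rubbish. The helper name split_uri is made up for this illustration.

```perl
use strict;
use warnings;

# Generic URI-reference regex from RFC 3986, Appendix B.
# It only splits a string into components; it does not validate any scheme.
my $uri_re = qr{^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?};

# split_uri is a hypothetical helper, not part of Regexp::Common.
sub split_uri {
    my ($uri) = @_;
    return unless $uri =~ $uri_re;
    return {
        scheme    => $2,   # undef when there is no "scheme:" prefix
        authority => $4,   # undef when there is no "//authority" part
        path      => $5,
        query     => $7,
        fragment  => $9,
    };
}

# A few of the URIs from the report above.
for my $uri (qw(
    ldap://[2001:db8::7]/c=GB?objectClass?one
    mailto:John.Doe@example.com
    urn:oasis:names:specification:docbook:dtd:xml:4.1.2
    news:comp.infosystems.www.servers.unix
)) {
    my $parts = split_uri($uri);
    printf "%-8s %s\n", $parts->{scheme} // '(none)', $parts->{path};
}
```

Every character class in this regex excludes the delimiter that follows it, so the engine has very little to backtrack over, whatever the input. The price is that any scheme-specific checking (for example, that a 'urn' has a namespace identifier) is left to the caller.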