Bug #50696 for URI: Option for URI to not escape

Wed Oct 21 00:25:17 2009 mschwern [...] cpan.org - Ticket created

Subject: Option for URI to not escape

I'm using URI to represent IDNA encoded URIs like http://➡.ws/. URI encodes them as http://%E2%9E%A1.ws/. It would be nice if...

1. URI wouldn't escape IDNA domains
2. I could control the character set URI uses to escape
3. I could turn URI escaping off entirely

The scalar ref nature makes #2 and #3 difficult. There's no room in the object to add any extra data. I can't implement 2 and 3 without a major rewrite or by using an inside-out object to hold the meta-data.

Thoughts?

Thanks,
Schwern
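For readers unfamiliar with the two encodings in play, here is a minimal Python sketch (standard library only, and not the Perl URI module's API) contrasting percent-escaping of the hostname's UTF-8 bytes, which is the behaviour reported above, with the IDNA ToASCII form. It assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules.

```python
from urllib.parse import quote

host = "\u27a1.ws"  # the hostname from the report: ➡.ws

# Percent-escape the UTF-8 bytes of the host (the behaviour the ticket describes).
escaped = quote(host)

# Convert each label with IDNA ToASCII (Punycode with an "xn--" prefix).
idna = host.encode("idna").decode("ascii")

print(escaped)  # %E2%9E%A1.ws
print(idna)     # xn--hgi.ws
```

Only the second form is a hostname that DNS can resolve; the first is a valid escape sequence but not a usable host.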

Wed Oct 21 15:06:22 2009 GAAS [...] cpan.org - Correspondence added

I think URI should continue to store the fully escaped ASCIIfied form and also continue to stringify by default using that form (this ensures that it continues to be RFC 2396 compliant). Further, I would like URIs to automatically convert Unicode hostnames using the IDNA rules and convert them back on request.

I started a new branch at http://github.com/gisle/uri/commits/idna to explore and demonstrate. Here $uri objects grow 2 new methods: $uri->as_unicode and $uri->host_unicode. The idea is that 'as_unicode' will return a URI as unescaped as it can be without disturbing the semantics of the URL (that is, leaving all the reserved escapes unchanged).
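The "as unescaped as it can be" idea can be sketched as follows. This is a hypothetical Python illustration of the stated rule, not the code in the idna branch: the function name `unescape_non_reserved` and the exact choice of which ASCII characters stay escaped (everything outside the unreserved set) are assumptions made for the example.

```python
import re
import string

# ASCII characters that are safe to show unescaped (the "unreserved" set).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of escapes, re-escaping anything that must stay escaped."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return match.group(0)  # not valid UTF-8: leave the whole run untouched
    out = []
    for ch in text:
        if ord(ch) >= 0x80 or ch in UNRESERVED:
            out.append(ch)  # non-ASCII or unreserved: show as-is
        else:
            out.append("%%%02X" % ord(ch))  # reserved, space, control: keep escaped
    return "".join(out)


def unescape_non_reserved(uri):
    """Return uri with escapes decoded wherever that cannot change its meaning."""
    return _ESCAPE_RUN.sub(_decode_run, uri)


print(unescape_non_reserved("http://example.com/%E2%9E%A1%2Fx%20y?q=%C3%A9%26"))
# http://example.com/➡%2Fx%20y?q=é%26
```

Note that the escaped slash and ampersand survive, so the path and query still parse the same way. As a side effect, this sketch upper-cases the hex digits of escapes it keeps.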

Wed Oct 21 15:06:24 2009 The RT System itself - Status changed from 'new' to 'open'

Wed Oct 21 15:11:24 2009 GAAS [...] cpan.org - Correspondence added

Potentially this could also lead to some URI subclass whose default stringification uses as_unicode instead of as_string.

Wed Oct 21 21:22:38 2009 mschwern [...] cpan.org - Correspondence added

The unicode conversion functions would be handy. For my purposes, I need the URI to round trip. That is, what goes into new() is what comes out. Escaping on new() and unescaping on output doesn't accomplish that. As an aside, are you married to the scalar ref object? It would open up a lot of possibilities to switch to a hash ref, like caching the parsed URI or storing multiple versions of the data (escaped and unescaped).
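The round-trip concern can be shown with a small Python sketch (standard library only, not the Perl URI module): if everything is escaped on input and unescaped on output, an input that mixes raw and already-escaped characters does not come back unchanged. The sample path is made up for illustration.

```python
from urllib.parse import quote, unquote

# An input that mixes a raw character with an already-escaped one.
original = "/\u27a1/%E2%9E%A1"

# Escape on input, leaving "/" and existing "%" escapes alone.
stored = quote(original, safe="/%")

# Unescape on output.
output = unquote(stored)

print(stored)              # /%E2%9E%A1/%E2%9E%A1
print(output == original)  # False
```

Once stored, the two segments are indistinguishable, so no output rule can recover which one was escaped in the original string.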

Sun Oct 25 07:02:12 2009 GAAS [...] cpan.org - Correspondence added

On Wed Oct 21 21:22:38 2009, MSCHWERN wrote:

> As an aside, are you married to the scalar ref object? It would open up
> a lot of possibilities to switch to a hash ref, like caching the parsed
> URI or storing multiple versions of the data (escaped and unescaped).

I do like the scalar ref implementation and I find beauty in the fact that URI objects are basically as cheap as strings. URIs are, after all, just strings with special rules about their syntax, and parsing them is cheap in Perl. On the other hand, since URI->new() is basically a factory function that delegates to specific implementations based on the scheme, I would not mind adding hooks to it so that it could instantiate objects of other classes where the implementation is different.

Sun Oct 25 13:41:59 2009 GAAS [...] cpan.org - Correspondence added

I've put off reading up on IRIs, but today I read RFC 3987. It appears that what you want is an IRI, not a URI. I want to explore what it would take to support them from the URI dist. I started out by creating a URI::IRI class that wraps a URI. The branch to track this work is now http://github.com/gisle/uri/commits/iri