This queue is for tickets about the URI CPAN distribution.

Report information
The Basics
Id: 50696
Status: open
Priority: 0/
Queue: URI

People
Owner: Nobody in particular
Requestors: mschwern [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 1.40
Fixed in: (no value)



Subject: Option for URI to not escape
I'm using URI to represent IDNA encoded URIs like http://➡.ws/. URI encodes them as http://%E2%9E%A1.ws/.

It would be nice if...

1. URI wouldn't escape IDNA domains
2. I could control the character set URI uses to escape
3. I could turn URI escaping off entirely

The scalar ref nature makes #2 and #3 difficult. There's no room in the object to add any extra data. I can't implement 2 and 3 without a major rewrite or by using an inside-out object to hold the meta-data.

Thoughts?

Thanks,
Schwern
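To make the distinction in the request concrete, here is a minimal sketch of the two ASCII forms a Unicode hostname can take. It is written in Python purely as an illustration and is not the URI module's API; the variable names are made up for the example, and Python's built-in "idna" codec implements the older IDNA 2003 rules.

```python
from urllib.parse import quote

host = "➡.ws"

# Percent-escaping the UTF-8 bytes of the host, which is the form the
# ticket reports URI producing.
percent_escaped = quote(host)

# IDNA ToASCII conversion: each non-ASCII label becomes an "xn--" label.
idna_ascii = host.encode("idna").decode("ascii")

# Converting the ASCII form back recovers the original Unicode host.
idna_round_trip = idna_ascii.encode("ascii").decode("idna")

print(percent_escaped)   # %E2%9E%A1.ws
print(idna_ascii)        # xn--hgi.ws
print(idna_round_trip)   # ➡.ws
```

Both outputs are plain ASCII, but only the "xn--" form is a hostname that DNS resolvers are expected to understand.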
I think URI should continue to store the fully escaped, ASCIIfied form and continue to stringify using that form by default (this ensures that it continues to be RFC 2396 compliant). Further, I would like URIs to automatically convert Unicode hostnames using the IDNA rules and convert them back on request. I started a new branch at http://github.com/gisle/uri/commits/idna to explore and demonstrate this. There, $uri objects grow two new methods: $uri->as_unicode and $uri->host_unicode. The idea is that as_unicode returns the URI as unescaped as it can be without disturbing the semantics of the URL (that is, leaving all escapes of reserved characters unchanged).
Potentially this could also lead to a URI subclass whose default stringification uses as_unicode instead of as_string.
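As a rough illustration of "as unescaped as it can be without disturbing the semantics", the sketch below decodes percent-escapes only when they stand for unreserved ASCII characters or valid UTF-8 sequences, and keeps every other escape. It is a Python sketch under those assumptions, not the code on the idna branch; the names as_unicode_sketch, UNRESERVED and _unescape_run are hypothetical, and hostname (IDNA) handling is left out.

```python
import re
import string

# Unreserved ASCII characters: decoding their escapes never changes meaning.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

# A run of one or more consecutive percent-escapes, e.g. "%E2%9E%A1%2F".
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _unescape_run(match):
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    out = []
    # surrogateescape maps each undecodable byte 0xNN to U+DCNN, so
    # invalid UTF-8 can be detected and put back as an escape.
    for ch in raw.decode("utf-8", errors="surrogateescape"):
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            out.append("%%%02X" % (cp - 0xDC00))  # invalid byte: keep escaped
        elif cp < 0x80 and ch not in UNRESERVED:
            out.append("%%%02X" % cp)  # reserved or other ASCII: keep escaped
        else:
            out.append(ch)  # unreserved ASCII or valid non-ASCII character
    return "".join(out)


def as_unicode_sketch(uri):
    """Return uri with only the semantically safe escapes decoded."""
    return _ESCAPE_RUN.sub(_unescape_run, uri)


print(as_unicode_sketch("http://example.com/%E2%9E%A1%2Fa%20b?q=%7Efoo"))
# http://example.com/➡%2Fa%20b?q=~foo
```

Note that the escaped slash and space survive, so the path still has the same segments after conversion. Escapes that are kept are re-emitted with uppercase hex digits.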
The Unicode conversion functions would be handy. For my purposes, I need the URI to round trip. That is, what goes into new() is what comes out. Escaping on new() and unescaping on output doesn't accomplish that.

As an aside, are you married to the scalar ref object? It would open up a lot of possibilities to switch to a hash ref, like caching the parsed URI or storing multiple versions of the data (escaped and unescaped).
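The round-trip requirement and the idea of storing multiple versions of the data can be sketched as an object that keeps both the original input and an escaped form. This is a hypothetical Python illustration, not a proposal for the URI module's internals; the class name RoundTripURI and its attributes are invented for the example.

```python
from urllib.parse import quote

# Characters left alone when escaping: the URI reserved set plus "%",
# so escapes already present in the input are not escaped twice.
_KEEP = ":/?#[]@!$&'()*+,;=%"


class RoundTripURI:
    """Holds a URI both as given and in percent-escaped form."""

    def __init__(self, text):
        self.original = text                     # exactly what was passed in
        self.escaped = quote(text, safe=_KEEP)   # ASCII-only form

    def __str__(self):
        # Stringify to the original input so the value round-trips.
        return self.original


uri = RoundTripURI("http://➡.ws/")
print(str(uri))      # http://➡.ws/
print(uri.escaped)   # http://%E2%9E%A1.ws/
```

Because the escaped form is derived from the stored input rather than replacing it, nothing has to be guessed when converting back.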
On Wed Oct 21 21:22:38 2009, MSCHWERN wrote:
> As an aside, are you married to the scalar ref object? It would open up
> a lot of possibilities to switch to a hash ref, like caching the parsed
> URI or storing multiple versions of the data (escaped and unescaped).
I do like the scalar ref implementation, and I find beauty in the fact that URI objects are basically as cheap as strings. URIs are, after all, just strings with special rules about their syntax, and parsing them is cheap in Perl. On the other hand, since URI->new() is basically a factory function that delegates to specific implementations based on the scheme, I would not mind adding hooks to it so that it could instantiate objects of other classes where the implementation is different.
I've put off reading up on IRIs, but today I read RFC 3987. It appears that what you want is an IRI, not a URI. I want to explore what it would take to support them from the URI dist. I started out by creating a URI::IRI class that wraps a URI. The branch to track this work is now http://github.com/gisle/uri/commits/iri