CC: | paulo.matos [...] fct.unl.pt |
Subject: | URI mailto not correctly encoded according to rfc2368 |
Date: | Tue, 13 Feb 2007 21:15:15 +0000 (WET) |
To: | bug-URI [...] rt.cpan.org |
From: | Paulo Matos <paulo.matos [...] fct.unl.pt> |
RFC2368 (http://www.ietf.org/rfc/rfc2368), section 2, says:
" (...)
8-bit characters in mailto URLs are forbidden. MIME encoded words (as
defined in [RFC2047]) are permitted in header values, but not for any
part of a "body" hname.
"
When using headers with non-ASCII characters, e.g.:
To: João Góis <joao.gois@example.com>
URI behaves like:
# perl -MURI -e '$u=URI->new("João Góis <joao.gois\@example.com>", "mailto"); print $u->as_string."\n";'
mailto:Jo%E3o%20G%F3is%20%3Cjoao.gois@example.com%3E
This is "URL-encoded" (aka "%-encoded"), which is correct for HTML
interpretation, but according to RFC 2368 the header value should first
be MIME-encoded (RFC 2047 encoded words) and then, if needed, URL-encoded.
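And why? Because otherwise you lose the charset information! In the output
above, %E3 and %F3 are ISO-8859-1 bytes, but nothing in the URL says so.
Plain %-encoding will probably work only when both ends assume the same
charset.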
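To make the ordering described above concrete, here is a minimal sketch in
Python, used only for illustration. It does not show how URI works or how it
should be patched. The helper name mailto_with_encoded_name and the default
charset iso-8859-1 are assumptions made for this example. It first wraps the
display name in an RFC 2047 encoded word, which carries the charset label,
and only then %-encodes the characters that are unsafe in a URL.

```python
from email.header import Header
from urllib.parse import quote


def mailto_with_encoded_name(display_name, address, charset="iso-8859-1"):
    """Hypothetical helper: MIME-encode first, then %-encode."""
    # Step 1: RFC 2047 encoded word. The charset name travels with the
    # text, so the receiver knows how to interpret the bytes.
    # (Assumes a short name; long values may be folded by Header.encode.)
    encoded_word = Header(display_name, charset).encode()

    # Step 2: %-encode whatever is still unsafe in a URL, including the
    # "=", "?", "<", ">" and space characters. "@" is left readable.
    mailbox = f"{encoded_word} <{address}>"
    return "mailto:" + quote(mailbox, safe="@")


url = mailto_with_encoded_name("João Góis", "joao.gois@example.com")
print(url)
```

The resulting URL is pure ASCII, and after undoing the %-encoding the
receiver gets an encoded word that still names its charset, unlike the
bare %E3 and %F3 bytes in the output above.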
I also noticed that "," is not encoded as %2C, but this seems to be only a
suggestion, not something mandatory.
Regards,
--
Paulo Matos