Subject: UTF-8 determination incorrect?
This is perl, v5.8.0 built for i386-linux-thread-multi
Linux laptop 2.4.20-19.9 #1 Tue Jul 15 17:18:13 EDT 2003 i686 i686 i386 GNU/Linux
mysql-3.23.58-1.9
libdbi-0.6.5-5
libdbi-dbd-mysql-0.6.5-5
I'm probably insane and have missed something completely basic, but here goes.
If a MySQL text field in a table contains '\n', the field ends up getting the base64 encoding conversion.
From the UTF-8 RFC:
http://www.ietf.org/rfc/rfc2279.txt
- Character values from 0000 0000 to 0000 007F (US-ASCII repertoire)
correspond to octets 00 to 7F (7 bit US-ASCII values). A direct
consequence is that a plain ASCII string is also a valid UTF-8
string.
Therefore, wouldn't any string made up only of bytes up to 0x7F be valid UTF-8, meaning a field should only need converting if it contains bytes above 0x7F?
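A minimal sketch of the rule suggested above, assuming the field arrives as a raw byte string from the driver. `needs_conversion()` is a hypothetical helper for illustration, not something in DBI.pm; it flags a value only if it contains a byte above 0x7F.

```perl
use strict;
use warnings;

# Hypothetical helper: a value needs converting only if it contains
# anything outside the 7-bit US-ASCII range 0x00-0x7F, since pure
# ASCII is already valid UTF-8.
sub needs_conversion {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ /[^\x00-\x7F]/ ? 1 : 0;
}

my @samples = (
    [ 'plain text'  => "hello world" ],
    [ 'newline'     => "line1\nline2" ],
    [ 'UTF-8 bytes' => "caf\xC3\xA9" ],    # e-acute as two UTF-8 bytes
    [ 'undef'       => undef ],
);

for my $s (@samples) {
    my ($label, $value) = @$s;
    printf "%-12s %s\n", $label, needs_conversion($value) ? 'convert' : 'leave';
}
```

Under this rule the newline sample is left alone and only the sample with bytes above 0x7F is converted. Control bytes such as \x00 would also pass through unchanged, so this only answers the UTF-8 question, not whatever else the existing check may be guarding against.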
Can we change line 124 in DBI.pm from:
if (defined($_) and /[\x00-\x08\x0A-\x0C\x0E-\x19]/) {
to:
if (defined($_) and /[\x00-\x08\x0B-\x0C\x0E-\x19]/) {
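One way to see what the change does is to run both patterns against a few sample values. The two character classes are copied from the lines above; the sample strings and labels are made up for illustration. As quoted, both classes already skip tab (0x09) and CR (0x0D), and because the last range ends at \x19, bytes 0x1A-0x1F (ESC, 0x1B, for example) are matched by neither version.

```perl
use strict;
use warnings;

# Current test from DBI.pm and the proposed one, which drops \x0A (newline).
my $current  = qr/[\x00-\x08\x0A-\x0C\x0E-\x19]/;
my $proposed = qr/[\x00-\x08\x0B-\x0C\x0E-\x19]/;

# Illustrative sample values; an array keeps the output order stable.
my @samples = (
    [ 'plain text'  => "hello world" ],
    [ 'tab'         => "a\tb" ],
    [ 'newline'     => "line1\nline2" ],
    [ 'CR LF'       => "line1\r\nline2" ],
    [ 'bell (0x07)' => "ding\x07" ],
    [ 'ESC (0x1B)'  => "\x1B[0m" ],
);

for my $s (@samples) {
    my ($label, $value) = @$s;
    printf "%-12s current=%-6s proposed=%s\n", $label,
        ($value =~ $current  ? 'encode' : 'keep'),
        ($value =~ $proposed ? 'encode' : 'keep');
}
```

With the current pattern the newline and CR LF samples are flagged for encoding; with the proposed one they are kept, while the bell character is still caught by both.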
I can't comment on the rest of the restricted characters; they would seem to still be valid UTF-8 characters. The only ones that wouldn't be are \x80-
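To illustrate why bytes from 0x80 upward are the problem case: in UTF-8 they are only legal as part of a multi-byte sequence, so a lone high byte (for example a Latin-1 e-acute, 0xE9) is malformed, while control bytes below 0x80 are still well-formed UTF-8. This sketch assumes the core Encode module in a reasonably recent version (strict `'UTF-8'` handling differs in older releases); `valid_utf8()` is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: returns 1 if $bytes is well-formed UTF-8, else 0.
# A copy is decoded because decode() may modify its input when a CHECK
# flag such as FB_CROAK is passed.
sub valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;
    return eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 1 : 0;
}

my @cases = (
    [ 'lone 0x80'     => "\x80" ],
    [ 'Latin-1 0xE9'  => "caf\xE9" ],       # high byte with no continuation
    [ 'UTF-8 e-acute' => "caf\xC3\xA9" ],   # proper two-byte sequence
    [ 'control 0x07'  => "ding\x07" ],
    [ 'newline'       => "line1\nline2" ],
);

for my $case (@cases) {
    my ($label, $bytes) = @$case;
    printf "%-14s %s\n", $label, valid_utf8($bytes) ? 'valid' : 'invalid';
}
```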
Regards,
Jason Pollock