Bug #4490 for XML-Generator-DBI: UTF-8 determination incorrect?

Subject:

UTF-8 determination incorrect?

This is perl, v5.8.0 built for i386-linux-thread-multi
Linux laptop 2.4.20-19.9 #1 Tue Jul 15 17:18:13 EDT 2003 i686 i686 i386 GNU/Linux
mysql-3.23.58-1.9
libdbi-0.6.5-5
libdbi-dbd-mysql-0.6.5-5

I'm probably completely insane, and I've missed something completely basic, but here goes.

If a mysql text field in a table contains '\n', the field gets base64-encoded.

From the UTF-8 RFC (http://www.ietf.org/rfc/rfc2279.txt):

    - Character values from 0000 0000 to 0000 007F (US-ASCII repertoire)
      correspond to octets 00 to 7F (7 bit US-ASCII values). A direct
      consequence is that a plain ASCII string is also a valid UTF-8
      string.

Therefore, wouldn't anything up to 7F be a valid UTF-8 string, meaning that it should only be converted if the values are >7F?

Can we change line 124 in DBI.pm from:

    if (defined($_) and /[\x00-\x08\x0A-\x0C\x0E-\x19]/) {

to:

    if (defined($_) and /[\x00-\x08\x0B-\x0C\x0E-\x19]/) {

I can't comment on the rest of the restricted characters; they would seem to still be valid UTF-8 characters. The only ones that wouldn't be would be \x80-

Regards,
Jason Pollock
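To make the effect of the proposed change concrete, here is a minimal sketch that runs both character classes from the report against a few sample strings. The helper name needs_base64 and the sample values are hypothetical and exist only for illustration; the sketch assumes that a match on this pattern is what sends a field down the base64 path, as the report describes, and it does not reproduce the rest of DBI.pm.

```perl
use strict;
use warnings;

# Character class quoted in the report as the current line 124 of DBI.pm.
my $current  = qr/[\x00-\x08\x0A-\x0C\x0E-\x19]/;
# Proposed class: the same, except \x0A (newline) is no longer matched.
my $proposed = qr/[\x00-\x08\x0B-\x0C\x0E-\x19]/;

# Hypothetical helper (not part of XML::Generator::DBI): returns 1 if the
# value is defined and matches the given pattern, 0 otherwise.
sub needs_base64 {
    my ($value, $pattern) = @_;
    return (defined($value) && $value =~ $pattern) ? 1 : 0;
}

# Illustrative sample values; the names on the left are just labels.
my @samples = (
    [ 'plain ascii'  => "hello world" ],
    [ 'newline'      => "line one\nline two" ],
    [ 'tab'          => "col1\tcol2" ],
    [ 'bell (\x07)'  => "ding\x07" ],
    [ 'vtab (\x0B)'  => "a\x0Bb" ],
);

printf "%-14s %-8s %-8s\n", 'sample', 'current', 'proposed';
for my $sample (@samples) {
    my ($label, $value) = @$sample;
    printf "%-14s %-8d %-8d\n",
        $label,
        needs_base64($value, $current),
        needs_base64($value, $proposed);
}
```

Of these samples, only the newline row differs between the two columns: the current class matches it and the proposed class does not. Control characters such as \x07 and \x0B are matched by both.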