This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 8089
Status: resolved
Priority: 0
Queue: Encode

People
Owner: DANKOGAI [...] cpan.org
Requestors: derhoermi [...] gmx.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



From: Bjoern Hoehrmann <derhoermi [...] gmx.net>
To: bug-Encode [...] rt.cpan.org
Subject: Encode::utf8::decode_xs does not check partial chars
Date: Thu, 21 Oct 2004 21:36:14 +0200
Hi,

  % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"

does not work as expected (it should print "Bj\x{FFFD}rn") which is
apparently due to Encode::utf8::decode_xs(), the code

  ...
  if ((s + skip) > e) {
      /* Partial character - done */
      break;
  }
  ...

causes the routine to assume that the octets following that "partial"
character are well-formed UTF-8, but this should not be assumed as it
causes the unexpected behavior above.
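
To make the report concrete, here is a minimal C sketch of the behavior the requestor expects. It is not the Encode implementation. The names `lead_len` and `decode_lossy` are hypothetical, the length table is deliberately permissive (so that 0xF6 claims a 4-byte sequence, as `skip` does in the ticket), and overlong, surrogate, and range checks are omitted. It also assumes one-shot decoding of a complete buffer. A streaming decoder could reasonably keep the `break` for a sequence that is truly cut off at the end of the input.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT 0xFFFDu

/* Hypothetical helper: sequence length implied by the lead byte alone,
 * or 0 if the byte cannot start a sequence. Permissive on purpose:
 * 0xF0..0xF7 all claim four bytes. */
static size_t lead_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 0; /* stray continuation byte, or 0xF8..0xFF */
}

/* Decode `len` bytes at `s` into at most `cap` code points in `out`.
 * A sequence that is malformed or runs past the end of the buffer
 * becomes U+FFFD, and decoding resumes at the first byte that was
 * not consumed. Returns the number of code points written.
 * Assumption: no overlong, surrogate, or > U+10FFFF checks. */
size_t decode_lossy(const unsigned char *s, size_t len,
                    uint32_t *out, size_t cap)
{
    size_t pos = 0, n = 0;

    while (pos < len && n < cap) {
        unsigned char b = s[pos];
        size_t skip = lead_len(b);
        size_t i = 1;
        uint32_t cp;

        if (skip == 1) { out[n++] = b; pos++; continue; }
        if (skip == 0) { out[n++] = REPLACEMENT; pos++; continue; }

        /* Payload bits of the lead byte: 5, 4, or 3 of them. */
        cp = b & (0xFFu >> (skip + 1));

        /* Consume continuation bytes only while they are really there. */
        while (i < skip && pos + i < len && (s[pos + i] & 0xC0) == 0x80) {
            cp = (cp << 6) | (s[pos + i] & 0x3Fu);
            i++;
        }

        out[n++] = (i == skip) ? cp : REPLACEMENT;
        pos += i; /* never skip bytes that were not validated */
    }
    return n;
}
```

Under these assumptions, the five bytes 42 6A F6 72 6E decode to U+0042, U+006A, U+FFFD, U+0072, U+006E. That corresponds to the "Bj\x{FFFD}rn" the report asks for, and the trailing "rn" is not dropped.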
[derhoermi@gmx.net - Thu Oct 21 15:35:26 2004]:
> % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>
> does not work as expected (it should print "Bj\x{FFFD}rn") which is
> apparently due to Encode::utf8::decode_xs(), the code

In this particular case, your expectation is wrong. Try

  perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'

and it works as expected. You expect Perl to treat "Bj\xF6rn" as UTF-8,
but it does not. Perl treats \xHH as iso-latin1. See the "Perl's Unicode
Model" section of perldoc perluniintro.

Dan the Encode Maintainer