Bug #8089 for Encode: Encode::utf8::decode_xs does not check partial chars

RT for rt.cpan.org

This queue is for tickets about the Encode CPAN distribution.

Report information

The Basics

Id:	8089
Status:	resolved
Priority:	0/
Queue:	Encode

People

Owner:	DANKOGAI [...] cpan.org
Requestors:	derhoermi [...] gmx.net
Cc:
AdminCc:

Bug Information

Severity:	(no value)
Broken in:	(no value)
Fixed in:	(no value)

History

Thu Oct 21 15:35:26 2004 derhoermi [...] gmx.net - Ticket created

From:	Bjoern Hoehrmann <derhoermi [...] gmx.net>
To:	bug-Encode [...] rt.cpan.org
Subject:	Encode::utf8::decode_xs does not check partial chars
Date:	Thu, 21 Oct 2004 21:36:14 +0200

Hi,

    % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"

does not work as expected (it should print "Bj\x{FFFD}rn"), which is apparently due to Encode::utf8::decode_xs(). The code

    ...
    if ((s + skip) > e) {
        /* Partial character - done */
        break;
    }
    ...

causes the routine to assume that the octets following that "partial" character are well-formed UTF-8, but this should not be assumed, as it causes the unexpected behavior above.
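The check the report asks for can be sketched in C as follows. This is a minimal illustration, not the actual Encode source: the names utf8_seq_len and decode_utf8_lenient are hypothetical, the lead-byte table deliberately accepts 0xF0..0xF7 as four-byte leads so that 0xF6 takes the same path as in the report, and overlong forms, surrogates, and values above U+10FFFF are not rejected. The point is only that a sequence running past the end of the buffer is treated as "partial" solely when every byte that is present is a valid continuation byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sequence length implied by a lead byte, 0 if the
 * byte cannot start a sequence. Lenient on purpose (no overlong check). */
static size_t utf8_seq_len(uint8_t lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC0 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF7) return 4;
    return 0; /* stray continuation byte or 0xF8..0xFF */
}

/* Hypothetical decoder: writes code points to out (room for len entries),
 * stores the number of input bytes processed in *consumed, and returns the
 * number of code points written. Malformed input becomes U+FFFD. */
size_t decode_utf8_lenient(const uint8_t *s, size_t len,
                           uint32_t *out, size_t *consumed)
{
    const uint8_t *p = s;
    const uint8_t *e = s + len;
    size_t n = 0;

    assert(s != NULL && out != NULL && consumed != NULL);

    while (p < e) {
        size_t skip = utf8_seq_len(*p);
        size_t avail = (size_t)(e - p);
        size_t want, i;
        uint32_t cp;

        if (skip == 0) {            /* byte cannot start a sequence */
            out[n++] = 0xFFFD;
            p++;
            continue;
        }

        /* Validate the continuation bytes that are actually present,
         * BEFORE deciding whether the sequence is merely truncated. */
        want = skip < avail ? skip : avail;
        for (i = 1; i < want; i++) {
            if ((p[i] & 0xC0) != 0x80)
                break;
        }
        if (i < want) {             /* malformed: replace, resume at bad byte */
            out[n++] = 0xFFFD;
            p += i;
            continue;
        }
        if (skip > avail)           /* genuinely partial character: stop */
            break;

        if (skip == 1) {
            cp = p[0];
        } else {
            cp = p[0] & (0xFF >> (skip + 1));   /* payload bits of lead byte */
            for (i = 1; i < skip; i++)
                cp = (cp << 6) | (p[i] & 0x3F);
        }
        out[n++] = cp;
        p += skip;
    }

    *consumed = (size_t)(p - s);
    return n;
}
```

Under these assumptions, the octets 42 6A F6 72 6E ("Bj\xF6rn") decode to B, j, U+FFFD, r, n, while an input that really ends mid-character (for example 61 E2 82) stops after the "a" and reports one byte consumed.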

Fri Oct 22 00:37:13 2004 DANKOGAI [...] cpan.org - Correspondence added

[derhoermi@gmx.net - Thu Oct 21 15:35:26 2004]:

> % perl -MEncode -e "print decode(q(utf-8), qq(Bj\xF6rn))"
>
> does not work as expected (it should print "Bj\x{FFFD}rn") which is
> apparently due to Encode::utf8::decode_xs(), the code

In this particular case, your expectation is wrong. Try

    perl -MEncode -le 'print decode(q(iso-latin1), qq(Bj\xF6rn))'

and it works as expected. You expect perl to treat "Bj\xF6rn" as UTF-8, but perl does not. Perl treats \xHH as iso-latin1. See the "Perl's Unicode Model" section of perldoc perluniintro.

Dan the Encode Maintainer
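To illustrate the Latin-1 reading described in this reply, here is a small C sketch of what interpreting the octets as ISO-8859-1 amounts to: each byte is its own code point (0xF6 is U+00F6), which is then written out as UTF-8. The function name latin1_to_utf8 is hypothetical and this is not how Encode implements the conversion; it only shows the byte-level mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: re-encode ISO-8859-1 bytes as UTF-8.
 * out must have room for 2 * len bytes. Returns the bytes written. */
size_t latin1_to_utf8(const uint8_t *in, size_t len, uint8_t *out)
{
    size_t n = 0;
    size_t i;

    assert(in != NULL && out != NULL);

    for (i = 0; i < len; i++) {
        uint8_t c = in[i];
        if (c < 0x80) {
            out[n++] = c;                           /* ASCII: unchanged */
        } else {
            out[n++] = (uint8_t)(0xC0 | (c >> 6));   /* lead byte */
            out[n++] = (uint8_t)(0x80 | (c & 0x3F)); /* continuation byte */
        }
    }
    return n;
}
```

Applied to 42 6A F6 72 6E, the single octet F6 becomes the two-octet sequence C3 B6, giving six output bytes in total.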

Fri Oct 22 00:37:25 2004 DANKOGAI [...] cpan.org - Status changed from 'new' to 'resolved'

Fri Oct 22 15:56:25 2004 DANKOGAI [...] cpan.org - Taken