
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 51204
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: GAAS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.37
Fixed in: (no value)



Subject: Callback CHECK not supported for UTF-8 decoder/encoder
I was surprised how much time I ended up spending debugging why I could not get:

    Encode::decode("UTF-8", $octets, sub { sprintf "%%%02X", shift })

to work. My program behaved very oddly and the behaviour changed in random ways. It turns out Encode just silently turns the CODE address into an integer and treats that as flags to decide how to behave. Very confusing. The least that should happen is that the decoder croaks when a CODE reference is passed in, but I would be much happier if such callbacks could simply be made to work.
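To make the report above concrete, here is a minimal sketch of why the behaviour looks random. It assumes only that a Perl reference used in numeric context yields its memory address, and it uses the flag constants that Encode itself defines (`Encode::DIE_ON_ERR` and friends). The variable names are illustrative and not taken from the ticket. The original XS code declared `check` as `int`, so the bits actually seen by the decoder may additionally be truncated.

```perl
use strict;
use warnings;
use Encode ();
use Scalar::Util qw(refaddr);

# The callback from the report.
my $cb = sub { sprintf "%%%02X", shift };

# A reference used as a number yields its address, which differs
# between runs and between processes.
my $as_number = 0 + $cb;
printf "coderef as integer: 0x%X\n", $as_number;

# Show which CHECK flag bits happen to be set in that address.
my %flag = (
    DIE_ON_ERR    => Encode::DIE_ON_ERR(),
    WARN_ON_ERR   => Encode::WARN_ON_ERR(),
    RETURN_ON_ERR => Encode::RETURN_ON_ERR(),
    LEAVE_SRC     => Encode::LEAVE_SRC(),
    PERLQQ        => Encode::PERLQQ(),
    HTMLCREF      => Encode::HTMLCREF(),
    XMLCREF       => Encode::XMLCREF(),
);
for my $name (sort keys %flag) {
    printf "%-13s %s\n", $name, ($as_number & $flag{$name}) ? "set" : "clear";
}
```

Which flags come out as "set" depends on where the anonymous sub happens to be allocated, which matches the "changed in random ways" symptom described in the report.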
Patch to make Encode::decode("UTF-8", $bytes, sub {}) croak.
From 32ca48a2d4d46aa42855266f4e37da8a7342a92d Mon Sep 17 00:00:00 2001
From: Gisle Aas <gisle@aas.no>
Date: Sun, 8 Nov 2009 12:15:58 +0100
Subject: [PATCH] Make the UTF8 encoder croak when callback CHECK is passed in

---
 Encode.xs |   22 ++++++++++++++++++----
 1 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/Encode.xs b/Encode.xs
index e5f4c9a..e9ccd3f 100644
--- a/Encode.xs
+++ b/Encode.xs
@@ -401,19 +401,26 @@ MODULE = Encode PACKAGE = Encode::utf8 PREFIX = Method_
 PROTOTYPES: DISABLE
 
 void
-Method_decode_xs(obj,src,check = 0)
+Method_decode_xs(obj,src,check_sv = &PL_sv_no)
 SV * obj
 SV * src
-int check
+SV * check_sv
 PREINIT:
     STRLEN slen;
     U8 *s;
     U8 *e;
     SV *dst;
     bool renewed = 0;
+    int check;
 CODE:
 {
     dSP; ENTER; SAVETMPS;
+    if (SvROK(check_sv)) {
+        croak("UTF-8 decoder doesn't support callback CHECK");
+    }
+    else {
+        check = SvIV(check_sv);
+    }
     if (src == &PL_sv_undef) src = newSV(0);
     s = (U8 *) SvPV(src, slen);
     e = (U8 *) SvEND(src);
@@ -464,18 +471,25 @@ CODE:
 }
 
 void
-Method_encode_xs(obj,src,check = 0)
+Method_encode_xs(obj,src,check_sv = &PL_sv_no)
 SV * obj
 SV * src
-int check
+SV * check_sv
 PREINIT:
     STRLEN slen;
     U8 *s;
     U8 *e;
     SV *dst;
     bool renewed = 0;
+    int check;
 CODE:
 {
+    if (SvROK(check_sv)) {
+        croak("UTF-8 encoder doesn't support callback CHECK");
+    }
+    else {
+        check = SvIV(check_sv);
+    }
     if (src == &PL_sv_undef) src = newSV(0);
     s = (U8 *) SvPV(src, slen);
     e = (U8 *) SvEND(src);
-- 
1.6.2.95.g934f7
Thanks, applied in my repo. VERSION++ soon.

Dan the Maintainer Thereof

On Sun Nov 08 06:18:02 2009, GAAS wrote:
> Patch to make Encode::decode("UTF-8", $bytes, sub {}) croak.
Hi, the ticket seems to be closed, though.

Some of my modules expected that this wouldn't croak but would simply be ignored. I would be much happier if such callbacks could work, too. :-)

I tested under Encode.pm 2.23:

perl -MEncode -e 'print Encode::decode("UTF-8", "\x80", Encode::FB_XMLCREF), "\n"'
&#x80;

=> The CHECK value works, as a single \x80 octet is malformed in UTF-8. I guess this is one of the cases where CHECK is needed for decode().

perl -MEncode -e 'print Encode::decode("UTF-8", "\x80", sub { sprintf "%%%02X", shift }), "\n"'
&#128;

=> The CHECK coderef is ignored and doesn't work. I think this should work, however. This raises "UTF-8 decoder doesn't support callback CHECK" under Encode 2.38.

Thank you both for maintaining the great module anyway.

Ref: http://kawa.at.webry.info/200911/article_12.html
Fixed in Version 2.39. Thanks for your insight. Indeed, it's useful on decode. For encode, it has no use since it always succeeds.

Dan the Encode Maintainer

On Mon Nov 23 18:17:14 2009, KAWASAKI wrote:
> Hi, the ticket seems closed though.
>
> Some of my modules expected that this wouldn't croak but be simply
> ignored.
> I would be much happier if such callbacks could work, too. :-)
>
> I tested under Encode.pm 2.23:
>
> perl -MEncode -e 'print Encode::decode("UTF-8", "\x80", Encode::FB_XMLCREF), "\n"'
> &#x80;
>
> => CHECK values works as single \x80 octets is malformed in UTF-8.
> I guess this is one of cases that CHECK is needed for decode().
>
> perl -MEncode -e 'print Encode::decode("UTF-8", "\x80", sub { sprintf "%%%02X", shift }), "\n"'
> &#128;
>
> => CHECK coderef is ignored and doesn't work. I think this should work
> however.
> This invokes "UTF-8 decoder doesn't support callback CHECK" under
> Encode 2.38.
>
> Thank you both for maintaining the great module anyway.
>
> Ref:
> http://kawa.at.webry.info/200911/article_12.html
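
For readers landing on this ticket, the following is a hedged sketch of the usage that the thread says works from Encode 2.39 onwards: passing a code reference as CHECK to the UTF-8 decoder. It assumes an Encode version with that fix installed and that the callback receives the numeric value of each malformed octet. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode 2.39 ();   # version taken from the "Fixed in Version 2.39" reply

# A lone 0x80 octet is malformed in UTF-8.
my $octets = "\x80";

# The callback is invoked for the malformed octet and its return
# value is substituted into the decoded string.
my $decoded = Encode::decode("UTF-8", $octets, sub { sprintf "%%%02X", shift });

print $decoded, "\n";   # with a fixed Encode this should print %80
```

On Encode 2.38 the same call croaks with "UTF-8 decoder doesn't support callback CHECK", and on earlier versions the code reference is silently misread as flags, so the `use Encode 2.39` line doubles as a guard.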