
This queue is for tickets about the FCGI CPAN distribution.

Report information
The Basics
Id: 59328
Status: rejected
Priority: 0/
Queue: FCGI

People
Owner: Nobody in particular
Requestors: bitcard [...] cfs.parliant.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.68_01
Fixed in: (no value)



Subject: FCGI won't accept unicode on output, fails with "Wide character in FCGI::Stream::PRINT"
When using the FCGI module, it seems that all output containing UTF8 data needs to have encode_utf8() called on the strings before calling print().

The attached sample script demonstrates the problem -- it prints a smiley face character twice -- once after calling encode_utf8(), and then again afterwards by directly printing the string with the unicode literal in it. The second smiley does not print, and instead Apache logs this error:

[Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI: server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.

This change seems to have started being a problem in version 0.68_01 where code was added to FCGI.XL like this:

    #ifdef DO_UTF8
        if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
            croak("Wide character in FCGI::Stream::PRINT");
    #endif

In previous versions, the Unicode data passes through OK, but now I'm assuming that the sv_utf8_downgrade() call is failing and therefore the script fails. It is bad that all perl strings would need to have encode_utf8() called on them to make them safe for FCGI.

It seems that the fix described in the ChangeLog for this is not correct for all cases -- it seems that FCGI should allow print() calls using Unicode strings that were correctly created in normal perl ways. Is it possible that sv_utf8_downgrade() is not supposed to be used in this way?

I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this seems to have no effect (but it does work for CGI or command-line output). Is it possible that FCGI's stream handling is bypassing the perl IO filtering that is enabled by binmode?

The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71 and the perl 5.8.9_3 port. This problem has been verified on machines running both i386 and amd64 versions of FreeBSD on different machines.

perl -v reports;
This is perl, v5.8.9 built for amd64-freebsd
(with 1 registered patch, see perl -V for more detail)
Subject: fcgiunitest
application/octet-stream 883b


On Tue, 13 Jul 2010 at 17:42:48, csaldanh wrote:
> When using the FCGI module, it seems that all output containing UTF8
> data needs to have encode_utf8() called on the strings before calling
> print().

This is true for all Unicode strings in Perl. It's possible to produce Unicode strings without an encoding, but you can't interchange them without one.
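To illustrate the point about interchange, here is a minimal sketch (not taken from the attached script) of encoding a character string into octets before it is handed to a byte-oriented stream. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string: a single code point above 0xFF (WHITE SMILING FACE).
my $chars = "\x{263A}";

# The same text as UTF-8 octets, which is what a byte stream can carry.
my $octets = encode_utf8($chars);

# Show the octets as hex so nothing wide is printed here.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $octets;

printf "characters: %d, octets: %d (%s)\n",
    length($chars), length($octets), $hex;
# prints "characters: 1, octets: 3 (E2 98 BA)"
```

Printing $octets is safe for any byte stream; printing $chars is only meaningful once something has decided which encoding to use.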
> The attached sample script demonstrates the problem -- it prints a
> smiley face character twice -- once after calling encode_utf8(), and
> then again afterwards by directly printing the string with the unicode
> literal in it. The second smiley does not print, and instead Apache
> logs this error:
>
> [Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI:
> server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in
> FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.
>
>
> This change seems to have started being a problem in version 0.68_01
> where code was added to FCGI.XL like this:
>
> #ifdef DO_UTF8
> if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
> croak("Wide character in FCGI::Stream::PRINT");
> #endif
>

Correct.

> In previous versions, the Unicode data passes through OK, but now I'm
> assuming that the sv_utf8_downgrade() call is failing and therefore the
> script fails. It is bad that all perl strings would need to have
> encode_utf8() called on them to make them safe for FCGI.
>

This is incorrect. Previous versions passed Perl's internal representation of Unicode (UTF-X), which may or may not be UTF-8 depending on the data and platform.
>
> It seems that the fix described in the ChangeLog for this is not correct
> for all cases -- it seems that FCGI should allow print() calls using
> Unicode strings that were correctly created in normal perl ways. Is it
> possible that sv_utf8_downgrade() is not supposed to be used in this way?

Usage of sv_utf8_downgrade() is correct; it attempts to encode the string to octets and will fail if the string contains characters above 0xFF ("Wide character in %s").
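The same check can be observed from Perl code with utf8::downgrade(), the Perl-level counterpart of sv_utf8_downgrade(). The following is only a sketch of the behavior, not FCGI's actual code path, and the helper name can_downgrade is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether a string can be represented as octets.
# It works on a copy, so the caller's string is left untouched.
sub can_downgrade {
    my ($s) = @_;
    utf8::upgrade($s);                       # force the internal UTF-8 form
    return utf8::downgrade($s, 1) ? 1 : 0;   # true 2nd arg: return false, don't die
}

for my $cp (0xE9, 0x263A) {
    printf "U+%04X: %s\n", $cp,
        can_downgrade(chr $cp)
            ? 'fits in one octet, downgrade succeeds'
            : 'wide character, downgrade fails';
}
```

U+00E9 passes because it fits in a single octet; U+263A fails, which is the case where FCGI::Stream::PRINT raises the exception.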
> I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this
> seems to have no effect (but it does work for CGI or command-line
> output). Is it possible that FCGI's stream handling is bypassing the
> perl IO filtering that is enabled by binmode?
>

FCGI.pm uses the TIEHANDLE API for streams, not PerlIO.
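As a rough sketch of why PerlIO layers do not come into play: with a tied handle, print() calls the class's PRINT method directly with the original arguments, so any encoding has to happen either in the caller or inside PRINT. The EncodingHandle class below is hypothetical and not part of FCGI; it encodes to UTF-8 and forwards to an in-memory handle so the result can be inspected.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical tied-handle class for illustration only; not part of FCGI.
package EncodingHandle;

sub TIEHANDLE {
    my ($class, $sink) = @_;
    return bless { sink => $sink }, $class;
}

# PRINT receives the arguments of print() as-is; no PerlIO layer runs first.
sub PRINT {
    my ($self, @args) = @_;
    return print { $self->{sink} } map { Encode::encode_utf8($_) } @args;
}

package main;

# In-memory handle standing in for the real output stream.
my $buffer = '';
open my $sink, '>', \$buffer or die "cannot open in-memory handle: $!";

tie *OUT, 'EncodingHandle', $sink;
print OUT "smiley: \x{263A}\n";   # character string in, UTF-8 octets out
untie *OUT;
close $sink;

printf "%d octets written\n", length $buffer;
```

Whether wrapping the FCGI streams this way is advisable is a separate question; the sketch only shows where the encoding step would have to live when the handle is tied.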
>
> The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71
> and the perl 5.8.9_3 port. This problem has been verified on machines
> running both i386 and amd64 versions of FreeBSD on different machines.
>
> perl -v reports;
> This is perl, v5.8.9 built for amd64-freebsd
> (with 1 registered patch, see perl -V for more detail)

If you want the previous (FCGI.pm <= 0.68) incorrect behavior, you can disable the exception by using the bytes pragma.

    {
        use bytes;
        print "\x{263A}";
    }

--
chansen
I'm rejecting this ticket as I don't believe there is anything to fix here. We've also arranged to patch the documentation to make this clearer in subsequent versions.

Thanks
t0m