Bug #59328 for FCGI: FCGI won't accept unicode on output, fails with "Wide character in FCGI::Stream::PRINT"

Tue Jul 13 17:42:48 2010 bitcard [...] cfs.parliant.com - Ticket created

When using the FCGI module, it seems that all output containing UTF8 data needs to have encode_utf8() called on the strings before calling print().

The attached sample script demonstrates the problem -- it prints a smiley face character twice -- once after calling encode_utf8(), and then again afterwards by directly printing the string with the unicode literal in it. The second smiley does not print, and instead Apache logs this error:

    [Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI: server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.

This change seems to have started being a problem in version 0.68_01 where code was added to FCGI.XL like this:

    #ifdef DO_UTF8
        if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
            croak("Wide character in FCGI::Stream::PRINT");
    #endif

In previous versions, the Unicode data passes through OK, but now I'm assuming that the sv_utf8_downgrade() call is failing and therefore the script fails. It is bad that all perl strings would need to have encode_utf8() called on them to make them safe for FCGI.

It seems that the fix described in the ChangeLog for this is not correct for all cases -- it seems that FCGI should allow print() calls using Unicode strings that were correctly created in normal perl ways. Is it possible that sv_utf8_downgrade() is not supposed to be used in this way?

I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this seems to have no effect (but it does work for CGI or command-line output). Is it possible that FCGI's stream handling is bypassing the perl IO filtering that is enabled by binmode?

The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71 and the perl 5.8.9_3 port. This problem has been verified on machines running both i386 and amd64 versions of FreeBSD on different machines.

perl -v reports;

    This is perl, v5.8.9 built for amd64-freebsd
    (with 1 registered patch, see perl -V for more detail)

Attachment: fcgiunitest (application/octet-stream, 883b)

Wed Jul 14 12:43:05 2010 chansen [...] cpan.org - Correspondence added

On Tue, 13 Jul 2010 at 17:42:48, csaldanh wrote:

> When using the FCGI module, it seems that all output containing UTF8
> data needs to have encode_utf8() called on the strings before calling
> print().

This is true for all Unicode strings in Perl. It's possible to produce Unicode strings without an encoding but you can't interchange them without an encoding.
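As a minimal sketch of that point (not taken from the attached script, which is not shown here), the following uses the core Encode module to turn a character string into UTF-8 octets before it would be handed to an output stream. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string holding U+263A (WHITE SMILING FACE).
my $chars = "\x{263A}";

# Encode to UTF-8 octets; this is the form that can be interchanged.
my $octets = encode_utf8($chars);

# One character becomes three octets.
printf "characters: %d, octets: %d\n", length($chars), length($octets);

# Show the octets in hex: E2 98 BA
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $octets)), "\n";
```

The encoded string contains only values in the range 0x00 to 0xFF, so it can be passed to print() on any handle without a wide character problem.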

> The attached sample script demonstrates the problem -- it prints a
> smiley face character twice -- once after calling encode_utf8(), and
> then again afterwards by directly printing the string with the unicode
> literal in it. The second smiley does not print, and instead Apache
> logs this error:
>
> [Tue Jul 13 17:28:34 2010] [error] [client 192.168.1.222] FastCGI:
> server "/usr/local/concom/cgi-bin/fcgiunitest" stderr: Wide character in
> FCGI::Stream::PRINT at /usr/local/concom/cgi-bin/fcgiunitest line 29.
>
>
> This change seems to have started being a problem in version 0.68_01
> where code was added to FCGI.XL like this:
>
> #ifdef DO_UTF8
> if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1))
> croak("Wide character in FCGI::Stream::PRINT");
> #endif
>

Correct.

> In previous versions, the Unicode data passes through OK, but now I'm
> assuming that the sv_utf8_downgrade() call is failing and therefore the
> script fails. It is bad that all perl strings would need to have
> encode_utf8() called on them to make them safe for FCGI.
>

This is incorrect. Previous versions passed perl's internal representation of Unicode (UTF-X), which may or may not be UTF-8 depending on the data and platform.

>
> It seems that the fix described in the ChangeLog for this is not correct
> for all cases -- it seems that FCGI should allow print() calls using
> Unicode strings that were correctly created in normal perl ways. Is it
> possible that sv_utf8_downgrade() is not supposed to be used in this way?

Usage of sv_utf8_downgrade() is correct; it attempts to encode the string to octets and will fail if the string contains characters above 0xFF ("Wide character in %s").
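The same behavior can be observed from Perl code with utf8::downgrade(), the Perl-level counterpart of sv_utf8_downgrade(). This is only an illustrative sketch with made-up variable names; it is not the code FCGI runs.

```perl
use strict;
use warnings;

# U+00E9 fits in a single octet, even when stored internally as UTF-8.
my $latin1 = "\x{E9}";
utf8::upgrade($latin1);    # force the internal UTF-8 representation

# U+263A is above 0xFF and cannot be represented as a single octet.
my $wide = "\x{263A}";

# With a true second argument, utf8::downgrade returns false on failure
# instead of dying.
my $ok_latin1 = utf8::downgrade($latin1, 1) ? 1 : 0;
my $ok_wide   = utf8::downgrade($wide,   1) ? 1 : 0;

print "U+00E9 downgraded: $ok_latin1\n";    # 1
print "U+263A downgraded: $ok_wide\n";      # 0
```

A failed downgrade is the condition under which the check quoted above croaks with "Wide character in FCGI::Stream::PRINT".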

> I tried to put in a call to binmode(STDOUT,":encoding(utf8)") but this
> seems to have no effect (but it does work for CGI or command-line
> output). Is it possible that FCGI's stream handling is bypassing the
> perl IO filtering that is enabled by binmode?
>

FCGI.pm uses the TIEHANDLE API for streams, not PerlIO.
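To illustrate why a binmode() layer has no effect on a tied handle, here is a small sketch using a hypothetical Demo::Stream class. It is not FCGI's implementation; it only assumes the same general shape (a tie class whose PRINT rejects strings that cannot be downgraded to octets).

```perl
use strict;
use warnings;

package Demo::Stream;    # hypothetical class, not part of FCGI

sub TIEHANDLE {
    my ($class) = @_;
    return bless { buf => '' }, $class;
}

# print() on the tied handle lands here with the caller's strings as-is.
sub PRINT {
    my ($self, @args) = @_;
    for my $s (@args) {
        my $copy = $s;
        utf8::downgrade($copy, 1)
            or die "Wide character in Demo::Stream::PRINT\n";
        $self->{buf} .= $copy;
    }
    return 1;
}

# binmode() on a tied handle is dispatched to this method; in this sketch
# the requested layer is simply ignored, so no encoding layer is applied.
sub BINMODE { return 1 }

package main;

tie *OUT, 'Demo::Stream';
binmode(OUT, ':encoding(UTF-8)');

my $ok = eval { print OUT "\x{263A}"; 1 };
print $ok ? "printed\n" : "failed: $@";
```

In this sketch the print still fails after binmode(), because the string reaches PRINT unencoded. Encoding the string explicitly before calling print avoids the failure.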

>
> The test server I'm working with is a current FreeBSD 8.0 with FCGI 0.71
> and the perl 5.8.9_3 port. This problem has been verified on machines
> running both i386 and amd64 versions of FreeBSD on different machines.
>
> perl -v reports;
> This is perl, v5.8.9 built for amd64-freebsd
> (with 1 registered patch, see perl -V for more detail)

If you want the previous (FCGI.pm <= 0.68) incorrect behavior, you can disable the exception by using the C<bytes> pragma.

    {
        use bytes;
        print "\x{263A}";
    }

--
chansen

Wed Jul 14 12:43:05 2010 The RT System itself - Status changed from 'new' to 'open'

Wed Jul 14 12:48:18 2010 bobtfish [...] bobtfish.net - Correspondence added

I'm rejecting this ticket as I don't believe there is anything to fix here. We've also arranged to patch the documentation to make this more clear in subsequent versions.

Thanks
t0m

Wed Jul 14 12:48:19 2010 bobtfish [...] bobtfish.net - Status changed from 'open' to 'rejected'