Bug #32122 for CGI: Changes to CGI::Util method escape breaks compatibility to CGI::Compress::Gzip

Mon Jan 07 04:17:23 2008 dietrich.streifert [...] googlemail.com - Ticket created

Subject: Changes to CGI::Util method escape breaks compatibility to CGI::Compress::Gzip

We are using CGI and CGI::Compress::Gzip to automatically compress HTML output (indirectly, via CGI::Application and CGI::Application::Plugin::CompressGzip).

After updating from CGI V 3.29 to V 3.33, the output appeared to be generated as UTF-8 regardless of the charset settings in the file or header. This happened only for pages that add cookies to the header.

After examining the code, it turned out that the following change in the escape method caused the problem. The line changed from

    $toencode = eval { pack("C*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));

to (note the change from "C*" to "U*" in the first pack call)

    $toencode = eval { pack("U*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));

I don't know whether this can be solved in this module, but this change is what triggered the problem. I'll also report this to the CGI::Compress::Gzip RT queue.

Thank you for your help and your great module. Happy new year.
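For readers following the ticket, here is a minimal sketch of what the "C*" to "U*" switch does to the string that pack returns. It isolates only the outer pack call; the inner unpack("U0C*", ...) is left out because its behavior differs between Perl versions, and the two input octets are chosen purely for illustration.

```perl
use strict;
use warnings;

# Two octets chosen for illustration: the UTF-8 encoding of U+00E9.
my @octets = (0xC3, 0xA9);

# "C*" packs each value as a single byte; the result is a plain byte string.
my $packed_c = pack("C*", @octets);

# "U*" packs each value as a Unicode code point; the result is a character
# string with Perl's internal UTF-8 flag switched on.
my $packed_u = pack("U*", @octets);

printf "C*: flagged=%d length=%d\n", utf8::is_utf8($packed_c) ? 1 : 0, length $packed_c;
printf "U*: flagged=%d length=%d\n", utf8::is_utf8($packed_u) ? 1 : 0, length $packed_u;

# The two strings still compare equal character by character.
print "equal\n" if $packed_c eq $packed_u;
```

The two results hold the same characters and differ only in the internal flag, which is why the change is easy to miss when reading the escaped output alone.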

Mon Jan 07 10:05:21 2008 LDS [...] cpan.org - Taken

Mon Jan 07 10:06:34 2008 LDS [...] cpan.org - Correspondence added

Ouch! Can you give me any more detail on why the Gzip compression is not working? I don't see an obvious dependency between the charset and the gzip module.

Lincoln

On Mon Jan 07 04:17:23 2008, level420 wrote:

> We are using CGI and CGI::Compress::Gzip to automatically compress
> output of html (Indirectly by using CGI::Application and
> CGI::Application::Plugin::CompressGzip).
>
> After updating from CGI V 3.29 to V 3.33 the output seemed to be
> created as UTF-8 independent from the charset settings in the file or
> header. This was for pages adding cookies to the header.
>
> After examining the code it turned out that changes in the method escape
> from
>
> $toencode = eval { pack("C*", unpack("U0C*", $toencode))} || pack("C*",
> unpack("C*", $toencode));
>
> to (change from "C*" to "U*" in the first pack call)
>
> $toencode = eval { pack("U*", unpack("U0C*", $toencode))} || pack("C*",
> unpack("C*", $toencode));
>
> Caused the problem.
>
> I don't know if this problem can be solved in this module but it caused
> the problem. I'll report this also to the CGI::Compress::Gzip module RT.
>
> Thank you for your help and your great module. Happy new year.

Mon Jan 07 10:06:41 2008 The RT System itself - Status changed from 'new' to 'open'

Mon Jan 07 10:18:22 2008 dietrich.streifert [...] googlemail.com - Correspondence added

From: dietrich.streifert [...] googlemail.com

On Mon Jan 07 2008, 10:06:34, LDS wrote:

> Ouch! Can you give me any more detail on why the Gzip compression is not
> working? I don't see an obvious dependency between the charset and the
> gzip module.
>
> Lincoln

Thank you for your quick answer, Lincoln!

Sorry, but I can't give you much information on why this is happening. I found that with CGI.pm 3.29 everything worked, and after an upgrade to CGI.pm 3.33 the problem was there.

My pages are encoded in ISO-8859-1, and the accented characters are inserted as is, not as entities. On the first page visit (tested in IE7 and FF2), the typical two-byte-encoding hieroglyphs showed up in the page instead of the accented characters. But only when I was setting a cookie!

So I spent some time testing other CGI.pm versions and looking into the cookie handling, and found that the escape method from CGI::Util is used to encode the cookie data. After spotting the change, I simply restored the commented-out pack/unpack line and commented out the new one. And voila! It worked again.

Maybe CGI::Compress::Gzip has to be updated to work with the new escape implementation, but I have too little knowledge to do this. So I ended up just informing you that there is a problem, and I'm totally depending on your help here.
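This thread never establishes how a change in cookie escaping ends up affecting the page body, so the following is only a sketch of one general Perl behavior that could produce the symptom described above: once a UTF-8-flagged string is concatenated with ISO-8859-1 bytes, the combined string is held internally as UTF-8. The variable names and sample strings are assumptions made for this illustration; nothing here is taken from CGI::Compress::Gzip.

```perl
use strict;
use warnings;

# Page content as ISO-8859-1 bytes (E9 is e-acute); no UTF-8 flag.
my $page = "caf\xE9";

# Stand-in for an escaped cookie value that came back from pack("U*", ...):
# plain ASCII characters, but with the UTF-8 flag set.
my $cookie = pack("U*", 0x61, 0x62);

# Concatenation upgrades the whole result to a flagged character string.
my $output = $cookie . $page;

# Expose the internal UTF-8 representation as raw bytes.
my $internal = $output;
utf8::encode($internal);

printf "flagged=%d chars=%d bytes=%d\n",
    utf8::is_utf8($output) ? 1 : 0, length($output), length($internal);
printf "%s\n", uc unpack("H*", $internal);
```

The single byte E9 from the page becomes the two bytes C3 A9 in the internal form. Any code that reads that internal form directly, rather than downgrading the string first, would emit two bytes per accented character.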

Fri Mar 14 09:31:32 2008 dietrich.streifert [...] googlemail.com - Correspondence added

From: dietrich.streifert [...] googlemail.com

Any news on this subject? Regards...

Fri Mar 14 10:33:15 2008 LDS [...] cpan.org - Correspondence added

Try applying this patch.

? CGI-diff
? CGI.diff
? CGI.patch
? CGI.pm-3.29.tar.gz
? CGI.pm-3.30.tar.gz
? CGI.pm-3.31.tar.gz
? CGI.pm-3.32.tar.gz
? CGI.pm-3.33.tar.gz
? CGI.pm-3.34.tar.gz
? CGI.pm.diff
? Carp.pm.patch
? META.yml
? PUT.patch
? TODO
? attributes.patch
? backout.patch
? d.txt
? post_max_bug.txt
? proposed_diff.patch
? tar.gz
? t/.cvsignore
? t/uploadInfo.t
Index: CGI.pm
===================================================================
RCS file: /usr/local/cvs_repository/CGI.pm/CGI.pm,v
retrieving revision 1.242
retrieving revision 1.247
diff -u -r1.242 -r1.247
--- CGI.pm 27 Dec 2007 18:39:38 -0000 1.242
+++ CGI.pm 14 Mar 2008 14:29:36 -0000 1.247
@@ -18,8 +18,8 @@
 # The most recent version and complete docs are available at:
 # http://stein.cshl.org/WWW/software/CGI/
 
-$CGI::revision = '$Id: CGI.pm,v 1.242 2007/12/27 18:39:38 lstein Exp $';
-$CGI::VERSION='3.32';
+$CGI::revision = '$Id: CGI.pm,v 1.247 2008/03/14 14:29:36 lstein Exp $';
+$CGI::VERSION='3.34';
 
 # HARD-CODED LOCATION FOR FILE UPLOAD TEMPORARY FILES.
 # UNCOMMENT THIS ONLY IF YOU KNOW WHAT YOU'RE DOING.
@@ -1835,7 +1835,7 @@
     my($method,$action,$enctype,@other) =
         rearrange([METHOD,ACTION,ENCTYPE],@p);
 
-    $method = $self->escapeHTML(lc($method) || 'post');
+    $method = $self->escapeHTML(lc($method || 'post'));
     $enctype = $self->escapeHTML($enctype || &URL_ENCODED);
     if (defined $action) {
         $action = $self->escapeHTML($action);
@@ -2198,9 +2198,11 @@
     else {
         $toencode =~ s{"}{&quot;}gso;
     }
-    my $latin = uc $self->{'.charset'} eq 'ISO-8859-1' ||
-                uc $self->{'.charset'} eq 'WINDOWS-1252';
-    if ($latin) { # bug in some browsers
+    # Handle bug in some browsers with Latin charsets
+    if ($self->{'.charset'} &&
+        (uc($self->{'.charset'}) eq 'ISO-8859-1' ||
+         uc($self->{'.charset'}) eq 'WINDOWS-1252'))
+    {
         $toencode =~ s{'}{&#39;}gso;
         $toencode =~ s{\x8b}{&#8249;}gso;
         $toencode =~ s{\x9b}{&#8250;}gso;
@@ -2730,6 +2732,7 @@
 
     $url .= $path if $path_info and defined $path;
     $url .= "?$query_str" if $query and $query_str ne '';
+    $url ||= '';
     $url =~ s/([^a-zA-Z0-9_.%;&?\/\\:+=~-])/sprintf("%%%02X",ord($1))/eg;
     return $url;
 }
@@ -4039,7 +4042,7 @@
     my $filename;
     find_tempdir() unless -w $TMPDIRECTORY;
     for (my $i = 0; $i < $MAXTRIES; $i++) {
-        last if ! -f ($filename = sprintf("${TMPDIRECTORY}${SL}CGItemp%d",$sequence++));
+        last if ! -f ($filename = sprintf("\%s${SL}CGItemp%d", $TMPDIRECTORY, $sequence++));
     }
     # check that it is a more-or-less valid filename
     return unless $filename =~ m!^([a-zA-Z0-9_\+ \'\":/.\$\\-]+)$!;
@@ -7685,10 +7688,8 @@
 
 =head1 AUTHOR INFORMATION
 
-Copyright 1995-1998, Lincoln D. Stein. All rights reserved.
-
-This library is free software; you can redistribute it and/or modify
-it under the same terms as Perl itself.
+The GD.pm interface is copyright 1995-2007, Lincoln D. Stein. It is
+distributed under GPL and the Artistic License 2.0.
 
 Address bug reports and comments to: lstein@cshl.org. When sending
 bug reports, please provide the version of CGI.pm, the version of
Index: Changes
===================================================================
RCS file: /usr/local/cvs_repository/CGI.pm/Changes,v
retrieving revision 1.64
retrieving revision 1.68
diff -u -r1.64 -r1.68
--- Changes 27 Dec 2007 18:39:38 -0000 1.64
+++ Changes 14 Mar 2008 14:29:36 -0000 1.68
@@ -1,3 +1,11 @@
+  Version 3.34
+  1. Handle Unicode %uXXXX escapes properly -- patch from DANKOGAI@cpan.org
+
+  Version 3.33
+  1. Remove uninit variable warning when calling url(-relative=>1)
+  2. Fix uninit variable warnings for two lc calls
+  3. Fixed failure of tempfile upload due to sprintf() taint failure in perl 5.10
+
   Version 3.32
   1. Patch from Miguel Santinho to prevent sending premature headers under mod_perl 2.0
 
Index: CGI/Util.pm
===================================================================
RCS file: /usr/local/cvs_repository/CGI.pm/CGI/Util.pm,v
retrieving revision 1.26
retrieving revision 1.27
diff -u -r1.26 -r1.27
--- CGI/Util.pm 30 Nov 2007 19:04:04 -0000 1.26
+++ CGI/Util.pm 14 Mar 2008 14:29:37 -0000 1.27
@@ -7,7 +7,7 @@
 @EXPORT_OK = qw(rearrange make_attributes unescape escape
                 expires ebcdic2ascii ascii2ebcdic);
 
-$VERSION = '1.5';
+$VERSION = '1.5_01';
 
 $EBCDIC = "\t" ne "\011";
 # (ord('^') == 95) for codepage 1047 as on os390, vmesa
@@ -141,8 +141,12 @@
 
 sub utf8_chr {
     my $c = shift(@_);
-    return chr($c) if $] >= 5.006;
-
+    if ($] >= 5.006){
+        require utf8;
+        my $u = chr($c);
+        utf8::encode($u); # drop utf8 flag
+        return $u;
+    }
     if ($c < 0x80) {
         return sprintf("%c", $c);
     } elsif ($c < 0x800) {
@@ -189,6 +193,17 @@
     if ($EBCDIC) {
       $todecode =~ s/%([0-9a-fA-F]{2})/chr $A2E[hex($1)]/ge;
     } else {
+      # handle surrogate pairs first -- dankogai
+      $todecode =~ s{
+                      %u([Dd][89a-bA-B][0-9a-fA-F]{2}) # hi
+                      %u([Dd][c-fC-F][0-9a-fA-F]{2}) # lo
+                    }{
+                      utf8_chr(
+                          0x10000
+                          + (hex($1) - 0xD800) * 0x400
+                          + (hex($2) - 0xDC00)
+                      )
+                    }gex;
       $todecode =~ s/%(?:([0-9a-fA-F]{2})|u([0-9a-fA-F]{4}))/
         defined($1)? chr hex($1) : utf8_chr(hex($2))/ge;
     }
@@ -200,9 +215,12 @@
     shift() if @_ > 1 and ( ref($_[0]) || (defined $_[1] && $_[0] eq $CGI::DefaultClass));
     my $toencode = shift;
     return undef unless defined($toencode);
+    $toencode = eval { pack("C*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));
+
     # force bytes while preserving backward compatibility -- dankogai
-#    $toencode = eval { pack("C*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));
-    $toencode = eval { pack("U*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));
+    # but commented out because it was breaking CGI::Compress -- lstein
+    # $toencode = eval { pack("U*", unpack("U0C*", $toencode))} || pack("C*", unpack("C*", $toencode));
+
     if ($EBCDIC) {
       $toencode=~s/([^a-zA-Z0-9_.~-])/uc sprintf("%%%02x",$E2A[ord($1)])/eg;
     } else {
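As a side note on the %uXXXX part of the patch, the surrogate-pair arithmetic it adds to unescape can be tried in isolation. The sketch below reuses the same formula and the same chr plus utf8::encode step as the patched utf8_chr; the helper name combine_surrogates and the sample input are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper for this sketch: combine a high and a low surrogate
# into one code point, using the formula from the patch.
sub combine_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + ($hi - 0xD800) * 0x400 + ($lo - 0xDC00);
}

# Sample input: a %u-escaped surrogate pair.
my $escaped = '%uD83D%uDE00';
my ($hi_hex, $lo_hex) = $escaped =~ /^%u([0-9a-fA-F]{4})%u([0-9a-fA-F]{4})$/;

my $code_point = combine_surrogates(hex($hi_hex), hex($lo_hex));

# As in the patched utf8_chr: build the character, then encode it so the
# result is a plain byte string holding the UTF-8 octets.
my $bytes = chr($code_point);
utf8::encode($bytes);

printf "U+%X -> %s\n", $code_point, uc unpack("H*", $bytes);
```

Decoding the pair before the generic %uXXXX substitution matters because each half of a surrogate pair is not a valid character on its own.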

Tue Mar 18 10:35:40 2008 dietrich.streifert [...] googlemail.com - Correspondence added

From: dietrich.streifert [...] googlemail.com

On Fri Mar 14 2008, 10:33:15, LDS wrote:

> Try applying this patch.

Yes! This solves the bug. I stumbled at first over the fact that the patch is against CGI 3.32, but after patching the right version the bug is gone. Please publish a new version ASAP.

Thank you for your support. Best regards.

Tue Mar 18 12:04:18 2008 LDS [...] cpan.org - Correspondence added

Fixed in 3.34.

Tue Mar 18 12:04:20 2008 LDS [...] cpan.org - Status changed from 'open' to 'resolved'

Fri May 23 14:28:48 2014 The RT System itself - Queue changed from CGI.pm to CGI
