Bug #3921 for CGI: The utf8 bit is not set when CGI.pm reads the query string or STDIN

Wed Oct 01 07:08:04 2003 Guest - Ticket created

Subject:

The utf8 bit is not set when CGI.pm reads the query string or STDIN

Hi,

The utf8 bit is not set when CGI.pm reads the query string or STDIN (which is UTF-8 encoded) and sets the params. The effect is that the query parameter gets double encoded (e.g. when concatenated with a utf8 string that has the bit set). Find attached example.

Thanks for your help,
Michael Nilsson

--
#!/usr/bin/perl -w
use strict;
use 5.008_000;
use CGI ();

## set the layer of STDOUT to UTF-8
binmode(STDOUT, ':utf8');

## enable UTF-8 in source code
use utf8;

my $query = CGI->new();
$query->charset('UTF-8');

print $query->header(),
      $query->start_html(),
      'Färger',
      $query->start_form(-method => 'POST', -action => $query->url()),
      $query->textarea(-name => 'text', -default => 'starting value Färger'),
      $query->submit(),
      $query->endform(),
      $query->end_html();
--

Adding this code after creating the CGI instance fixes the problem:

--
use Encode ();

foreach my $param ($query->param()) {
    foreach my $i (0 .. $#{$query->{$param}}) {
        ## Messing with Perl's Internals,
        ## this is probably not the right way to do this
        Encode::_utf8_on($query->{$param}->[$i]);
    }
}
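The double encoding described in this report can be reproduced without CGI.pm at all. The following is a minimal sketch, assuming the documented Encode::decode and Encode::encode functions are used in place of Encode::_utf8_on; the variable names ($raw, $decoded, $label) are illustrative and do not come from CGI.pm.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw UTF-8 octets for "Färger", as they arrive in a query string or on STDIN.
my $raw = "F\xC3\xA4rger";

# Decoding turns the 7 octets into 6 characters.
my $decoded = decode('UTF-8', $raw);

# A character string containing a wide character, standing in for
# text that already has the utf8 bit set.
my $label = "\x{263A} ";

# Joining the undecoded octets with character data and then encoding for
# output encodes 0xC3 and 0xA4 a second time.
my $double = encode('UTF-8', $label . $raw);

# Joining the decoded parameter keeps the a-umlaut as the single pair 0xC3 0xA4.
my $single = encode('UTF-8', $label . $decoded);

printf "raw length: %d, decoded length: %d\n", length($raw), length($decoded);
printf "double encoded: %s\n", index($double, "\xC3\x83\xC2\xA4") >= 0 ? 'yes' : 'no';
printf "single encoded: %s\n", index($single, "F\xC3\xA4rger") >= 0 ? 'yes' : 'no';
```

Decoding each parameter value once, right after it is read, is probably a safer local workaround than flipping the internal flag, since decode also validates the input.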

Thu Mar 25 07:37:16 2004 JMASTROS [...] cpan.org - Correspondence added

Similar problem, with the same cause: foo.pl?x=%FF%u00FF

$q->param('x') now gets a string that, as far as perl is concerned, is 3 characters long: "\xFF\xC3\xBF". That's not what anybody intended, and is quite useless.

utf8_chr should have a line, near the top:

return chr($c) if ($] >= 5.006);

This will properly set the utf8 bit and upgrade any existing data. It will make the above case simply Do The Right Thing.

However, on the particular site that brings this up, perlmonks.org, it will not solve the problem we're having, since we expect our data to be latin-1, not utf8, and it matters to us, since we often round-trip to the database. We need to entitify on input, that is, have %uXXXX converted to &#xXXXX; (or &#DDDDD;, of course). That's not a good solution for most sites, though it is for ours, so an option would be appreciated, but I don't mind doing a local patch.
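To make the two behaviours in this comment concrete, here is a rough sketch. The first part compares the string built from hand-made UTF-8 octets with the one built by chr(). The second part shows one way the "entitify on input" option could look. The helpers entitify_u_escapes and unescape_bytes are hypothetical and are not part of CGI.pm or CGI::Util.

```perl
use strict;
use warnings;

# %FF followed by %u00FF, with the second value expanded by hand into
# UTF-8 octets: perl sees three characters.
my $mixed = "\xFF" . "\xC3\xBF";

# The same input with the second value produced by chr(): two characters,
# both U+00FF.
my $chars = "\xFF" . chr(0xFF);

# Hypothetical helper: turn %uXXXX escapes into numeric character
# references so the data can stay Latin-1.
sub entitify_u_escapes {
    my ($s) = @_;
    $s =~ s/%u([0-9a-fA-F]{4})/'&#x' . uc($1) . ';'/ge;
    return $s;
}

# Hypothetical helper: ordinary %XX unescaping, one character per escape.
sub unescape_bytes {
    my ($s) = @_;
    $s =~ s/%([0-9a-fA-F]{2})/chr(hex($1))/ge;
    return $s;
}

my $entitified = unescape_bytes(entitify_u_escapes('%FF%u00FF'));

printf "mixed: %d characters, chr-based: %d characters\n",
    length($mixed), length($chars);
print "entitified tail: ", substr($entitified, 1), "\n";
```

Running entitify_u_escapes before the plain %XX pass keeps the two escape forms from interfering with each other; a real option in CGI.pm might be structured quite differently.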

Mon Dec 13 10:52:31 2004 SREZIC [...] cpan.org - Correspondence added

From:

srezic [...] cpan.org

Just want to say "me too". I use this hack in my scripts (I don't need support for older perl versions, so I left out the version checking bit):

{
    use CGI::Util;
    package CGI::Util;
    *utf8_chr = sub ($) { chr($_[0]); };
}

Regards,
Slaven

Mon Mar 07 14:54:18 2005 LDS [...] cpan.org - Given to LDS

Mon Mar 07 14:54:18 2005 LDS [...] cpan.org - Status changed from 'new' to 'resolved'

Fri May 23 14:28:22 2014 The RT System itself - Queue changed from CGI.pm to CGI
