This queue is for tickets about the CGI CPAN distribution.

Report information
The Basics
Id: 3921
Status: resolved
Worked: 1 min
Priority: 0/
Queue: CGI

People
Owner: LDS [...] cpan.org
Requestors: michael.nilsson [...] athega.se
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: The utf8 bit is not set when CGI.pm reads the query string or STDIN
Hi,

The utf8 bit is not set when CGI.pm reads the query string or STDIN (which is UTF-8 encoded) and sets the params. The effect is that the query parameter gets double encoded (e.g. when concatenated with a utf8 string that has the bit set).

Find attached example.

Thanks for your help,
Michael Nilsson

--
#!/usr/bin/perl -w
use strict;
use 5.008_000;
use CGI ();

## set the layer of STDOUT to UTF-8
binmode(STDOUT, ':utf8');

## enable UTF-8 in source code
use utf8;

my $query = CGI->new();
$query->charset('UTF-8');

print $query->header(),
      $query->start_html(),
      'Färger',
      $query->start_form(-method => 'POST', -action => $query->url()),
      $query->textarea(-name => 'text', -default => 'starting value Färger'),
      $query->submit(),
      $query->endform(),
      $query->end_html();
--
Adding this code after creating the CGI instance fixes the problem
--
use Encode ();

foreach my $param ($query->param()) {
    foreach my $i (0 .. $#{$query->{$param}}) {
        ## Messing with Perl's Internals,
        ## this is probably not the right way to do this
        Encode::_utf8_on($query->{$param}->[$i]);
    }
}
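The double encoding described above can be reproduced without CGI.pm at all. The following is a minimal sketch, assuming the parameter arrives as raw UTF-8 bytes; the helper name `decode_values` is hypothetical and is not part of CGI.pm. It uses `Encode::decode` rather than `Encode::_utf8_on`, which is one possible way to avoid touching Perl's internals, not necessarily the fix the maintainer would choose.

```perl
use strict;
use warnings;
use Encode ();

# Bytes as they would arrive from the query string or STDIN:
# the word "Färger" encoded as UTF-8 (7 bytes, no utf8 flag).
my $raw = "F\xC3\xA4rger";

# A character string with the utf8 flag set, like a literal
# written under "use utf8" (6 characters).
my $label = "F\x{E4}rger";
utf8::upgrade($label);

# Concatenation treats each byte of $raw as one Latin-1 character,
# so the result holds 6 + 1 + 7 = 14 characters.
my $mixed = $label . ' ' . $raw;

# Encoding for output then encodes those bytes a second time.
my $double_encoded = Encode::encode('UTF-8', $mixed);

# Hypothetical helper: decode each raw value into a character string.
sub decode_values {
    my @decoded;
    for my $value (@_) {
        my $bytes = $value;    # copy, so the caller's data is left alone
        push @decoded, Encode::decode('UTF-8', $bytes);
    }
    return @decoded;
}

my ($decoded) = decode_values($raw);
my $clean = Encode::encode('UTF-8', $label . ' ' . $decoded);
```

With the decoded value, the concatenated string encodes back to the same bytes that were received, instead of a doubly encoded copy.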
Similar problem, with the same cause:

    foo.pl?x=%FF%u00FF

$q->param('x') now gets a string that, as far as perl is concerned, is 3 characters long: "\xFF\xC3\xBF". That's not what anybody intended, and is quite useless.

utf8_chr should have a line, near the top:

    return chr($c) if ($] >= 5.006);

This will properly set the utf8 bit, and upgrade any existing data. It will make the above case simply Do The Right Thing.

However, on the particular site that brings this up, perlmonks.org, it will not solve the problem we're having, since we expect our data to be latin-1, not utf8, and it matters to us, since we often round-trip to the database. We need to entitify on input, that is, have %uXXXX be converted to &#xXXXX; (or &#DDDDD;, of course). That's not a good solution for most sites, though it is for ours, so an option would be appreciated, but I don't mind doing a local patch.
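A small sketch of the two behaviours discussed in this comment. The function names `utf8_chr_bytes`, `utf8_chr_chars`, and `entitify_u_escapes` are hypothetical stand-ins written for illustration; they are not the CGI::Util source. `utf8_chr_bytes` assumes the reported behaviour (UTF-8 bytes returned without the utf8 bit), and `entitify_u_escapes` assumes four-digit %uXXXX escapes only.

```perl
use strict;
use warnings;
use Encode ();

# Stand-in for the reported behaviour (assumption): return the UTF-8
# bytes of the code point as a plain byte string.
sub utf8_chr_bytes {
    my ($c) = @_;
    return Encode::encode('UTF-8', chr($c));
}

# The proposed behaviour: let perl build the character itself.
sub utf8_chr_chars {
    my ($c) = @_;
    return chr($c);
}

# foo.pl?x=%FF%u00FF: a raw byte 0xFF followed by the escape for U+00FF.
my $reported = "\xFF" . utf8_chr_bytes(0xFF);   # "\xFF\xC3\xBF"
my $proposed = "\xFF" . utf8_chr_chars(0xFF);   # two characters, both U+00FF

# With a code point above 0xFF, the existing data is upgraded and
# the leading character keeps its value.
my $wide = "\xFF" . utf8_chr_chars(0x263A);

# Hypothetical "entitify on input" option: turn %uXXXX into &#xXXXX;
# so the stored data can stay Latin-1.
sub entitify_u_escapes {
    my ($s) = @_;
    $s =~ s/%u([0-9A-Fa-f]{4})/'&#x' . uc($1) . ';'/ge;
    return $s;
}

my $entitified = entitify_u_escapes('x=%u00FF');
```

The entity form keeps every stored character within Latin-1, which is why it suits a site that round-trips Latin-1 data through a database but would be a poor default elsewhere.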
From: srezic [...] cpan.org
Just want to say "me too". I use this hack in my scripts (I don't need support for older perl versions, so I left out the version-checking bit):

    {
        use CGI::Util;
        package CGI::Util;
        *utf8_chr = sub ($) { chr($_[0]); };
    }

Regards,
Slaven