Subject: UTF-8 settings and binary file upload
Date: Sat, 19 May 2012 01:28:27 +0200 (CEST)
To: bug-CGI [...] rt.cpan.org
From: Saašha Metsärantala <saasha [...] acc.umu.se>
Hello!
Working in an all-UTF-8 text environment, the only thing usually needed to
make Perl handle character encoding correctly is to begin the script file
with:
#!/usr/bin/perl -TwC63
use utf8;
and everything is (almost always) fine! This applies to CGI, too ... as
long as you do not need to let people upload binary files while also sending
some UTF-8 text that needs to be processed properly. What happens in that
case is not well documented in the POD. I ran some tests, and I think that
what I (unsurprisingly) found should be documented in the next version of
the POD, because it is likely to be useful to many users of the CGI module.
What I found is that:
(1) The first line needs to be changed to:
#!/usr/bin/perl -TwC62
In other words, "C63" must be changed to "C62".
(2) If the uploaded binary file is fetched to the script through the
upload() method then, when this file is copied, OUTFILE must be in
binmode.
(3) If the uploaded binary file is fetched to the script through the
tmpFileName() method then, when this file is copied, both INFILE and
OUTFILE must be in binmode.
In other words:
egrep -h 'while|binmode|^$' successfully_tested_script* | grep binmode -A 2
# binmode $filehandle; # upload()
binmode INFILE; # tmpFileName()
binmode OUTFILE;
while ( <INFILE> ) {
# while ( <$filehandle> ) {
--
# binmode $filehandle; # upload()
# binmode INFILE; # tmpFileName()
binmode OUTFILE;
# while ( <INFILE> ) {
while ( <$filehandle> ) {
--
binmode $filehandle; # upload()
# binmode INFILE; # tmpFileName()
binmode OUTFILE;
# while ( <INFILE> ) {
while ( <$filehandle> ) {
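As a minimal sketch of points (2) and (3), the copy step could be wrapped in one helper that always puts both handles in binmode. In the tests above this was required for tmpFileName() and also worked for upload(). The name copy_binary and its arguments are hypothetical, not part of CGI.pm, and the sketch reads in fixed-size chunks with read() instead of the line-by-line loop used in the tested scripts.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of CGI.pm): copy $in to the file $out_path
# byte for byte and return the number of bytes copied.
# $in is either an already open filehandle (the upload() case)
# or a path to a file (the tmpFileName() case).
sub copy_binary {
    my ($in, $out_path) = @_;

    my $in_fh;
    if (ref $in) {
        $in_fh = $in;                       # handle passed in by the caller
    }
    else {
        open $in_fh, '<', $in
            or die "Cannot open '$in' for reading: $!";
    }
    open my $out_fh, '>', $out_path
        or die "Cannot open '$out_path' for writing: $!";

    # Remove any default :utf8 layer (for example one added by -C) so that
    # the bytes are neither decoded on input nor encoded on output.
    binmode $in_fh;
    binmode $out_fh;

    my $total = 0;
    my $buf;
    while (1) {
        my $n = read $in_fh, $buf, 65536;   # read up to 64 KiB at a time
        die "Read error: $!" unless defined $n;
        last if $n == 0;                    # end of file
        print {$out_fh} $buf or die "Write error: $!";
        $total += $n;
    }
    close $out_fh or die "Cannot close '$out_path': $!";
    return $total;
}
```

In a CGI script, the first argument would be either the handle returned by upload() or the path returned by tmpFileName(). The form field name and the destination path are up to the script.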
Of course, there is some logic behind all that ... I won't go into the
details here: the perlunicode and perlrun manual pages provide them.
My point is that this behavior should be clarified in the POD of the CGI
module.
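For reference, the difference between -C63 and -C62 in point (1) is a single bit. The sketch below decodes the number using the flag values listed in perlrun. The helper name letters is made up for illustration, and the -C flags L (64) and a (256) are left out because neither number sets them.

```perl
use strict;
use warnings;

# Flag values of the -C switch, as listed in perlrun.
my @flags = (
    [  1, 'I', 'STDIN is assumed to be in UTF-8' ],
    [  2, 'O', 'STDOUT will be in UTF-8' ],
    [  4, 'E', 'STDERR will be in UTF-8' ],
    [  8, 'i', 'UTF-8 is the default PerlIO layer for input streams' ],
    [ 16, 'o', 'UTF-8 is the default PerlIO layer for output streams' ],
    [ 32, 'A', 'the @ARGV elements are expected to be in UTF-8' ],
);

# Hypothetical helper: return the flag letters set in the number $n.
sub letters {
    my ($n) = @_;
    return join '', map { $_->[1] } grep { $n & $_->[0] } @flags;
}

print "-C63 = -C", letters(63), "\n";   # prints "-C63 = -CIOEioA"
print "-C62 = -C", letters(62), "\n";   # prints "-C62 = -COEioA"
```

The only flag that -C62 drops is I, so STDIN is no longer assumed to be UTF-8. This presumably matters because CGI reads the body of a POST request, including the uploaded file, from STDIN.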
Regards!
Saašha,