This queue is for tickets about the CGI CPAN distribution.

Report information
The Basics
Id: 77297
Status: rejected
Priority: 0/
Queue: CGI

People
Owner: MARKSTOS [...] cpan.org
Requestors: saasha [...] acc.umu.se
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: UTF-8 settings and binary file upload
Date: Sat, 19 May 2012 01:28:27 +0200 (CEST)
To: bug-CGI [...] rt.cpan.org
From: Saašha Metsärantala <saasha [...] acc.umu.se>
Hello!

Working in an all-UTF-8 text environment, the only thing usually needed to make perl work fine (when it comes to character encoding) is to begin the script file with:

    #!/usr/bin/perl -TwC63
    use utf8;

and everything is (almost always) fine! This applies to CGI, too ... as long as you do not need to let people upload binary files (and also send some UTF-8 text that needs to be processed properly, too!). I consider that what happens then is not really well documented in the POD. I made some tests and I consider that what I (unsurprisingly) found could be documented in the next version of the POD, because I assume that it could be useful to many users of the CGI module.

What I found is that:

(1) The first line needs to be changed to:

    #!/usr/bin/perl -TwC62

In other words, "C63" must be changed to "C62".

(2) If the uploaded binary file is fetched to the script through the upload() method then, when this file is copied, OUTFILE must be in binmode.

(3) If the uploaded binary file is fetched to the script through the tmpFileName() method then, when this file is copied, both INFILE and OUTFILE must be in binmode.

In other words:

    egrep -h 'while|binmode|^$' successfully_tested_script* | grep binmode -A 2
    # binmode $filehandle; # upload()
    binmode INFILE; # tmpFileName()
    binmode OUTFILE;
    while ( <INFILE> ) {
    # while ( <$filehandle> ) {
    --
    # binmode $filehandle; # upload()
    # binmode INFILE; # tmpFileName()
    binmode OUTFILE;
    # while ( <INFILE> ) {
    while ( <$filehandle> ) {
    --
    binmode $filehandle; # upload()
    # binmode INFILE; # tmpFileName()
    binmode OUTFILE;
    # while ( <INFILE> ) {
    while ( <$filehandle> ) {

Of course, there is some logic behind all that ... I won't write details about that here: the perlunicode and perlrun manual pages provide details. My message is that I consider that this behavior should be clarified in the POD of the CGI module.

Regards!
Saašha
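Points (2) and (3) above can be sketched as follows, using only core Perl. This is an illustration under stated assumptions, not code from the ticket: the temporary files stand in for an uploaded file, and `copy_binary` is a hypothetical helper that is not part of CGI.pm. In a real CGI script the input handle would come from upload() or from opening the path returned by tmpFileName(). For context, perlrun describes -C63 as enabling UTF-8 on STDIN, STDOUT, STDERR, the default input and output layers, and @ARGV; 62 is the same set without the STDIN bit (1).

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of CGI.pm): copy bytes between two
# handles without any encoding or newline translation.
sub copy_binary {
    my ($in, $out) = @_;
    binmode $in;     # input side: treat the data as raw bytes
    binmode $out;    # output side: write the bytes unchanged
    my $buffer;
    while ( my $n = read($in, $buffer, 8192) ) {
        print {$out} $buffer;
    }
    return;
}

# Stand-in for an uploaded binary file: all 256 byte values,
# which is not valid UTF-8 text.
my $payload = join '', map { chr } 0 .. 255;
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
binmode $src_fh;
print {$src_fh} $payload;
close $src_fh or die "close $src_name: $!";

# Copy it, as one might copy from tmpFileName() to a destination file.
my ($dst_fh, $dst_name) = tempfile(UNLINK => 1);
open my $in, '<', $src_name or die "open $src_name: $!";
copy_binary($in, $dst_fh);
close $in;
close $dst_fh or die "close $dst_name: $!";

my $size = -s $dst_name;
print "copied $size bytes\n";
```

The loop uses fixed-size read() calls rather than the line-oriented `while ( <INFILE> )` from the tested scripts, because binary data has no meaningful line boundaries; either form works once both handles are in binmode.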
Hello, Thanks for the feedback. Could you find someone else familiar with the UTF-8 issue to peer-review your suggestion and leave a comment here?
Subject: Re: [rt.cpan.org #77297] UTF-8 settings and binary file upload
Date: Sun, 16 Sep 2012 16:41:04 +0200 (CEST)
To: bug-CGI [...] rt.cpan.org, mark [...] summersault.com
From: Saašha Metsärantala <saasha [...] acc.umu.se>
Hello!

Thanks for your e-mail about the UTF-8 and CGI issue. I send you a small, simple cgi-file, which seems to work. It is GPL and Artistic-licensed - feel free to use it and rework it! It may be useful for the following issue:

Reading http://search.cpan.org/~markstos/CGI.pm-3.60/lib/CGI.pm today, I noticed that neither the synopsis nor the "complete example of a simple form-based script" takes UTF-8 into account. Most people surfing the net use characters outside ASCII and almost no perl coder wants to neglect the majority of people surfing the net. Including UTF-8 awareness in the synopsis is therefore something I consider crucial, also because it shows that perl is able to solve these problems really well.

The file I send you may help you to achieve that. It is written in English, but it contains some UTF-8 encoded mdash characters for UTF-8 testing purposes. Non-ASCII characters may also be written in the "season". Of course, the file should be sent with the HTTP header:

    Content-Type: application/xhtml+xml; charset=UTF-8

> someone else familiar with the UTF-8 issue to peer-review

Surprisingly few people seem to be acquainted with both UTF-8 and CGI. I wonder which code you would like to be reviewed. Maybe I can help ...

Regards!
Saašha
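As a rough sketch of what UTF-8 awareness for a text field could look like, the following uses only the core Encode module. It is not the attached cgi-file, whose body is not shown in this ticket: the variable `$raw_field` and its contents are assumptions standing in for the raw bytes of a submitted form value, and the header is the one named in the message above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: raw bytes of a text field, encoded as UTF-8.
# The middle three bytes are the UTF-8 encoding of U+2014 (an mdash).
my $raw_field = "a\xE2\x80\x94b";

# Decode bytes into Perl characters so that length(), regexes, etc.
# operate on characters rather than on individual bytes.
my $text = decode('UTF-8', $raw_field);

printf STDERR "bytes: %d, characters: %d\n",
    length($raw_field), length($text);

# Encode characters back to UTF-8 on the way out and announce the
# charset in the HTTP header.
binmode STDOUT, ':encoding(UTF-8)';
print "Content-Type: application/xhtml+xml; charset=UTF-8\r\n\r\n";
print qq{<html xmlns="http://www.w3.org/1999/xhtml">},
      qq{<head><title>UTF-8 test</title></head>},
      qq{<body><p>$text</p></body></html>\n};
```

Only text fields are decoded here; the bytes of an uploaded binary file would be left untouched and copied in binmode.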

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #77297] UTF-8 settings and binary file upload
Date: Mon, 17 Sep 2012 09:27:08 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Saašha, Thanks for the contribution. Mark
This issue has been copied to: https://github.com/leejo/CGI.pm/issues/98. Please take all future correspondence there. This ticket will remain open, but please do not reply here. This ticket will be closed when the GitHub issue is dealt with.
Thanks, but I'm not going to duplicate documentation in CGI.pm for binmode and UTF-8 / perlunicode / perlrun / etc. If you're working with binary files then it should be self-evident that you need to consider the input/output file handling.