Bug #54341 for CGI: CGI.pm - misleading documentation

Fri Feb 05 09:21:58 2010 Helmut.Richter [...] lrz.de - Ticket created

Subject:	CGI.pm - misleading documentation
Date:	Fri, 5 Feb 2010 15:20:16 +0100 (CET)
To:	bug-CGI.pm [...] rt.cpan.org
From:	Helmut Richter <Helmut.Richter [...] lrz.de>

Hello,

this is not a report on a bug in CGI.pm (in fact it works perfectly, although the documentation warns against a very useful feature!) but on a bug in its documentation. If this is not the right address for such comments, please forward.

Mit besten Grüßen / Best regards
Helmut Richter

====================================================
Dr. Helmut Richter
Leibniz-Rechenzentrum       Tel: +49-89-35831-8785
Boltzmannstraße 1           Fax: +49-89-35831-9700
85748 Garching / Germany
====================================================

Problem
-------

The documentation as found at http://search.cpan.org/dist/CGI.pm/lib/CGI.pm says about the -utf8 pragma:

| -utf8
|
| This makes CGI.pm treat all parameters as UTF-8 strings. Use this with care,
| as it will interfere with the processing of binary uploads. It is better to
| manually select which fields are expected to return utf-8 strings and
| convert them using code like this:
|
|  use Encode;
|  my $arg = decode utf8=>param('foo');

I have the following qualms with it:

1. It is not at all obvious what exactly is meant by "treat all parameters as UTF-8 strings", or what the consequences are for the user of CGI.pm. The term "UTF-8 string" could mean "binary string containing UTF-8 encoded data"; this is not what is meant (and it is very fortunate that this is not what happens).

2. It is not true that the pragma "interferes with the processing of binary uploads". Quite the contrary: it is a special feature of the -utf8 pragma that parameters are decoded from UTF-8 *without* interfering with binary uploads (I guess by first extracting the binary data and decoding only the remaining text). At least, I was not able to introduce any errors into binary uploads by using the -utf8 pragma: it decoded the input form data correctly without touching the binary upload data.

3. The unnecessary work-around in the last line is *not* a functional substitute for the effect of the -utf8 pragma. If one uses it, one still has to keep track of the encoding of parameters used as defaults, e.g.

    textfield(-name=>'field_name', -value=>'starting value', -size=>50, -maxlength=>80);

   will only work if the string for the starting value is ASCII; otherwise it must be replaced by "encode ('utf8', 'starting value')". Also, comparing input parameter values with constants can only be done after proper decoding. All this complicated and error-prone wizardry is unnecessary when using the -utf8 pragma. There is no reason to warn against it.

Again: there is no need to modify the implementation of CGI.pm -- it does exactly what is needed. Only the documentation must be updated to tell the user what CGI.pm really does.

Suggested new wording
---------------------

-utf8

This makes CGI.pm treat all parameters as text strings rather than binary strings (see *perlunitut* for the distinction), assuming UTF-8 for the encoding of input/output from/to the form. This is typically used in conjunction with a <form> tag containing the option 'accept-charset="UTF-8"' to ensure UTF-8 input from the form and with 'binmode (STDOUT, ":utf8")' to ensure UTF-8 output to the form, while all handling of the data within the perl script manipulates only text strings.

CGI.pm does the decoding from the UTF-8 encoded input data, restricting this decoding to input text as distinct from binary upload data, which are left untouched. Therefore, a ':utf8' layer must *not* be used on STDIN.
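The distinction the report draws between a binary string holding UTF-8 encoded data and a decoded text string can be illustrated with a minimal sketch. It is not part of the original report and does not use CGI.pm at all: the variable $raw_bytes is a hypothetical stand-in for an undecoded form value, and only the core Encode module is assumed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw form value as it would arrive on the wire: the UTF-8
# encoded bytes of a 5-character word (u-umlaut = C3 BC, sharp s = C3 9F).
my $raw_bytes = "Gr\xC3\xBC\xC3\x9Fe";

# The same word as a text string, written with code points (U+00FC, U+00DF).
my $constant = "Gr\x{FC}\x{DF}e";

# Manual approach: decode each text field yourself.
my $text = decode('UTF-8', $raw_bytes);

# A binary string counts bytes, a text string counts characters.
my $byte_len = length $raw_bytes;
my $char_len = length $text;

# Comparison with a text constant only succeeds after decoding.
my $raw_eq  = ($raw_bytes eq $constant) ? 1 : 0;
my $text_eq = ($text      eq $constant) ? 1 : 0;

# Going the other way (for example a non-ASCII default value written back
# to the form) needs an explicit encode when the script handles bytes itself.
my $round_trip_ok = (encode('UTF-8', $text) eq $raw_bytes) ? 1 : 0;

printf "bytes=%d chars=%d raw_eq=%d text_eq=%d round_trip_ok=%d\n",
    $byte_len, $char_len, $raw_eq, $text_eq, $round_trip_ok;
```

According to the report, the -utf8 pragma makes these per-field decode and encode steps unnecessary, because CGI.pm then hands text strings to the script and leaves binary upload data untouched.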

Thu May 22 08:07:22 2014 LEEJO [...] cpan.org - Correspondence added

This issue has been copied to https://github.com/leejo/CGI.pm/issues/69; please take all future correspondence there. This ticket will remain open, but please do not reply here. It will be closed when the GitHub issue is dealt with.

Thu May 22 08:07:22 2014 The RT System itself - Status changed from 'new' to 'open'

Fri May 23 14:28:05 2014 The RT System itself - Queue changed from CGI.pm to CGI

Sun Jul 13 07:26:34 2014 LEEJO [...] cpan.org - Correspondence added

commit 6c73a27ebcbe9f71497b8180150f7bd339a8b485
Author: Lee Johnson <lee@givengain.ch>
Date: Sun Jul 13 13:25:05 2014 +0200

    resolve #69 [rt.cpan.org #54341] - -utf8 perldoc tweaks to make it clear what is going on and what you should[ not] do w/r/t encoding/decoding

Sun Jul 13 07:26:35 2014 LEEJO [...] cpan.org - Status changed from 'open' to 'resolved'
