Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the CGI CPAN distribution.

Report information
The Basics
Id: 67100
Status: resolved
Priority: 0/
Queue: CGI

People
Owner: MARKSTOS [...] cpan.org
Requestors: cpan [...] jibsheet.com
Cc: trs [...] bestpractical.com
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 3.51
  • 3.52
  • 3.55
Fixed in: (no value)



CC: trs [...] bestpractical.com
Subject: charset applied to every content-type
Successfully installed CGI.pm-3.50 $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif Successfully installed CGI.pm-3.52 (upgraded from 3.50) $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif; charset=ISO-8859-1 falcone@transom:~$ perl -v This is perl 5, version 12, subversion 3 (v5.12.3) built for darwin-2level serving the ISO-8859-1 on images seems.... wrong. This actually fails for us on some application/javascript which wasn't forcing a charset, and I'll fix that, but I'm not sure why CGI started adding that on between 3.50 and 3.51
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 31 Mar 2011 09:07:11 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Kevin, Thanks for the report. I agree it looks like a mistake to set a charset for a non "text/*" type. Mark
Show quoted text
> This actually fails for us on some application/javascript which wasn't > forcing a charset, and I'll > fix that, but I'm not sure why CGI started adding that on between 3.50 > and 3.51
The reason for the change was documented in "Changes", which referenced the related RT ticket, which in turn referenced HTTP RFCs to justify the change. Here's the entry from "Changes" [BUG FIXES] - Setting charset() now works for all content types, not just "text/*". (RT#57945, Thanks to Yanick and Gerv.) Unless there are other RFC references which counter what we found through the previous ticket, the current behavior will stay. For now I'm marking this change request as "rejected", but feel free to re-open if you find RFC support for reverting the behavior.
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 09:36:02 -0700
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
On 08/15/2012 07:41 PM, MARKSTOS via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=67100 > >
>> This actually fails for us on some application/javascript which >> wasn't forcing a charset, and I'll fix that, but I'm not sure why >> CGI started adding that on between 3.50 and 3.51
> > The reason for the change was documented in "Changes", which > referenced the related RT ticket, which in turn referenced HTTP RFCs > to justify the change. Here's the entry from "Changes" > > [BUG FIXES] - Setting charset() now works for all content types, not > just "text/*". (RT#57945, Thanks to Yanick and Gerv.)
I think there's some confusion here. The bug fix and ticket you mention above is about *allowing* a charset to be sent on non-text/ types. This bug is that CGI.pm is sending a default charset of ISO-8859-1 on *any* content-type even when *no charset* is set by the code. It seems, from first glance, that the change you quote above to always respect the charset has caused the side effect that the default textual charset is always sent, regardless of content-type. By example: tom@whaam ~ $ perl -MV=CGI CGI /opt/perlbrew/perls/perl-5.14.1/lib/5.14.1/CGI.pm: 3.50 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("text/plain");' Content-Type: text/plain; charset=ISO-8859-1 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif tom@whaam ~ $ cpanm -q M/MA/MARKSTOS/CGI.pm-3.51.tar.gz Successfully installed CGI.pm-3.51 1 distribution installed tom@whaam ~ $ perl -MV=CGI CGI /opt/perlbrew/perls/perl-5.14.1/lib/5.14.1/CGI.pm: 3.51 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("text/plain");' Content-Type: text/plain; charset=ISO-8859-1 tom@whaam ~ $ perl -MCGI -le 'my $q = CGI->new; print $q->header("image/gif");' Content-Type: image/gif; charset=ISO-8859-1
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 12:51:41 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Thanks for the clarification, Tom. I see what you are saying. The two bug reports were not opposite of each other as I first thought. As you've clarified, it appears the last change "over-fixed" issue, and it should be dialed back some. Tests and a patch would be welcome. Otherwise, I'll get to it as time permits myself. Mark
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 10:10:06 -0700
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
On 08/16/2012 09:52 AM, mark@summersault.com via RT wrote: Show quoted text
> I see what you are saying. The two bug reports were not opposite of each > other as I first thought. As you've clarified, it appears the last > change "over-fixed" issue, and it should be dialed back some.
Agreed. Show quoted text
> Tests and a patch would be welcome. Otherwise, I'll get to it as time > permits myself.
Here's a couple tests. I don't have time at the moment to track down and fix the bug itself.

Message body is not shown because sender requested not to inline it.

Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Thu, 16 Aug 2012 13:52:33 -0400
To: bug-CGI [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Thanks for the tests! Maybe there's a college CS student out there somewhere who would like to apprentice as a CGI.pm maintainer... Mark
Show quoted text
> >> This actually fails for us on some application/javascript which > >> wasn't forcing a charset, and I'll fix that, but I'm not sure why > >> CGI started adding that on between 3.50 and 3.51
> > > > The reason for the change was documented in "Changes", which > > referenced the related RT ticket, which in turn referenced HTTP RFCs > > to justify the change. Here's the entry from "Changes" > > > > [BUG FIXES] - Setting charset() now works for all content types, not > > just "text/*". (RT#57945, Thanks to Yanick and Gerv.)
> > I think there's some confusion here. The bug fix and ticket you > mention > above is about *allowing* a charset to be sent on non-text/ types. > > This bug is that CGI.pm is sending a default charset of ISO-8859-1 on > *any* content-type even when *no charset* is set by the code. > > It seems, from first glance, that the change you quote above to always > respect the charset has caused the side effect that the default > textual > charset is always sent, regardless of content-type.
I reviewed this again tonight. I also agree with your point that it's a waste of bandwidth to send charset on a image/gif type by default where it doesn't apply. Reading the old ticket one more time, the request then wasn't that the charset be set for every content-type by default, it was that it was *allowed to be changed* via $self->charset() for non "text/*" types. It already worked for non-text types if the charset was supplied in every call to header(). Internally, perhaps the problem is the way the code is structured, it can't tell whether $self->charset() contains a value that is the global default, or one set by the user. Thus, some kind of re-working is required to provide the ideal solution.
Subject: Re: [rt.cpan.org #67100] charset applied to every content-type
Date: Tue, 15 Jan 2013 10:45:58 -0800
To: bug-CGI [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
On 01/11/2013 06:20 PM, MARKSTOS via RT wrote: Show quoted text
> I reviewed this again tonight. > > I also agree with your point that it's a waste of bandwidth to send > charset on a image/gif type by default where it doesn't apply. > > Reading the old ticket one more time, the request then wasn't that the > charset be set for every content-type by default, it was that it was > *allowed to be changed* via $self->charset() for non "text/*" types. It > already worked for non-text types if the charset was supplied in every > call to header().
Yep, that was my understanding as well, and what I was trying to express in my mail you quoted. I'm glad we're on the same page. :) Show quoted text
> Internally, perhaps the problem is the way the code is structured, it > can't tell whether $self->charset() contains a value that is the global > default, or one set by the user. Thus, some kind of re-working is > required to provide the ideal solution.
Correct. Back when this ticket was previously active, I dug into the CGI internals to try and come up with a quick patch. Unfortunately, I came to the same conclusion: the global default is indistinguishable from a user set default, so the entire method of ->charset() handling needs reworking. It's unfortunately not enough to simply compare the current ->charset() to the global default, since the user may have explicitly specified the global default which should be respected.
I think header() should be a subset of HTTP::Headers, and so charset() attribute should be read-only. In CGI.pm, the default value of the Content-Type header is set to "text/html; charset=ISO-8859-1". That's why charset() *returns* "ISO-8859-1" by default. By the way, why is HTTP::Headers::content_type_charset() read-only? In other words, how can we make CGI::charset() writable? In my opinion, charset() attribute of CGI.pm isn't writable. In fact, you can't set charset() in the following situation: my $q = CGI->new; $q->charset( 'utf-8' ); # set print $q->header( -type => 'text/plain; charSet=EUC-JP' ); # [STDOUT] # Content-Type: text/plain; charSet=EUC-JP; charset=utf-8 # In CGI.pm, the Content-Type header is separated into two properties: '-type' and '-charset'. To begin with, I don't think we can't *separate* the Content-Type header into two properties, while I don't know how to fix this problem ;)
This issue has been copied to: https://github.com/leejo/CGI.pm/issues/84 please take all future correspondence there. This ticket will remain open but please do not reply here. This ticket will be closed when the github issue is dealt with.
I've taken the path of least disruption here and amended the perldoc to headers to explain that passing -charset => '' is the workaround to nonsensical charsets. commit 37ad1ed2164635c9ed358493c33ae90c91448bfc Author: Lee Johnson <lee@givengain.ch> Date: Fri Oct 17 15:49:06 2014 +0200 resolve #84 [rt.cpan.org #67100] - document default charset being nonsensical in some situations and the workaround being to pass in -charset => '' in those cases. this is the path of least disruption as there is currently no way to tell if the default charset is one set by the user or the module, and the alternative of checking the content type against the charset is a no go IMO Changes | 7 +++++++ lib/CGI.pm | 8 +++++++- t/headers/type.t | 30 ++++++++++++++++++++++++++++++ 3 files changed, 44 insertions(+), 1 deletion(-)