Bug #46283 for CGI: Needs proposed fix: header() forces hyphenated HTTP headers to non-canonical forms

Thu May 21 12:49:37 2009 ohiocore [...] gmail.com - Ticket created

Subject:

header() forces hyphenated HTTP headers to non-canonical forms

Tested on debian linux 2.6.18-4-686 w/ perl 5.8.8

$cgi->header() forces hyphenated header fields like "Content-Style-Type" to ucfirst(lc $term), but these are not the canonical forms.

Registered HTTP headers:
http://www.ietf.org/rfc/rfc4229.txt

Example problem:

perl -e 'use CGI; my $x=CGI->new; my $z= {charset=>"UTF-8", "Content-Style-Type"=>"css"}; print $x->header($z);'

Outputs:
Content-style-type: css
Content-Type: text/html; charset=UTF-8

This results in validation errors by tools like (Firefox plugin) TotalValidator. Compare to the output using HTTP::Headers:

perl -e 'use HTTP::Headers; my $h = HTTP::Headers->new("Content-Type"=>"text/html; charset=UTF-8", "Content-Style-Type"=>"css"); print $h->as_string, "\n";'

Outputs:
Content-Type: text/html; charset=UTF-8
Content-Style-Type: css

The line at fault in CGI.pm 3.43 seems to be #1486:
($_ = $header) =~ s/^(\w)(.*)/"\u$1\L$2" . ': '.$self->unescapeHTML($value)/e;
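The case folding described in the report can be reproduced without CGI.pm. The following is a minimal sketch, assuming only the pattern from the line quoted above; `fold_like_cgi` is a hypothetical helper name and the value handling (`unescapeHTML`) is left out.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same case pattern as the quoted
# CGI.pm line (uppercase the first character, lowercase the rest).
# It does not touch the header value.
sub fold_like_cgi {
    my ($header) = @_;
    (my $folded = $header) =~ s/^(\w)(.*)/\u$1\L$2/;
    return $folded;
}

print fold_like_cgi('Content-Style-Type'), "\n";   # prints Content-style-type
```

Because `\L` lowercases everything after the first character, any capital that follows a hyphen is lost, which matches the output shown in the report.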

Sun Jul 19 20:50:35 2009 MARKSTOS [...] cpan.org - Taken

Sun Jul 19 20:50:40 2009 MARKSTOS [...] cpan.org - Status changed from 'new' to 'open'

Sun Jul 19 20:51:18 2009 MARKSTOS [...] cpan.org - Subject changed from 'header() forces hyphenated HTTP headers to non-canonical forms' to 'Needs proposed fix: header() forces hyphenated HTTP headers to non-canonical forms'

Sun Jul 19 20:57:26 2009 MARKSTOS [...] cpan.org - Correspondence added

On Thu May 21 12:49:37 2009, atz2 wrote:

> Tested on debian linux 2.6.18-4-686 w/ perl 5.8.8
>
> $cgi->header() forces hyphenated header fields like "Content-Style-Type"
> to ucfirst(lc $term), but these are not the canonical forms.
>
> Registered HTTP headers:
> http://www.ietf.org/rfc/rfc4229.txt
>
> Example problem:
>
> perl -e 'use CGI; my $x=CGI->new; my $z= {charset=>"UTF-8",
> "Content-Style-Type"=>"css"}; print $x->header($z);'
>
> Outputs:
> Content-style-type: css
> Content-Type: text/html; charset=UTF-8
>
> This results in validation errors by tools like (Firefox plugin)
> TotalValidator. Compare to the output using HTTP::Headers:
>
> perl -e 'use HTTP::Headers; my $h =
> HTTP::Headers->new("Content-Type"=>"text/html; charset=UTF-8",
> "Content-Style-Type"=>"css"); print $h->as_string, "\n";'
>
> Outputs:
> Content-Type: text/html; charset=UTF-8
> Content-Style-Type: css
>
> The line at fault in CGI.pm 3.43 seems to be #1486:
> ($_ = $header) =~ s/^(\w)(.*)/"\u$1\L$2" . ': '.$self->unescapeHTML($value)/e;

Thanks for the report.

I agree the current situation is not ideal. What do you propose as a fix? The best solution would take into account several things:

- The historical behavior, where people are loose with the case of their headers.
- The best practices you point out, which generally uppercase just the first letter after a hyphen, but have some exceptions, like "SubOK", "DAV", "C-PEP", and "SoapAction".
- The existing documentation for this behavior, which reads as follows:

"Any other named parameters will be stripped of their initial hyphens and turned into header fields, allowing you to specify any HTTP header you desire. Internal underscores will be turned into hyphens"

The documentation is helpfully vague on the implementation here, allowing us to change the implementation while still remaining true to the documentation.

Thanks for your help with this!

Mark

Sun Jul 19 22:59:01 2009 ohiocore [...] gmail.com - Correspondence added

Subject:	Proposed fix: header() forces hyphenated HTTP headers to non-canonical forms
From:	ohiocore [...] gmail.com

> Thanks for the report.
>
> I agree the current situation is not ideal. What do you propose as a
> fix? The best solution would take into account several things:
>
> - The historical behavior, where people are loose with the case of their
> headers.
> - The best practices you point out, which generally uppercase just the
> first letter after a hyphen, but have some exceptions, like "SubOK",
> "DAV", "C-PEP", and "SoapAction"
> - The existing documentation for this behavior, which reads as follows:
>
> "Any other named parameters will be stripped of their initial hyphens
> and turned into header fields, allowing you to specify any HTTP header
> you desire. Internal underscores will be turned into hyphens"
>
> The documentation is helpfully vague on the implementation here,
> allowing us to change the implementation while still remaining true to
> the documentation.
>
> Thanks for your help with this!

There are 133 total headers in the spec (116 permanent and 17 provisional).

The options are:

(1) Essentially a dictionary of regexps mapping to canonical forms. It would be undesirable to check as many as 133 regexps for each header. The number could be reduced by batching things like the Accept-* headers together, but that still seems too burdensome.

(2) Push the task to a specialized module like HTTP::Headers. This solves the problem for some headers, but FAILs on others and does not implement our requirement regarding leading hyphens. Example:

perl -we 'use HTTP::Headers; my $h =HTTP::Headers->new("Content-Type"=>"text/html; charset=UTF-8", "Content-style-type"=>"css", "c-pep"=>"bar", -something_new=>"foo"); print $h->as_string, "\n";'

Outputs:
Content-Type: text/html; charset=UTF-8
-Something-New: foo
C-Pep: bar
Content-Style-Type: css

We would want C-PEP and Something-New in the output.

(3) Screen special cases and push the rest to HTTP::Headers. This seems like a waste since we would be doing much of the processing ourselves.

(4) Screen special cases and pass the rest for rule-bound transformation.

I think (4) makes the most sense. 29 of 133 headers have uppercase letters that appear "out of place", i.e. somewhere other than the start of the name or right after a hyphen:

A-IM C-PEP C-PEP-Info Content-ID Content-MD5 DAV
Differential-ID ETag GetProfile IM MIME-Version P3P
PEP PICS-Label ProfileObject SetProfile SoapAction Status-URI
TCN TE URI WWW-Authenticate Message-ID SubOK
UA-Color UA-Media UA-Pixels UA-Resolution UA-Windowpixels

We can catch 11 of those 29 w/ 3 regexps like:
s/^UA-/UA-/i;
s/\bID$/ID/i;
s/\bPEP\b/PEP/i;

Or a combined 23 of 29 with:
s/\b(UA|ID|IM|PEP|P3P|WWW|URI|MD5|DAV|PICS|MIME|TE|TCN)\b/\U$1/gi;

That leaves 6. Note, the regexp could be tuned a bit for performance, essentially branching with stuff like ...|P(ICS|[E3]P)|... The matching part of the regexp is also eligible for the compile-once flag.

Those six headers I think we will just need to check for explicitly:
ETag GetProfile ProfileObject SetProfile SoapAction SubOK

Then everything else can follow the ucfirst-after-hyphen rule.

If that sounds feasible, I can attempt a patch.

--Joe
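Option (4) could look roughly like the following. This is a minimal sketch under stated assumptions, not a patch from this ticket: `canonical_header`, `%SPECIAL`, and `$UPPER` are hypothetical names, and the leading-hyphen and underscore handling is taken from the documented behavior quoted earlier in the thread.

```perl
use strict;
use warnings;

# The six names that need an explicit check, keyed by lowercase form.
my %SPECIAL = map { lc($_) => $_ }
    qw(ETag GetProfile ProfileObject SetProfile SoapAction SubOK);

# Tokens that should end up fully uppercase (the combined regexp above).
my $UPPER = qr/\b(UA|ID|IM|PEP|P3P|WWW|URI|MD5|DAV|PICS|MIME|TE|TCN)\b/i;

# Hypothetical helper: screen special cases, then apply the rules.
sub canonical_header {
    my ($name) = @_;
    $name =~ s/^-+//;    # strip initial hyphens
    $name =~ tr/_/-/;    # internal underscores become hyphens
    my $key = lc $name;
    return $SPECIAL{$key} if exists $SPECIAL{$key};

    # ucfirst at the start of the name and after each hyphen
    (my $out = $key) =~ s/(^|-)([a-z])/$1\u$2/g;
    # then force the known all-caps tokens to uppercase
    $out =~ s/$UPPER/\U$1/g;
    return $out;
}

print canonical_header($_), "\n" for qw(content-style-type c-pep -something_new etag);
```

One trade-off of this approach: the all-caps tokens also apply to unregistered names, so a hypothetical "X-Session-Id" would come out as "X-Session-ID" whether or not the caller wanted that.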

Tue Jul 21 20:48:38 2009 MARKSTOS [...] cpan.org - Correspondence added

> The options are:
>
> (1) Essentially a dictionary of regexps mapping to canonical forms. It
> would be undesirable to check as many as 133 regexps for each header.
> The number could be reduced by batching things like the Accept-* headers
> together, but that still seems too burdensome.
>
> (2) Push the task to a specialized module like HTTP::Headers. This
> solves the problem for some headers, but FAILs on others and does not
> implement our requirement regarding leading hyphens.

Thanks for the prompt attention and thought.

I've looked into this further now myself. While the RFC you linked to has a list of specs, the RFC that covers the case-sensitivity of the headers is the HTTP RFC mentioned in the footnotes, RFC 2616:

http://www.ietf.org/rfc/rfc2616.txt

Section 4.2 on Message Headers has the relevant bit:

"Field names are case-insensitive."

So, it is the Firefox TotalValidator plugin that has a bug here. It is not following the spec by claiming it is a bug to have a different case here. If there were a report of an actual browser problem, that would be interesting, but it would still technically be a bug in the browser.

Still, I don't like that people can give us headers in the canonical form and we would change them, as a matter of style if not correctness.

What I suggest instead is that we do the simple thing of leaving the case of headers alone.

- This will do what many people expect.
- It's simple: it doesn't get us into a cycle of maintaining code related to what the current HTTP headers are. (And let's not forget the "X-*" name space for headers, which is constantly evolving and not standardized.)
- If people give us all lowercase or something, that's apparently still valid HTTP, even if it's a different behavior.

How does that address your original concern, Joe?

Mark
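The "leave the case alone" suggestion can be sketched as follows. This is an illustration only, not code from CGI.pm: `passthrough_header` and `same_field` are hypothetical names, and the only transformations applied are the two the documentation promises (strip initial hyphens, turn internal underscores into hyphens).

```perl
use strict;
use warnings;

# Hypothetical helper: apply only the documented transformations and
# keep letter case exactly as the caller supplied it.
sub passthrough_header {
    my ($name) = @_;
    $name =~ s/^-+//;    # strip initial hyphens
    $name =~ tr/_/-/;    # internal underscores become hyphens
    return $name;
}

# Field names are case-insensitive, so a receiver comparing two
# names would fold case before comparing.
sub same_field {
    my ($left, $right) = @_;
    return lc($left) eq lc($right);
}

print passthrough_header('-Content_Style_Type'), "\n";   # prints Content-Style-Type
```

With this approach a caller who passes "c-pep" gets "c-pep" back, which is still the same field under the case-insensitivity rule quoted from RFC 2616.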

Tue Jul 21 21:39:48 2009 MARKSTOS [...] cpan.org - Broken in 3.16 deleted

Tue Jul 21 21:39:48 2009 MARKSTOS [...] cpan.org - Broken in 3.17 deleted

Tue Jul 21 21:39:48 2009 MARKSTOS [...] cpan.org - Broken in 3.19 deleted

Tue Jul 21 21:39:48 2009 MARKSTOS [...] cpan.org - Broken in 3.20 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.21 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.22 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.23 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.25 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.27 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.28 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.29 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.31 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.32 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.33 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.34 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.35 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.37 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.38 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.39 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.40 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.41 deleted

Tue Jul 21 21:39:49 2009 MARKSTOS [...] cpan.org - Broken in 3.42 deleted

Tue Jul 21 22:50:48 2009 ohiocore [...] gmail.com - Correspondence added

> I've looked into this further now myself. While the RFC you linked to
> has a list of specs, the RFC that covers the case-sensitivity of the
> headers is the HTTP RFC mentioned in the footnotes, RFC 2616:
>
> http://www.ietf.org/rfc/rfc2616.txt
>
> Section 4.2 on Message Headers has the relevant bit:
>
> "Field names are case-insensitive."
>
> So, it is the Firefox TotalValidator plugin that has a bug here. It is
> not following the spec by claiming it is a bug to have a different case
> here. If there were a report of an actual browser problem, that would be
> interesting, but it would still technically be a bug in the browser.
>
> Still, I don't like that people can give us headers in the canonical
> form and we would change them, as a matter of style if not
> correctness.
>
> What I suggest instead is that we do the simple thing of leaving the
> case of headers alone.
>
> - This will do what many people expect
> - It's simple: it doesn't get us into a cycle of maintaining code related
> to what the current HTTP headers are. (And let's not forget the "X-*"
> name space for headers, which is constantly evolving and not
> standardized).
> - If people give us all lowercase or something, that's apparently still
> valid HTTP, even if it's a different behavior.
>
> How does that address your original concern, Joe?
>
> Mark

Interesting. Thanks for the additional research. I agree with your reading. Header case is not required by spec, and TotalValidator is off.

The only mitigating information comes from the same paragraph you quote:

'Applications ought to follow "common form", where one is known or indicated, when generating HTTP constructs, since there might exist some implementations that fail to accept anything beyond the common forms.'

So there is still some pressure to use the "most canonical" forms if possible, which I take to be those in the spec I referenced.

At this point I would defer on whether to change CGI.pm at all, since the technical basis for my bug is removed. I am wary of core changes just for the sake of "good style", in part because they are hard to defend if something goes wrong. Perhaps the Internet is ready for it now, but I remember quite painfully case-sensitive behavior in proprietary cache servers years ago. I do like the idea of "you get what you put into it" though.

Anyway, you've answered my question, so it's "resolved" AFAIC.

Thanks,
--Joe

Mon Jan 21 21:01:06 2013 MARKSTOS [...] cpan.org - Correspondence added

> At this point I would defer on whether to change CGI.pm at all, since
> the technical basis for my bug is removed. I am wary of core changes
> just for the sake of "good style", in part because they are hard to
> defend if something goes wrong.

I agree. I'm now ready to consider this a design flaw in CGI.pm of questionable pragmatic value to change, given the wide deployment and existing expectations about the current behavior.

There are plenty of other header generation options for Perl out there now, and I would direct people to use those options if they find the CGI.pm behavior problematic.

Mark

Mon Jan 21 21:01:08 2013 MARKSTOS [...] cpan.org - Status changed from 'open' to 'resolved'

Fri May 23 14:29:31 2014 The RT System itself - Queue changed from CGI.pm to CGI
