Bug #75592 for HTTP-Message: RFC Violation

Tue Mar 06 19:11:51 2012 bbb [...] cpan.org - Ticket created

Subject:

RFC Violation - Broken POST content

Hello Gisle, I accidentally upgraded to HTTP::Message 6.03 and now the urlencoding is suddenly broken for special binary uploads. Rolling back to version 6.02 repairs all the problems. But the new version is corrupting too much!

Tue Mar 06 19:18:32 2012 bbb [...] cpan.org - Correspondence added

I found the problem. HTTP/Request/Common.pm:85: # HTML/4.01 says that line breaks are represented as "CR LF" pairs (i.e., `%0D%0A') HTTP/Request/Common.pm:86: $content =~ s/(?<!%0D)%0A/%0D%0A/g; That's a nifty little negative lookbehind substitution, but "%0A" is not a newline. In fact, it all three characters fit on one line. And modifying already-encoded binary data isn't being helpful at all.

Tue Mar 06 19:18:33 2012 bbb [...] cpan.org - Status changed from 'new' to 'open'

Tue Mar 06 19:26:45 2012 bbb [...] cpan.org - Correspondence added

Here is the patch we are using in production that corrects the problem.

Subject:

http-message-603_01.patch

--- /usr/local/lib/perl5/site_perl/5.8.8/HTTP/Request/Common.pm.BROKE 2012-02-16 16:14:03.000000000 -0600 +++ /usr/local/lib/perl5/site_perl/5.8.8/HTTP/Request/Common.pm 2012-03-06 18:21:42.000000000 -0600 @@ -82,8 +82,8 @@ $url->query_form(ref($content) eq "HASH" ? %$content : @$content); $content = $url->query; - # HTML/4.01 says that line breaks are represented as "CR LF" pairs (i.e., `%0D%0A') - $content =~ s/(?<!%0D)%0A/%0D%0A/g; + # HTML/4.01 says that line breaks are represented as "CR LF" pairs + $content =~ s/(?<!\x0d)\x0a/\x0d\x0a/g; } }

Tue Mar 06 19:26:45 2012 bbb [...] cpan.org - Status changed from 'open' to 'patched'

Sun Mar 11 07:23:03 2012 GAAS [...] cpan.org - Correspondence added

Since the new behaviour is to encode "\n" as "\r\n" for application/x−www−form−urlencoded parts this is clearly not transparant if you try to pass through binary data. I can see that. The new behaviour only make sense to text content (where you emulate an HTML form). I don't know what you mean by RFC Violation. The patch that introduced this behaviour claimed the the old behaviour conflicted with the HTTP RFC. What RFC do you claim say otherwise? Binary parts would normally be handled by using the multipart/form−data format instead. Your patch is broken. It breaks the test suite and works for you simply be effectively disabling the statement it affects. Thus reverting to the old behaviour. I think it ought to be possible use this interface and get transparent binary encoding as well. Not quite sure how yet.

Sun Mar 11 07:23:07 2012 The RT System itself - Status changed from 'patched' to 'open'

Sun Mar 11 22:48:58 2012 bbb [...] cpan.org - Correspondence added

Gisle, Thank you for your response! I understand that HTTP RFC 2068 suggests that CRLF be used across the wire to fully comply. And I believe it might even be appropriate to convert multipart/form−data fields (that happen to be text/*) to CRLF format so that everything across the raw transport is standard. But there is nothing in RFC 1738 that suggests you can inject extra characters during the encoding process for application/x−www−form−urlencoded data. And "%0A" is just three digits in plain text without any special characters so should be safe across the wire as is, thus the violation referred to in my report. However, I have seen SOME real browsers such as IE encode textarea inputs with forced CRLF prior to encoding, which I believe your 6.03 will help emulate this behavior. I'm not sure what any RFC states about that or where that idea even comes from or if that is even proper or not. And if this behavior I've seen is proper, then yes, my patch would break that. Unfortunately, I'm not sure how to tell LWP::UserAgent or HTTP::Request when I know that my content is not text and should not be corrupted in this case. I just know that everything used to work fine for years posting binary inputs, and suddenly after upgrading to 6.03 it is broken. I guess I can double URL encode it on the client end and then double decode it on the server end to workaround any encoding issues with HTTP::Request, but that would require changes on both ends and doesn't provide a slow migration path very easily. I'm just not sure what the best solution is. And which RFC are you claiming that the OLD behavior does not comply? Or do you have any other suggestions?

Sat Mar 31 00:52:51 2012 DOHERTY [...] cpan.org - Correspondence added

This has caused a module of mine to malfunction under some circumstances. We calculate an md5 of the text we're sending to the server to ensure it isn't harmed in transit. Well, with 6.03 it is - the text the server receives no longer matches the md5. I've attached a transcript of a shell session showing how I determined that this release is responsible. I don't know whether an RFC is violated, but I do know this change is causing problems.

Subject:

http-message-test.log

Message body is not shown because it is too large.

Tue Apr 24 16:22:30 2012 DOHERTY [...] cpan.org - Correspondence added

Looking at http://www.w3.org/Protocols/rfc2616/rfc2616- sec14.html#sec14.15 I see: "Conversion of all line breaks to CRLF MUST NOT be done" in reference to calculating a Content-MD5 header, which suggested to me that messing with my line endings was probably a bad idea. Then, I saw: "HTTP allows transmission of text types with any of several line break conventions and not just the canonical form using CRLF." which prompted me to look at http://www.w3.org/Protocols/rfc2616/rfc2616- sec3.html#sec3.7.1 which is more explicit. I'm certainly an amateur at reading RFCs, but it looks like you're not required to convert text types in the body to CRLF. If that's the case, please consider reverting this change.

Wed Jan 16 13:40:52 2013 ether [...] cpan.org - Cc ETHER added

Tue Aug 13 11:20:13 2013 EUGENEK [...] cpan.org - Correspondence added

Plus one to reverting conversion. Not only prepending every 0A with 0D completely breaks binary data, it is also not clear that "x-www-form-urlencoded" content-type is not suited for it, as many cache servers and existing communications depend on it. Actually, any silent alteration (and this is alteration, not some sort of transfer-encoding - server gets data different from what was sent by client) of user data seems very unnatural to me, not to mention breaking compatibility with previous versions. Issue warning (it is at most; I'm not sure that you got whole idea right).

Mon Jan 13 03:27:50 2014 GFILATOV [...] cpan.org - Correspondence added

Any progress here?

Thu Mar 30 20:58:50 2017 olaf [...] wundersolutions.com - Correspondence added

Ticket migrated to github as https://github.com/libwww-perl/HTTP-Message/issues/69

Thu Mar 30 20:58:56 2017 olaf [...] wundersolutions.com - Status changed from 'open' to 'resolved'

Bug #75592 for HTTP-Message: RFC Violation - Broken POST content