Bug #13025 for libwww-perl: parsing bug in HTTP::Message::parse()

Mon May 30 19:20:21 2005 Guest - Ticket created

Subject:

parsing bug in HTTP::Message::parse()

Hi! I've stumbled across a bug in multi-part messages in HTTP::Message. If the message content contains something that looks like a header, it accidentally gets detected as a header. However, as per the spec, a blank line after the headers should signal the end of the headers, and what follows it is the start of the content. It just looks like your regex needs some tweaking, or maybe do something like this:

sub parse {
    my ($class, $str) = @_;
    # Everything before the first blank line is headers; the rest is content.
    my ($headPart, $contentPart) = split(/\r?\n\r?\n/, $str, 2);
    my @hdr;
    while ($headPart =~ s/^([^ \t:]+)[ \t]*: ?(.*)\n?//) {
        push(@hdr, $1, $2);
        $hdr[-1] =~ s/\r\z//;   # strip the CR left over from CRLF line endings
    }
    new($class, \@hdr, $contentPart);
}

I've included a sample program to reproduce the problem so you can better see what I mean. If you look at the Dumper() output for the part "3fc90c2f106c4dc7e7742d6bb12b68f7", you'll see that it got detected incorrectly. Let me know if there is anything I can do to help.

#!/usr/bin/perl -w
use strict;
use HTTP::Message;
use HTTP::Headers;
use Data::Dumper;

my $content = '------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="fp720eb62d5af0f1d87412b385533c196c"

1
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="trid720eb62d5af0f1d87412b385533c196c"

779758
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="673bb732f4790c8cc3a63e5489bdb25b"; filename=""


------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="3fc90c2f106c4dc7e7742d6bb12b68f7"

some data
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="42a6fca55664038ee398f7ceb14e4835"

aoeu:aoeu
------------0xKhTmLbOuNdArY--
';

my $headers = HTTP::Headers->new;
$headers->header(
    'content-type' => 'multipart/form-data; boundary=----------0xKhTmLbOuNdArY'
);
my $m = HTTP::Message->new($headers, $content);
foreach my $part ($m->parts) {
    print STDERR Dumper($part);
}
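The reporter's suggestion boils down to cutting the message at the first blank line before looking for headers, so that nothing in the body can be taken for a header. Below is a minimal standalone sketch of that idea in core Perl. `split_message` is a hypothetical helper, not an HTTP::Message method: it returns the header name/value pairs and the content instead of building an object, and it assumes unfolded header lines (continuation lines are not handled).

```perl
use strict;
use warnings;

# Hypothetical helper: split a raw message at the first blank line, then look
# for "Name: value" lines only in the header block. Folded (continuation)
# header lines are not handled. Returns (\@pairs, $content).
sub split_message {
    my ($str) = @_;

    # Header block and content; either may be missing in degenerate input.
    my ($head, $content) = split /\r?\n\r?\n/, $str, 2;
    $head    = '' unless defined $head;
    $content = '' unless defined $content;

    my @hdr;
    for my $line (split /\r?\n/, $head) {
        push @hdr, $1, $2 if $line =~ /^([^ \t:]+)[ \t]*: ?(.*)\z/;
    }
    return (\@hdr, $content);
}

# jbisbee's example from this ticket: the URL must stay in the body.
my ($hdr, $content) = split_message(
    qq(Content-Disposition: form-data; name="fetch_url"\n)
  . qq(\nhttp://www.jbisbee.com/\n)
);
print "$hdr->[0]: $hdr->[1]\n";
print "content: $content";
```

Because the header regex only ever sees the header block, a body line such as `aoeu:aoeu` or `http://...` can no longer turn into a header.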

Sat Aug 13 22:55:10 2005 jbisbee [...] cpan.org - Correspondence added

[guest - Mon May 30 19:20:21 2005]:

> Hi!
>
> I've stumbled across a bug in multi-part messages in HTTP::Message.
> If the message content contains something that looks like a header,
> it accidentally gets detected as a header. However, a blank line
> after a header should signal the end of headers, and below it, the
> start of content as per the spec. It just looks like your regex
> needs some tweaking, or maybe do something like this:

Found the same problem today... :(

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Message;
use Data::Dumper;

my $bad = qq(Content-Disposition: form-data; name="fetch_url"\n)
        . qq(\nhttp://www.jbisbee.com/\n);
my $part = HTTP::Message->parse($bad);
warn Dumper($part);

With the output:

$VAR1 = bless( {
                 '_content' => '',
                 '_headers' => bless( {
                                        'content-disposition' => 'form-data; name="fetch_url"',
                                        '
http' => '//www.jbisbee.com/'
                                      }, 'HTTP::Headers' )
               }, 'HTTP::Message' );

The second header key is "\nhttp": the newline of the blank line was swallowed into a field name, so the URL ended up as a header value and the content is empty. Of course, this only happens with forms that specify enctype="multipart/form-data" (for uploads), so it may be a bit hard to reproduce.

Tue Nov 15 10:54:41 2005 Guest - Correspondence added

From:

Julien Gaulmin <julien23 [...] free.fr>

I don't know if we have _exactly_ the same problem, but I did get an error while trying to parse some gzip-encoded HTTP messages. I made a patch that works fine for me but posted it to the wrong list. Here was the original report:

> The parse() function of the HTTP::Message library fails to parse
> some files due to a wrong pattern matching during header parsing.
>
> Example message (ignore first line before parsing):
> http://julien23.free.fr/pub/patches/HTTP::Message_parse_error.http
>
> Example decoder:
> http://julien23.free.fr/pub/patches/HTTP::Message_parse_error.pl
>
> Patch:
> http://julien23.free.fr/pub/patches/patch_Perl_HTTP::Message_parse.diff

--- Message.pm.orig 2005-09-14 15:38:43.000000000 +0200
+++ Message.pm 2005-09-14 15:40:30.000000000 +0200
@@ -45,7 +45,7 @@
     my @hdr;
     while (1) {
-        if ($str =~ s/^([^ \t:]+)[ \t]*: ?(.*)\n?//) {
+        if ($str =~ s/^([^ \t\n:]+)[ \t]*: ?(.*)\n?//) {
             push(@hdr, $1, $2);
             $hdr[-1] =~ s/\r\z//;
         }
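To see what the one-character change does, here is a minimal standalone sketch in core Perl that runs a simplified copy of the header loop from the diff context with the old and the new pattern on jbisbee's input from this ticket. `parse_with` is a hypothetical helper, not HTTP::Message code, and it only handles unfolded header lines.

```perl
use strict;
use warnings;

# Header-field patterns before and after the patch.
my $old_re = qr/^([^ \t:]+)[ \t]*: ?(.*)\n?/;
my $new_re = qr/^([^ \t\n:]+)[ \t]*: ?(.*)\n?/;

# Hypothetical helper: a simplified copy of the header loop shown in the diff
# (unfolded header lines only). Returns (\@pairs, $remaining_content).
sub parse_with {
    my ($str, $re) = @_;
    my @hdr;
    while (1) {
        if ($str =~ s/$re//) {
            push @hdr, $1, $2;
            $hdr[-1] =~ s/\r\z//;
        }
        else {
            $str =~ s/^\r?\n//;    # drop the blank separator line
            last;
        }
    }
    return (\@hdr, $str);
}

my $msg = qq(Content-Disposition: form-data; name="fetch_url"\n)
        . qq(\nhttp://www.jbisbee.com/\n);

my ($old_hdr, $old_body) = parse_with($msg, $old_re);
my ($new_hdr, $new_body) = parse_with($msg, $new_re);

# Old pattern: "\nhttp" is taken as a field name and the body comes out empty.
# New pattern: "\n" cannot start a field name, so the loop stops at the blank line.
printf "old: %d field(s), body %s\n", @$old_hdr / 2, length($old_body) ? 'kept' : 'lost';
printf "new: %d field(s), body %s\n", @$new_hdr / 2, length($new_body) ? 'kept' : 'lost';
```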

Fri Dec 02 22:14:31 2005 FBRIERE [...] cpan.org - Correspondence added

From:

fbriere [...] fbriere.net

This bug drove me crazy: I ran into it while testing one of my own modules, and ended up spending an hour trying to figure out what *I* had done wrong...

Until the next release of libwww-perl, if you need to work around this, it might interest you to know that the current regex, which wrongly lets a header name run across line breaks, will not run across a space, and spaces are actually allowed in a MIME boundary (except at the end). The exception is a space immediately followed by a colon, in which case the regex matches again.

So, if you are producing your own HTTP messages (as I was), just make sure your boundary contains at least one space and no colon. If you are parsing arbitrary messages, your best bet is to find the boundary manually, prepend something like " _" to it, and perform a global search/replace. This is left as an exercise for the reader.
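The boundary workaround can be checked with a small sketch in core Perl. It assumes the multipart body is parsed as part of a whole message, so the text left after the last real header line is the blank separator line followed by the first delimiter line. Only the unpatched field-name pattern is applied, and `rest_after_headers` and `swallowed` are hypothetical helpers. A boundary containing a space also has to be quoted in the Content-Type header (boundary=" _...").

```perl
use strict;
use warnings;

# The unpatched field-name pattern: [^ \t:] also matches "\r" and "\n".
my $old_re = qr/^([^ \t:]+)[ \t]*: ?(.*)\n?/;

# Hypothetical helper: the text left after the last real header line has been
# consumed, i.e. the blank separator line, then a one-part multipart body.
sub rest_after_headers {
    my ($boundary) = @_;
    return "\r\n--$boundary\r\n"
         . qq(Content-Disposition: form-data; name="field"\r\n)
         . "\r\nvalue\r\n--$boundary--\r\n";
}

# Hypothetical helper: 1 if the old pattern wrongly sees a header at the start
# of that text (blank line and delimiter glued onto the next field name).
sub swallowed {
    my ($boundary) = @_;
    return rest_after_headers($boundary) =~ $old_re ? 1 : 0;
}

my $plain  = '----------0xKhTmLbOuNdArY';    # no space in the boundary
my $spaced = ' _' . $plain;                  # the suggested " _" prefix
my $colon  = ' :' . $plain;                  # space followed by a colon

print "plain:  ", swallowed($plain),  "\n";
print "spaced: ", swallowed($spaced), "\n";
print "colon:  ", swallowed($colon),  "\n";
```

With the plain boundary, `[^ \t:]+` runs from the blank line through the delimiter into `Content-Disposition`; the leading space in `" _"` stops it, while `" :"` lets `[ \t]*:` match again, as described above.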

Tue Dec 06 06:14:30 2005 GAAS [...] cpan.org - Status changed from 'new' to 'resolved'