This queue is for tickets about the libwww-perl CPAN distribution.

Report information
The Basics
Id: 13025
Status: resolved
Priority: 0/
Queue: libwww-perl

People
Owner: Nobody in particular
Requestors: bhirt+cpan [...] mobygames.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: parsing bug in HTTP::Message::parse()
Hi! I've stumbled across a bug in multi-part messages in HTTP::Message. If the message content contains something that looks like a header, it is mistakenly detected as a header. However, as per the spec, a blank line after the headers signals the end of the headers, and everything below it is content. It looks like your regex just needs some tweaking, or maybe you could do something like this:

sub parse {
    my($class, $str) = @_;
    # Split on the first blank line only: headers above, content below.
    my ($headPart, $contentPart) = split(/\r?\n\r?\n/, $str, 2);
    my @hdr;
    # Only the head part is scanned for "Name: value" pairs.
    while ($headPart =~ s/^([^ \t:]+)[ \t]*: ?(.*)\n?//) {
        push(@hdr, $1, $2);
        $hdr[-1] =~ s/\r\z//;    # drop the CR left over from CRLF line endings
    }
    new($class, \@hdr, $contentPart);
}

I've included a sample program to reproduce the problem so you can better see what I mean. If you look at the Dumper() output for the part "3fc90c2f106c4dc7e7742d6bb12b68f7", you'll see that it was detected incorrectly. Let me know if there is anything I can do to help.
#!/usr/bin/perl -w
use strict;
use HTTP::Message;
use HTTP::Headers;
use Data::Dumper;

my $content = '------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="fp720eb62d5af0f1d87412b385533c196c"

1
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="trid720eb62d5af0f1d87412b385533c196c"

779758
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="673bb732f4790c8cc3a63e5489bdb25b"; filename=""


------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="3fc90c2f106c4dc7e7742d6bb12b68f7"

some data
------------0xKhTmLbOuNdArY
Content-Disposition: form-data; name="42a6fca55664038ee398f7ceb14e4835"

aoeu:aoeu
------------0xKhTmLbOuNdArY--
';

my $headers = HTTP::Headers->new;
$headers->header(
    'content-type' => 'multipart/form-data; boundary=----------0xKhTmLbOuNdArY'
);
my $m = HTTP::Message->new($headers, $content);
foreach my $part ($m->parts) {
    print STDERR Dumper($part);
}
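The split-on-blank-line approach suggested in the report can be tried without LWP installed. Below is a minimal standalone sketch of that idea; `split_head_body` is a hypothetical helper, not part of HTTP::Message. Unlike the suggestion, it also handles CRLF line endings and appends folded continuation lines to the previous header value.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTTP::Message): split a raw message on the
# first blank line, then look for headers only in the head. Everything after
# the blank line is returned untouched as content, so a body line such as
# "http://www.jbisbee.com/" can never be mistaken for a header.
sub split_head_body {
    my ($str) = @_;
    # A limit of 2 keeps any later blank lines inside the content.
    my ($head, $content) = split /\r?\n\r?\n/, $str, 2;
    $head    = '' unless defined $head;      # empty input
    $content = '' unless defined $content;   # no blank line: headers only

    my @hdr;
    for my $line (split /\r?\n/, $head) {
        if ($line =~ /^([^ \t:]+)[ \t]*: ?(.*)$/) {
            push @hdr, $1, $2;                # new "Name: value" pair
        }
        elsif (@hdr && $line =~ /^[ \t]/) {
            $hdr[-1] .= "\n$line";            # folded continuation line
        }
    }
    return (\@hdr, $content);
}

my ($hdr, $content) = split_head_body(
    qq(Content-Disposition: form-data; name="fetch_url"\n\nhttp://www.jbisbee.com/\n)
);
print "header: $hdr->[0] => $hdr->[1]\n";
print "content: $content";
```

With this split, the colon in "http://..." is never offered to the header regex, because it sits below the blank line.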
[guest - Mon May 30 19:20:21 2005]:
Found the same problem today... :(

#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Message;
use Data::Dumper;

my $bad = qq(Content-Disposition: form-data; name="fetch_url"\n)
        . qq(\nhttp://www.jbisbee.com/\n);
my $part = HTTP::Message->parse($bad);
warn Dumper($part);

With the output:

$VAR1 = bless( {
                 '_content' => '',
                 '_headers' => bless( {
                                        'content-disposition' => 'form-data; name="fetch_url"',
                                        '
http' => '//www.jbisbee.com/'
                                      }, 'HTTP::Headers' )
               }, 'HTTP::Message' );

Of course, this only happens with forms that specify enctype="multipart/form-data" (as used for file uploads), so it may be a bit hard to reproduce.
From: Julien Gaulmin <julien23 [...] free.fr>
I don't know if we have _exactly_ the same problem, but I did get an error while trying to parse a gzip-encoded HTTP message. I made a patch that works fine for me but posted it to the wrong list. Here was the original report:
> The parse() function of the HTTP::Message library fails to parse
> some files due to a wrong pattern matching during header parsing.
>
> Example message (ignore first line before parsing):
> http://julien23.free.fr/pub/patches/HTTP::Message_parse_error.http
>
> Example decoder:
> http://julien23.free.fr/pub/patches/HTTP::Message_parse_error.pl
>
> Patch:
> http://julien23.free.fr/pub/patches/patch_Perl_HTTP::Message_parse.diff
--- Message.pm.orig 2005-09-14 15:38:43.000000000 +0200
+++ Message.pm 2005-09-14 15:40:30.000000000 +0200
@@ -45,7 +45,7 @@
 
     my @hdr;
     while (1) {
-        if ($str =~ s/^([^ \t:]+)[ \t]*: ?(.*)\n?//) {
+        if ($str =~ s/^([^ \t\n:]+)[ \t]*: ?(.*)\n?//) {
             push(@hdr, $1, $2);
             $hdr[-1] =~ s/\r\z//;
         }
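To see what the one-character change in the patch does, here is a minimal standalone sketch that applies the old and patched header regexes to what remains of jbisbee's example once the Content-Disposition line has been consumed. It only exercises the two regexes, not HTTP::Message itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Header regexes before and after Julien Gaulmin's patch.
my $old = qr/^([^ \t:]+)[ \t]*: ?(.*)\n?/;
my $new = qr/^([^ \t\n:]+)[ \t]*: ?(.*)\n?/;

# What is left of jbisbee's example once the Content-Disposition line (and
# its newline) has been consumed: a blank line, then a body line with a colon.
my $rest = "\nhttp://www.jbisbee.com/\n";

# Old pattern: "\n" is allowed in the field name, so the blank line is
# swallowed and "\nhttp" comes back as a header name.
my ($old_name) = $rest =~ $old;

# Patched pattern: "\n" cannot appear in a field name, so the match fails
# and the body line is not treated as a header.
my $new_matches = $rest =~ $new ? 1 : 0;

# Show the captured name with the newline made visible.
(my $shown = defined $old_name ? $old_name : '(none)') =~ s/\n/\\n/g;
print "old regex header name: $shown\n";
print "new regex matches: ", ($new_matches ? 'yes' : 'no'), "\n";
```

Ordinary header lines such as "Content-Type: text/plain" still match the patched regex; only field names containing a newline are rejected.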
From: fbriere [...] fbriere.net
This bug drove me crazy: I hit it while testing one of my own modules and spent an hour trying to figure out what *I* had done wrong...

Until the next release of libwww-perl, if you need to work around this, it may help to know that the header-name part of the current regex, [^ \t:]+, cannot match a space, and spaces are actually allowed in a MIME boundary (except at the end). (The exception is when the spaces are immediately followed by a colon, in which case the regex will match again.) So, if you are producing your own HTTP messages (as I was), just make sure your boundary contains at least one space and no colon.

If you are parsing arbitrary messages, your best bet is to find the boundary manually, prepend something like " _" to it, and perform a global search/replace. This is left as an exercise for the reader.
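A minimal sketch of why a space in the boundary helps, assuming the failing case is a whole multipart message fed to HTTP::Message->parse: after the top-level headers, the old regex sees a blank line, the opening delimiter line, and the first part's Content-Disposition line, and can glue all three into one bogus header name. `rest_after_headers` is a hypothetical helper and the boundaries are made up for illustration (the first is borrowed from the original report). Since a boundary containing a space is not a valid MIME token, it would need to be quoted in the Content-Type header, e.g. boundary=" _0xKhTmLbOuNdArY".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Old (unpatched) header regex from HTTP::Message::parse.
my $old = qr/^([^ \t:]+)[ \t]*: ?(.*)\n?/;

# Hypothetical helper: the string left over once the top-level headers of a
# multipart message have been consumed: a blank line, the opening delimiter
# line, and the first part's header.
sub rest_after_headers {
    my ($boundary) = @_;
    return "\n--$boundary\n"
         . qq(Content-Disposition: form-data; name="a"\n)
         . "\n1\n--$boundary--\n";
}

# Plain boundary, boundary with a leading " _", and a boundary whose space
# is immediately followed by a colon (the exception mentioned above).
for my $bnd ('0xKhTmLbOuNdArY', ' _0xKhTmLbOuNdArY', 'ab :cd') {
    my $result = rest_after_headers($bnd) =~ $old ? 'bogus header' : 'no match';
    print "boundary '$bnd': $result\n";
}
```

Only the " _" variant avoids the match: the name part stops at the space, and the next character is not a colon.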