Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 7843
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: MARKSTOS [...] cpan.org
Requestors: adelton [...] fi.muni.cz
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Fri, 1 Oct 2004 18:37:03 +0200
From: Honza Pazdziora <adelton [...] fi.muni.cz>
To: bug-WWW-Mechanize [...] rt.cpan.org
CC: libwww [...] perl.org
Subject: WWW::Mechanize security: uses default value of file inputs
Hello, I started using WWW::Mechanize to retrieve some data from server which is not under my control. Suddenly, error Can't open file $sou: No such file or directory at /usr/lib/perl5/site_perl/5.8.1/HTML/Form.pm line 525 appeared. Whoa. I never told it in my script to work with $sou? Or '$sou'. Where did it come from? After digging through the sources, I figured out that <input type="file" name="soubor" value="$sou" vsize="60"> really is in the source HTML and HTML::Form passes it to WWW::Mechanize. And since I never intented to touch this value, the default is used and WWW::Mechanize happily tries to upload the file. I consider this to be a major security hole -- what if the authors of the HTML page used path to user's id_dsa file? The HTML 4 spec says in section 17.4.1 file Creates a file select control. User agents may use the value of the value attribute as the initial file name. Since user agents _may_ use this value but are not required to, and since with WWW::Mechanize there is no interactive control of the values used, I believe it is crutial that the default value is not used for the file inputs. Even the visual user agents (Mozilla, etc.) clear the file inputs, since it is too easy to script automatic upload with JS. Please consider the following patch: --- WWW/Mechanize.pm.orig 2004-10-01 18:06:39.000000000 +0200 +++ WWW/Mechanize.pm 2004-10-01 18:09:26.000000000 +0200 @@ -1387,6 +1387,15 @@ sub _parse_html { my $self = shift; $self->{forms} = [ HTML::Form->parse($self->content, $self->base) ]; + if (@{ $self->{forms} }) { + for my $form (@{ $self->{forms} }) { + for my $input ($form->inputs) { + if ($input->type eq 'file') { + $input->value( undef ); + } + } + } + } $self->{form} = $self->{forms}->[0]; $self->_extract_links(); } which removes all file values from all forms, preventing accidental upload of files. Alternatively, HTML::Form could do this for us. But there may be some situations when you might want to see the value that came in the HTML, so I could not argue strongly for removal of this value already in HTML::Form. But it could be done on line 128 of HTML::Form 1.44. I'm Cc'ing libwww@perl.org to gather views -- maybe for file inputs a separate method like default_value could be used to get the original value, while value would return undef. Or maybe there are situations when WWW::Mechanize could retain the default value. Then the patch suggested could be wrapped with a check for some option that would keep the file input values intact. Nonetheless, the default behaviour should be to remove the values. I'd appreciate your comment about this issue, -- ------------------------------------------------------------------------ Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/ .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ... Only self-confident people can be simple.
[adelton@fi.muni.cz - Fri Oct 1 13:31:15 2004]: Show quoted text
> > > Since user agents _may_ use this value but are not required to, > and since with WWW::Mechanize there is no interactive control of > the values used, I believe it is crutial that the default value is > not used for the file inputs. Even the visual user agents (Mozilla, > etc.) clear the file inputs, since it is too easy to script automatic > upload with JS.
You are write about this. Some UA's don't allow you to set a default value. Show quoted text
> Please consider the following patch: > > --- WWW/Mechanize.pm.orig 2004-10-01 18:06:39.000000000 +0200 > +++ WWW/Mechanize.pm 2004-10-01 18:09:26.000000000 +0200 > @@ -1387,6 +1387,15 @@ > sub _parse_html { > my $self = shift; > $self->{forms} = [ HTML::Form->parse($self->content, $self->base) > ]; > + if (@{ $self->{forms} }) { > + for my $form (@{ $self->{forms} }) { > + for my $input ($form->inputs) { > + if ($input->type eq 'file') { > + $input->value( undef ); > + } > + } > + } > + } > $self->{form} = $self->{forms}->[0]; > $self->_extract_links(); > } > > which removes all file values from all forms [when the HTML is loaded, not > submitted], preventing accidental upload of files.
I'm fine with this patch. Would you also be willing to write a Test::More test script, which confirms that this works? There are examples in "t/" directory of the distribution. You could also patch the docs to mention that we do this, and patch the Changes file to give yourself credit for all your work. :) As you can see from the RT bug queue, the bug reports for Mech are piling up, and we need all the help we can get. Thanks! Mark
From: adelton [...] fi.muni.cz
Show quoted text
> > I'm fine with this patch. Would you also be willing to write a > Test::More test > script, which confirms that this works? There are examples in "t/" > directory of > the distribution.
I'm attaching a patch against WWW::Mechanize 1.04. I was sbout to test the result of the upload in the log-server and check that no file was uploaded, but it seems to do CGI->new($r->uri->query) so it's not able to process POST requests. Therefore I only test the value of the file upload field and the created request in (a new test file) t/upload.t and the fact that the value of the upload field is empty string in t/local/tick.t. The t/upload.t might need to take possibly different boundaries into account but code in HTTP::Request::Common seems to suggest that the value of xYzZY is sufficiently stable. Show quoted text
> You could also patch the docs to mention that we do this, and patch > the > Changes > file to give yourself credit for all your work. :) As you can see from
Included in the patch. Yours, Jan Pazdziora
diff -Nru WWW-Mechanize-1.04-orig/Changes WWW-Mechanize-1.04/Changes --- WWW-Mechanize-1.04-orig/Changes 2004-09-16 06:28:03.000000000 +0200 +++ WWW-Mechanize-1.04/Changes 2004-10-18 15:37:51.000000000 +0200 @@ -1,6 +1,10 @@ # $Id: Changes,v 1.137 2004/09/16 04:20:07 petdance Exp $ Revision history for Perl extension WWW::Mechanize + [FIXES] + * Upload <input type="file" ... > does not use the default + value to prevent attacks, patch by Jan Pazdziora (RT ##7843). + 1.04 Wed Sep 15 23:27:53 CDT 2004 [ENHANCEMENTS] diff -Nru WWW-Mechanize-1.04-orig/lib/WWW/Mechanize.pm WWW-Mechanize-1.04/lib/WWW/Mechanize.pm --- WWW-Mechanize-1.04-orig/lib/WWW/Mechanize.pm 2004-09-16 06:23:35.000000000 +0200 +++ WWW-Mechanize-1.04/lib/WWW/Mechanize.pm 2004-10-18 15:51:46.000000000 +0200 @@ -534,6 +534,10 @@ The option I<$number> parameter is used to distinguish between two fields with the same name. The fields are numbered from 1. +If the field is of type file (file upload field), the value is always +cleared to prevent remote sites from downloading your local files. +To upload a file, specify its file name explicitly. + =cut sub value { @@ -1387,6 +1391,15 @@ sub _parse_html { my $self = shift; $self->{forms} = [ HTML::Form->parse($self->content, $self->base) ]; + if (@{ $self->{forms} }) { + for my $form (@{ $self->{forms} }) { + for my $input ($form->inputs) { + if ($input->type eq 'file') { + $input->value( undef ); + } + } + } + } $self->{form} = $self->{forms}->[0]; $self->_extract_links(); } diff -Nru WWW-Mechanize-1.04-orig/MANIFEST WWW-Mechanize-1.04/MANIFEST --- WWW-Mechanize-1.04-orig/MANIFEST 2004-09-16 04:29:46.000000000 +0200 +++ WWW-Mechanize-1.04/MANIFEST 2004-10-18 15:42:40.000000000 +0200 @@ -42,6 +42,8 @@ t/taint.t t/tick.html t/tick.t +t/upload.html +t/upload.t t/warnings.t t/warn.t diff -Nru WWW-Mechanize-1.04-orig/t/local/log-server WWW-Mechanize-1.04/t/local/log-server --- WWW-Mechanize-1.04-orig/t/local/log-server 2004-05-02 05:03:13.000000000 +0200 +++ WWW-Mechanize-1.04/t/local/log-server 2004-10-18 15:35:18.000000000 +0200 @@ -100,6 +100,7 @@ <input type="checkbox" name="cat" value="cat_foo" %s /> <input type="checkbox" name="cat" value="cat_bar" %s /> <input type="checkbox" name="cat" value="cat_baz" %s /> + <input type="file" name="upload" value="README" /> </form> </p> </body> diff -Nru WWW-Mechanize-1.04-orig/t/local/submit.t WWW-Mechanize-1.04/t/local/submit.t --- WWW-Mechanize-1.04-orig/t/local/submit.t 2004-05-02 05:03:13.000000000 +0200 +++ WWW-Mechanize-1.04/t/local/submit.t 2004-10-18 15:36:10.000000000 +0200 @@ -1,7 +1,7 @@ use warnings; use strict; use lib 't/local'; -use Test::More tests => 13; +use Test::More tests => 15; use LocalServer; BEGIN { delete @ENV{ qw( http_proxy HTTP_PROXY ) }; } @@ -22,6 +22,8 @@ ok( $response->is_success, 'Got local page' ) or die "Can't even fetch local page"; ok( $mech->is_html ); +is( $mech->value('upload'), '', "Hopefully no upload happens"); + $mech->field(query => "foo"); # Filled the "q" field $response = $mech->submit; @@ -31,6 +33,8 @@ like($mech->content, qr/\bfoo\b/i, "Found 'Foo'"); +is( $mech->value('upload'), '', "Hopefully no upload happens"); + SKIP: { eval "use Test::Memory::Cycle"; skip "Test::Memory::Cycle not installed", 1 if $@; diff -Nru WWW-Mechanize-1.04-orig/t/upload.html WWW-Mechanize-1.04/t/upload.html --- WWW-Mechanize-1.04-orig/t/upload.html 1970-01-01 01:00:00.000000000 +0100 +++ WWW-Mechanize-1.04/t/upload.html 2004-10-18 15:43:27.000000000 +0200 @@ -0,0 +1,10 @@ +<html> +<body> + +<form action="http://localhost/" method="post" enctype="multipart/form-data"> + +<input type="file" name="upload" value="MANIFEST" /> +<input type="Submit" name="submit" value="Submit" label="Sumbit" /> +</form> +</body> + diff -Nru WWW-Mechanize-1.04-orig/t/upload.t WWW-Mechanize-1.04/t/upload.t --- WWW-Mechanize-1.04-orig/t/upload.t 1970-01-01 01:00:00.000000000 +0100 +++ WWW-Mechanize-1.04/t/upload.t 2004-10-18 16:05:00.000000000 +0200 @@ -0,0 +1,38 @@ +#!perl -T + +use strict; +use Test::More tests => 6; +use URI::file; + +use_ok( 'WWW::Mechanize' ); + +my $mech = WWW::Mechanize->new( cookie_jar => undef ); +isa_ok( $mech, "WWW::Mechanize" ); + +my $uri = URI::file->new_abs( "t/upload.html" )->as_string; +$mech->get( $uri ); +is( ref $mech->uri, "", "URI shouldn't be an object" ); +ok( $mech->success, $uri ); + +my $form = $mech->form_number(1); +my $reqstring = $form->click->as_string; +$reqstring =~ s/\r//g; + +my $wanted = <<'EOT'; +POST http://localhost/ +Content-Length: 77 +Content-Type: multipart/form-data; boundary=xYzZY + +--xYzZY +Content-Disposition: form-data; name="submit" + +Submit +--xYzZY-- +EOT + +is( $reqstring, $wanted, "Proper posting" ); + +$mech->field('upload', 'MANIFEST'); +$reqstring = $form->click->as_string; +like( $reqstring, qr/Cookbook/, 'The uploaded file should be in the request'); +
[guest - Mon Oct 18 10:08:50 2004]: Show quoted text
> > > > I'm fine with this patch. Would you also be willing to write a > > Test::More test > > script, which confirms that this works? There are examples in "t/" > > directory of > > the distribution.
> > I'm attaching a patch against WWW::Mechanize 1.04.
This has applied, committed to CVS, and will appear in the next release of Mech. Thanks for the high quaity patch Jan. Could you tempt to stick around and work on some of the other issues in the Mech RT issue tracking system. :) I could ask Andy about arranging CVS 'commit' access for you if you were interested in contributing some on an ongoing basis.