This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 13049
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: Nobody in particular
Requestors: gt0914a [...] mail.gatech.edu
Cc:
AdminCc: MARKSTOS [...] cpan.org

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: (no value)



Subject: Forms functions cannot find forms on certain webpages
I have been coding a script to access a stock website. I seek to use the main "quote" form in order to get quotes on certain stocks. I can see the <form> HTML tags in the web page source, yet Mechanize cannot seem to find any forms on the page. The code follows:

use strict;
use WWW::Mechanize;
use HTML::TokeParser;

my @symbols = ("QQQQ","MDY","IBM"); # just some test cases

my $agent = WWW::Mechanize->new();
my $r = $agent->get("http://finance.yahoo.com", ":content_file" => "mainpage.html" );

print $r->is_error;

print $agent->forms;

$agent->form_name("quote");

$agent->field('s' => $symbols[0]);
$agent->submit();

//////
print $agent->forms prints nothing, the interpreter claims a form called "quote" does not exist, and therefore the call to field fails.

Any suggestions?
Date: Wed, 1 Jun 2005 09:28:52 -0500
From: Mark Stosberg <mark [...] summersault.com>
To: Guest via RT <bug-WWW-Mechanize [...] rt.cpan.org>
Subject: Re: [cpan #13049] Forms functions can find forms on certain webpages
RT-Send-Cc:
On Tue, May 31, 2005 at 11:44:11PM -0400, Guest via RT wrote:
> I have been coding a script to access a stock website. I seek to use the main
> "quote" form in order to get quotes on certain stocks.
> I can see the <form> HTML tags in the web page source, yet
> Mechanize cannot seem to find any forms on the page. The code follows:
>
> use strict;
> use WWW::Mechanize;
> use HTML::TokeParser;
>
> my @symbols = ("QQQQ","MDY","IBM"); # just some test cases
>
> my $agent = WWW::Mechanize->new();
> my $r = $agent->get("http://finance.yahoo.com", ":content_file" => "mainpage.html" );
>
> print $r->is_error;
>
> print $agent->forms;
>
> $agent->form_name("quote");
>
> $agent->field('s' => $symbols[0]);
> $agent->submit();
>
> //////
> print $agent->forms prints nothing, the interpreter claims a form called "quote" does not exist, and therefore the call to field fails.
>
> Any suggestions?
Did you try without the ":content_file" option? It looks like it may be altering where the page content is stored in a way that might confuse Mech. The documentation says:

"If a $filename is provided with the :content_file option, then the response content will be saved here instead of in the response object."

On the page with the form on it, did you inspect $agent->content to see if the source code looks like what you expect?

Did you try running the page through an HTML validator to see if it looks reasonably valid?

Mark
[mark@summersault.com - Wed Jun 1 10:28:53 2005]:
> Did you try without the ":content_file" option? It looks like it may
> be altering where the page content is stored
> in a way that might confuse Mech. The documentation says:
>
> "If a $filename is provided with the :content_file option, then the
> response content will be saved here instead of in the response
> object."
>
> On the page with the form on it, did you inspect $agent->content to
> see if the source code looks like what you expect?
>
> Did you try running the page through an HTML validator to see if it looks
> reasonably valid?
>
> Mark

You are correct. Storing the returned web page is the cause of this issue. Thanks for pointing this out.
Date: Fri, 3 Jun 2005 10:15:36 -0500
From: Mark Stosberg <mark [...] summersault.com>
To: Guest via RT <bug-WWW-Mechanize [...] rt.cpan.org>
Subject: Re: [cpan #13049] Forms functions can find forms on certain webpages
RT-Send-Cc:
On Wed, Jun 01, 2005 at 11:41:56PM -0400, Guest via RT wrote:
> > > my $r = $agent->get("http://finance.yahoo.com", ":content_file" => "mainpage.html" );
> >
> > Did you try without the ":content_file" option? It looks like it may
> > be altering where the page content is stored
> > in a way that might confuse Mech. The documentation says:
> >
> > "If a $filename is provided with the :content_file option, then the
> > response content will be saved here instead of in the response
> > object."
>
> You are correct. Storing the returned web page is the cause of this
> issue. Thanks for pointing this out.

Andy,

Do you think we should do anything to address the ":content_file" issue? We could add a disclaimer that some features are incompatible with it, or see what's involved in making features work anyway by accessing the data in the right place.

It would be nice to at least fail with a clueful error message about why the content is missing.

Mark
Date: Fri, 3 Jun 2005 10:24:47 -0500
From: Andy Lester <andy [...] petdance.com>
To: "mark [...] summersault.com via RT" <bug-WWW-Mechanize [...] rt.cpan.org>
Subject: Re: [cpan #13049] Forms functions can find forms on certain webpages
RT-Send-Cc:
On Fri, Jun 03, 2005 at 11:15:37AM -0400, mark@summersault.com via RT (bug-WWW-Mechanize@rt.cpan.org) wrote:
> Do you think we should do anything to address the ":content_file" issue?
> We add a disclaimer that some features are incompatible with it, or see
> what's involved in making features work anyway by accessing the data in
> the right place.

Yeah, let's put something in the docs that explains that it bypasses normal Mechness. Besides, they can save_content() anyway.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
Date: Fri, 3 Jun 2005 10:44:15 -0500
From: Mark Stosberg <mark [...] summersault.com>
To: Andy Lester via RT <bug-WWW-Mechanize [...] rt.cpan.org>
Subject: Re: [cpan #13049] :content_file needs documentation
RT-Send-Cc:
On Fri, Jun 03, 2005 at 11:31:41AM -0400, Andy Lester via RT wrote:
> On Fri, Jun 03, 2005 at 11:15:37AM -0400, mark@summersault.com via RT (bug-WWW-Mechanize@rt.cpan.org) wrote:
> > Do you think we should do anything to address the ":content_file" issue?
> > We add a disclaimer that some features are incompatible with it, or see
> > what's involved in making features work anyway by accessing the data in
> > the right place.
>
> Yeah, let's put something in the docs that explain that it bypasses
> normal Mechness. Besides, they can save_content() anyway.

Here's a start:

--- Mechanize.pm.orig 2005-06-03 10:41:38.854410528 -0500
+++ Mechanize.pm 2005-06-03 10:43:14.677843152 -0500
@@ -310,6 +310,10 @@
 and you can rest assured that the parms will get filtered down
 appropriately.
 
+B<NOTE:> Because C<:content_file> causes the page contents to be stored in a file
+instead of the response object, some Mech functions that expect it to be there
+won't work as expected. Use with caution.
+
 =cut
 
 sub get {

Added the doc patch. Thanks.
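
For anyone landing on this ticket later, here is a minimal sketch of the workaround discussed above: fetch the page without ":content_file" so the content stays in the response object, use the form, and call save_content() afterwards if a copy on disk is still wanted. To keep it runnable offline it reads a small stand-in page through a file: URL instead of the live site. The stand-in HTML, the temporary paths, and the form action are assumptions made up for illustration; only the form name "quote" and the field "s" come from the original report.

```perl
use strict;
use warnings;
use WWW::Mechanize;
use File::Temp qw(tempdir);
use File::Spec;
use URI::file;

# Hypothetical stand-in for the real page: one form named "quote"
# with a text field "s", as described in the report.
my $dir  = tempdir( CLEANUP => 1 );
my $page = File::Spec->catfile( $dir, 'page.html' );
open my $fh, '>', $page or die "Cannot write $page: $!";
print {$fh} '<html><body><form name="quote" action="/q">',
            '<input type="text" name="s"></form></body></html>';
close $fh;

my $url   = URI::file->new($page)->as_string;
my $agent = WWW::Mechanize->new();

# No ":content_file" here, so the content is kept in the response
# object, where the forms functions look for it.
$agent->get($url);

my @forms = $agent->forms;
print "forms found: ", scalar(@forms), "\n";

# Select the form and fill in the field, as in the original script.
$agent->form_name('quote') or die 'no form named "quote"';
$agent->field( s => 'QQQQ' );

# If a copy on disk is still wanted, save it after the fetch.
$agent->save_content( File::Spec->catfile( $dir, 'mainpage.html' ) );
```

Against a live site the same pattern applies: replace $url with the real address and call $agent->submit() once the field is set.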