This queue is for tickets about the WWW-Mechanize CPAN distribution.

Report information
The Basics
Id: 8673
Status: resolved
Priority: 0/
Queue: WWW-Mechanize

People
Owner: Nobody in particular
Requestors: dom [...] idealx.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Date: Tue, 30 Nov 2004 20:01:00 +0100
From: Dominique Quatravaux <dom [...] idealx.com>
To: bug-www-mechanize [...] rt.cpan.org
Subject: New features: input_has_label() and other HTML-tree methods
Dear Mechanists,

Enclosed is a quite jumbo'ed patch (1800+ lines!) to WWW::Mechanize
that purports to add some of the "technical-free" features I've been
raving about on the developer list. The "Changes" file (near the top
of the patch) details the added methods; allow me to outline a few of
them:

  * input_has_label(): the most important one. It allows the programmer
    to match input widgets using their nearby text labels in the
    HTML source in a wide variety of situations. Labels may be in
    the same paragraph as the form controls, or they may be <h3>
    titles, or they may be in the same HTML table row, etc. Look at
    t/html-tree.t to see how robust the heuristics are. The following
    code (excerpt from the POD) tells the Mech to click on whatever
    button is labeled "I want no spam" in a technical-free fashion
    (that is, no need to "view source..." in the browser or count
    widgets anymore in order to implement that):

      map { my $input = $_;
            $input->value($mech->input_has_label($input, qr/I want no spam/i)) }
          ($mech->forms->[0]->inputs);

  * ->node_of_form() and ->nodes_of_input(): starting from the
    HTML::Form::Input objects, one can get back to their position in
    the HTML parse tree. Quite handy to get additional contextual
    info about the widgets that the parser in HTML::Form might
    neglect to remember;

  * ->text_node_at(): likewise, starting from a given string offset
    in the plain-text version of the current page, one can get back
    to the corresponding HTML node. The converse (from tree to text)
    is also possible using ->textify_tree(). The programmer can now
    mix and match regex-based and tree-based methods to check the
    contents of those HTML documents!

Please let me know what you think of it all. Pursuant to Andy's recent
advice, I have done my best to ensure that this patch is ready for
immediate inclusion in a Mech release (docs, tests, no tabs :-) but of
course I'm prepared to deal with modification requests and resubmit.

Best regards,

-- 
Dominique QUATRAVAUX
Senior Engineer
01 44 42 00 08
IDEALX

Message body is not shown because it is too large.

Date: Wed, 1 Dec 2004 21:40:10 -0500
From: Mark Stosberg <mark [...] summersault.com>
To: "dom [...] idealx.com via RT" <bug-WWW-Mechanize [...] rt.cpan.org>
CC: Andy Lester <andy [...] petdance.com>
Subject: Re: [cpan #8673] New features: input_has_label() and other HTML-tree methods
RT-Send-Cc:
On Tue, Nov 30, 2004 at 01:59:53PM -0500, dom@idealx.com via RT wrote:
> Enclosed is a quite jumbo'ed patch (1800+ lines!) to WWW::Mechanize
> that purports to add some of the "technical-free" features I've been
> raving about on the developer list. The "Changes" file (near the top
> of the patch) details the added methods; allow me to outline a few of
> them:
>
>   * input_has_label(): the most important one. It allows the programmer
>     to match input widgets using their nearby text labels in the
>     HTML source in a wide variety of situations. Labels may be in
>     the same paragraph as the form controls, or they may be <h3>
>     titles, or they may be in the same HTML table row, etc. Look at
>     t/html-tree.t to see how robust the heuristics are. The following
>     code (excerpt from the POD) tells the Mech to click on whatever
>     button is labeled "I want no spam" in a technical-free fashion
>     (that is, no need to "view source..." in the browser or count
>     widgets anymore in order to implement that):
>
>       map { my $input = $_;
>             $input->value($mech->input_has_label($input, qr/I want no spam/i)) }
>           ($mech->forms->[0]->inputs);
Dominique,

The input_has_label() function does look really cool. I want to play
with it some more to try it out, but I expect that I would like to see
it in Mech.

From what I can tell, the other functions mostly serve to support this
one method and do not depend much on Mech. They mostly seem to deal
with HTML::TreeBuilder and HTML::Form objects.

Have you thought about splitting off this functionality into its own
module? It seems like it would be interesting to other HTML parsing
tools besides Mech. I visualize an input_has_label() function in Mech
that is a thin wrapper around another module which does most of the
work.

This would have some advantages for you, I think:

  * You can work on the code independently of Mech, with your own
    release schedule.

  * People who want to use your logic for other purposes besides Mech
    will be inclined to contribute.

The rest of Mech is mostly very high-level wrappers around other
things: LWP::UserAgent, HTML::Form and WWW::Mechanize::Link. The reason
people prefer to use it over LWP directly is that it has a simpler,
easy-to-use interface.

So, besides the benefits for you, I also think it may be more in line
with the existing design of Mech to keep most of this code elsewhere.
If people want to use your more advanced methods, they can always
access the module directly to use them.

What do you think? Andy?

    Mark

-- 
http://mark.stosberg.com/
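The split Mark describes can be sketched as follows. This is only an illustration under stated assumptions, not the code in the patch: the names HTML::LabelFinder, control_name_for_label(), My::Mech and set_field_by_label() are hypothetical, and the "nearest enclosing element whose text matches" rule is a deliberately simple stand-in for the patch's heuristics. It assumes HTML::TreeBuilder and WWW::Mechanize are installed from CPAN.

```perl
use strict;
use warnings;

# --- Standalone part: knows nothing about WWW::Mechanize. ---------------
# "HTML::LabelFinder" is a hypothetical name, not an existing CPAN module.
package HTML::LabelFinder;
use HTML::TreeBuilder;

# Return the name attribute of the form control that needs the fewest
# steps up the parse tree to reach an element whose text matches
# $label_re, or undef if no enclosing element matches.
sub control_name_for_label {
    my ($html, $label_re) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my ($best_name, $best_depth);
    for my $control ($tree->look_down(_tag => qr/^(?:input|select|textarea)$/)) {
        my $name = $control->attr('name');
        next unless defined $name;
        my $depth = 0;
        for (my $node = $control->parent; $node; $node = $node->parent) {
            $depth++;
            next unless $node->as_text =~ $label_re;
            # Keep the control with the smallest distance; ties go to
            # the control that comes first in the document.
            ($best_name, $best_depth) = ($name, $depth)
                if !defined $best_depth || $depth < $best_depth;
            last;
        }
    }
    $tree->delete;    # release the parse tree
    return $best_name;
}

# --- Thin wrapper: the only Mech-specific code. -------------------------
# "My::Mech" and set_field_by_label() are hypothetical as well.
package My::Mech;
use parent 'WWW::Mechanize';

sub set_field_by_label {
    my ($self, $label_re, $value) = @_;
    my $name = HTML::LabelFinder::control_name_for_label($self->content, $label_re);
    die "no form control found near label $label_re\n" unless defined $name;
    return $self->field($name, $value);    # delegate to the stock Mech method
}

package main;

# The standalone module can be exercised without any network access.
my $html = <<'HTML';
<html><body><form action="/subscribe">
<p>Email address: <input type="text" name="email"></p>
<p>Full name: <input type="text" name="fullname"></p>
</form></body></html>
HTML

my $found = HTML::LabelFinder::control_name_for_label($html, qr/email address/i);
print defined $found ? "$found\n" : "(no match)\n";
```

With this layout the tree-walking logic can be released and tested on its own schedule, while the Mech side stays a few lines long, which is the trade-off argued for above.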
On Tue Nov 30 13:59:50 2004, dom@idealx.com wrote:
> Dear Mechanists,
>
> Enclosed is a quite jumbo'ed patch (1800+ lines!) to WWW::Mechanize
> that purports to add some of the "technical-free" features I've been
> raving about on the developer list. The "Changes" file (near the top
> of the patch) details the added methods; allow me to outline a few of
> them:
>
>   * input_has_label(): the most important one. It allows the programmer
>     to match input widgets using their nearby text labels in the
>     HTML source in a wide variety of situations. Labels may be in
>     the same paragraph as the form controls, or they may be <h3>
>     titles, or they may be in the same HTML table row, etc. Look at
>     t/html-tree.t to see how robust the heuristics are. The following
>     code (excerpt from the POD) tells the Mech to click on whatever
>     button is labeled "I want no spam" in a technical-free fashion
>     (that is, no need to "view source..." in the browser or count
>     widgets anymore in order to implement that):
>
>       map { my $input = $_;
>             $input->value($mech->input_has_label($input, qr/I want no spam/i)) }
>           ($mech->forms->[0]->inputs);
>
>   * ->node_of_form() and ->nodes_of_input(): starting from the
>     HTML::Form::Input objects, one can get back to their position in
>     the HTML parse tree. Quite handy to get additional contextual
>     info about the widgets that the parser in HTML::Form might
>     neglect to remember;
>
>   * ->text_node_at(): likewise, starting from a given string offset
>     in the plain-text version of the current page, one can get back
>     to the corresponding HTML node. The converse (from tree to text)
>     is also possible using ->textify_tree(). The programmer can now
>     mix and match regex-based and tree-based methods to check the
>     contents of those HTML documents!
>
> Please let me know what you think of it all. Pursuant to Andy's recent
> advice, I have done my best to ensure that this patch is ready for
> immediate inclusion in a Mech release (docs, tests, no tabs :-) but of
> course I'm prepared to deal with modification requests and resubmit.
Dominique,

Please publish this work as a plugin. Let us know if you have any
questions about how to do that. A few Mech plugins have already been
published and can serve as examples.

    Mark