
This queue is for tickets about the WWW-Mechanize-Plugin-JavaScript CPAN distribution.

Report information
The Basics
Id: 43582
Status: resolved
Priority: 0/
Queue: WWW-Mechanize-Plugin-JavaScript

People
Owner: Nobody in particular
Requestors: radimre [...] freemail.hu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: several mechanize + javascript bugs
Date: Tue, 24 Feb 2009 17:55:49 +0100
To: bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org
From: Imre Rad <radimre [...] freemail.hu>
Hi, thanks for the latest bugfix, here are the next ones:

teszt2.pl:
#!/usr/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->use_plugin('JavaScript');
$mech->get("http://localhost/je/teszt.htm");

# perl teszt2.pl
SyntaxError: Expected statement or function declaration but found '<!-- wind' at http://localhost/je/teszt.js, line 1.

The failing web page is the following:

teszt.htm:
<html>
<head>
<script type="text/javascript" src="teszt.js"></script>
</head>
<body>
</body>
</html>

teszt.js:
<!--
window.alert("hello wrodl");
//-->

I know this syntax is probably incorrect, as the comments are in the external file and not in the HTML code embedded within the script tags, but I ran into this bug while developing a script for a real website, and decent browsers can deal with this syntax.

teszt3.pl:
#!/usr/bin/perl
use warnings;
use strict;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->use_plugin('JavaScript');
$mech->get("http://localhost/je/teszt2.htm");

# perl teszt3.pl
TypeError: undefined has no properties, not even one named length at http://localhost/je/teszt2.htm, line 7.

teszt2.htm is basically the standard Dreamweaver-generated MM_preloadImages function:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
<script language="JavaScript" type="text/JavaScript">
<!--
function MM_preloadImages() { //v3.0
  var d=document; if(d.images){ if(!d.MM_p) d.MM_p=new Array();
    var i,j=d.MM_p.length,a=MM_preloadImages.arguments; for(i=0; i<a.length; i++)
    if (a[i].indexOf("#")!=0){ d.MM_p[j]=new Image; d.MM_p[j++].src=a[i];}}
}
//-->
</script>
</head>
<body onLoad="MM_preloadImages('pic1.gif','pic2.gif')" >
</body>
</html>

3rd:
<script>
<!--
window.alert("foobar");
-->
</script>

SyntaxError: Expected expression but found '> ' at http://localhost/je/teszt3.htm, line 4.

If I close the comment block with //--> in line 4, the error disappears. I am using v0.08.
I can send you several more bug reports if you want :) Are you going to add frames support one day? It would kick ass.

Best regards,
Imre Rad
On Tue Feb 24 12:27:15 2009, radimre@freemail.hu wrote:
> Hi,
>
> thanks for the latest bugfix, here are the next ones:
>
> ...
>
> teszt.js:
> <!--
> window.alert("hello wrodl");
> //-->
>
> I know this syntax is probably incorrect, as the comments are in the
> external file and not in the HTML code embedded within the script tags,
> but I ran into this bug while developing a script for a real website,
> and decent browsers can deal with this syntax.

I can easily update it to strip HTML comments from external scripts as well as internal ones.
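A minimal Perl sketch of that kind of stripping for the teszt.js case, assuming the fix works by removing an opening "<!--" line and a closing "-->" or "//-->" line from the script text before it is handed to JE. The helper name strip_comment_wrapper is hypothetical and not part of the plugin's API. The marker lines are blanked rather than deleted, so line numbers in JE's error messages still match the file.

```perl
use strict;
use warnings;

# Hypothetical helper (not the plugin's real API): remove the HTML comment
# wrapper that old pages put around script code. Marker lines are blanked
# rather than deleted, so line numbers in error messages stay the same.
sub strip_comment_wrapper {
    my ($src) = @_;

    # Opening marker: '<!--' at the very start (after optional whitespace);
    # the rest of that line is dropped as well.
    $src =~ s/\A(\s*)<!--[^\n]*/$1/;

    # Closing marker: a last line consisting only of '-->' or '//-->'.
    $src =~ s{^[ \t]*(?://)?[ \t]*-->[ \t]*(\s*)\z}{$1}m;

    return $src;
}

my $js = qq{<!--\nwindow.alert("hello world");\n//-->\n};
print strip_comment_wrapper($js);
```

Because the closing pattern is anchored to the last line, a "-->" that appears in the middle of the code (which is valid JavaScript, e.g. "x-->y") is left alone.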
> ...
> teszt2.htm is basically the standard Dreamweaver-generated MM_preloadImages
> function:
> <html>
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=windows-1250">
> <script language="JavaScript" type="text/JavaScript">
> <!--
> function MM_preloadImages() { //v3.0
>   var d=document; if(d.images){ if(!d.MM_p) d.MM_p=new Array();
>     var i,j=d.MM_p.length,a=MM_preloadImages.arguments; for(i=0;

That MM_preloadImages.arguments is really weird (and ECMAScript makes no mention of such usage). ‘arguments’ is usually a function-scoped variable (i.e., *inside* the function). Accessing it as a property of the function itself doesn’t make much sense, since it is not clear what would happen in the case of a recursive function. Anyway, I thought the oddities and inconsistencies of JavaScript 1.0 were gone by now. :-) Supporting this will be a little difficult. I will have to think about it.

> i<a.length; i++)
>     if (a[i].indexOf("#")!=0){ d.MM_p[j]=new Image;
> d.MM_p[j++].src=a[i];}}
> }
> //-->
> </script>
> </head>
> <body onLoad="MM_preloadImages('pic1.gif','pic2.gif')" >
> </body>
> </html>
>
>
> 3rd:
> <script>
> <!--
> window.alert("foobar");
> -->
> </script>
>
> SyntaxError: Expected expression but found '>
> ' at http://localhost/je/teszt3.htm, line 4.
>
> If I close the comment block with //--> in line 4, the error disappears.

The only time I ever tried a bare -->, Netscape couldn’t handle it. That was 10 years ago. Ever since then, I’ve *always* used //-->, and I assumed everyone else did the same. It seems that I’d better bring myself up to date. :-)

> I am using v0.08. I can send you several more bug reports if you want
> :)

Please do.

> Are you going to add frames support one day? It would kick ass.

Hmm, I thought I had already done so in 0.006. There are probably bugs stopping it from working for you. Actually, I’ve just realized I forgot to document the frame features (look at the new docs for WWW::Mechanize::Plugin::DOM::Window, and also HTML::DOM::Element::Frame). (You can get the window object with $mech->plugin('DOM')->window.)

I have released a new version with the two HTML comment-related bugs fixed (CPAN/authors/id/S/SP/SPROUT/WWW-Mechanize-Plugin-JavaScript-0.009.tar.gz).
Subject: Re: [rt.cpan.org #43582] several mechanize + javascript bugs
Date: Sat, 28 Feb 2009 17:56:22 +0100
To: Father Chrysostomos via RT <bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org>
From: Imre Rad <radimre [...] freemail.hu>
hi, thanks for the recent bugfixes.

> Supporting this will be a little difficult. I will have to think about it.

In my opinion, if you really want this extension to be working, you need to make it deal with these commonly used, popular scripts.

> Hmm, I thought I had already done so in 0.006. There are probably bugs stopping it from

Indeed, my bad that I didn't notice it. Its design is nice, but it is buggy though; here is a simple test case:

index.html:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<HTML>
<HEAD>
<TITLE>A simple frameset document</TITLE>
</HEAD>
<FRAMESET rows="100, 200">
<FRAME name="frame1" src="frame1.html">
<FRAME name="frame2" src="frame2.html">
</FRAMESET>
</HTML>

frame1.html:
<HTML><HEAD></HEAD><BODY>
<form name="testform" method="get" action="test.htm">
<input type="text" name="changeme" value="defaultvalue">
</form>
</body>
</html>

frame2.html:
<HTML><HEAD></HEAD><BODY>
<a href="javascript:parent.frames[0].document.testform.changeme.value='lofasz';parent.frames[0].document.testform.submit();">click here to submit another frame's form</a>
</body>
</html>

teszt5.pl:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->use_plugin('JavaScript');
$mech->get("http://localhost/je/frames/");
my $window = $mech->plugin('DOM')->window->[1];
if(!$window) { print STDERR "We dont have the 2nd frame!\n"; exit; }
my $mech2 = $window->mech;
my $button = $mech2->find_link( text_regex => qr/click here/i );
if(!$button) { print STDERR "cant find the button!\n"; exit; }
#ok we got it, follow
$mech2->follow_link( text_regex => qr/click here/i );

And the error is:
# perl teszt5.pl
ReferenceError: The variable parent has not been declared at line 1.

Another issue: it seems it's impossible to click a standalone input button:
<input type="button" name="testbutton" value="submit form in another frame" onClick="parent.frames[0].document.testform.changeme.value='lofasz';parent.frames[0].document.testform.submit();">
find_link/find_all_links/find_all_inputs/current_form()->find_input: nothing can find it. As you can see, it doesn't even have an id, and in some real-life situations it doesn't even have a name.

Am I supposed to do something with evaled JavaScript, like this?
$mech->plugin('JavaScript')->eval("document.getElementByTagName('input')[0].click();");

Best regards,
Imre Rad
On Sat Feb 28 12:28:31 2009, radimre@freemail.hu wrote:
> hi, thanks for the recent bugfixes.
>
> > Supporting this will be a little difficult. I will have to think
> > about it.
>
> In my opinion, if you really want this extension to be working, you
> need to make it deal with these commonly used, popular scripts.

I’ve released a new version of JE (CPAN/authors/id/S/SP/SPROUT/JE-0.031.tar.gz) that does just this.

> > Hmm, I thought I had already done so in 0.006. There are probably
> > bugs stopping it from
>
> Indeed, my bad that I didn't notice it. Its design is nice, but it is
> buggy though; here is a simple test case:
> ...
> ReferenceError: The variable parent has not been declared at line 1.

Well, I seem to have forgotten the ‘parent’ variable, obviously an important feature :-). I’ll try to get to it within the next few days.

> Another issue: it seems it's impossible to click a standalone input
> button:

I presume you mean one that is not inside a form.

> <input type="button" name="testbutton" value="submit form in another frame"
> onClick="parent.frames[0].document.testform.changeme.value='lofasz';parent.frames[0].document.testform.submit();">
> find_link/find_all_links/find_all_inputs/current_form()->find_input:
> nothing can find it. As you can see, it doesn't even have an id, and in
> some real-life situations it doesn't even have a name.

This I believe is a Mech limitation. I’ll have to look at it and find a workaround.

> Am I supposed to do something with evaled JavaScript, like this?
> $mech->plugin('JavaScript')->eval("document.getElementByTagName('input')[0].click();");

That will work (as long as you add an ‘s’ to ‘Element’), as will:

$mech->plugin("DOM")->tree->find('input')->click;

The latter should be faster.
On Tue Mar 03 15:06:15 2009, SPROUT wrote:
> On Sat Feb 28 12:28:31 2009, radimre@freemail.hu wrote:
> > Indeed, my bad that I didn't notice it. Its design is nice, but it is
> > buggy though; here is a simple test case:
> > ...
> > ReferenceError: The variable parent has not been declared at line 1.
>
> Well, I seem to have forgotten the ‘parent’ variable, obviously an
> important feature :-). I’ll try to get to it within the next few days.

CPAN/authors/id/S/SP/SPROUT/WWW-Mechanize-Plugin-JavaScript-0.009a.tar.gz

> > Another issue: it seems it's impossible to click a standalone input
> > button:
>
> I presume you mean one that is not inside a form.
>
> > <input type="button" name="testbutton" value="submit form in another frame"
> > onClick="parent.frames[0].document.testform.changeme.value='lofasz';parent.frames[0].document.testform.submit();">
> > find_link/find_all_links/find_all_inputs/current_form()->find_input:
> > nothing can find it. As you can see, it doesn't even have an id, and in
> > some real-life situations it doesn't even have a name.
>
> This I believe is a Mech limitation. I’ll have to look at it and find
> a workaround.

The problem is that find_all_inputs looks in the currently selected form, and is documented that way. I’ve been thinking of writing a subclass of Mech, called WWW::Scripter, so that I’ll have more freedom in designing the interface (among other reasons). Then we could have direct access to the DOM, like this:

$s = new WWW::Scripter;
...
$s->document->find('input')->click();

(‘find’ is a method inherited from HTML::Element, which is far more convenient than the W3C DOM equivalent.) That still doesn’t directly address the issue you brought up. Do you have any ideas as to how the interface should work (for accessing buttons)?
Subject: Re: [rt.cpan.org #43582] several mechanize + javascript bugs
Date: Sat, 7 Mar 2009 11:38:53 +0100
To: Father Chrysostomos via RT <bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org>
From: Imre Rad <radimre [...] freemail.hu>
Hi,

> CPAN/authors/id/S/SP/SPROUT/WWW-Mechanize-Plugin-JavaScript-0.009a.tar.gz

I upgraded to 0.009a (and also JE to 0.31), but the latest test about the parent variable is still failing:
# perl teszt5.pl
ReferenceError: The variable parent has not been declared at line 1.

Also, Dreamweaver's image swapper function is still failing:
# perl teszt3.pl
ReferenceError: The variable Image has not been declared at http://localhost/je/teszt2.htm, line 9.

Are you going to add support for Image objects to JE?

> $s = new WWW::Scripter;
> ...
> $s->document->find('input')->click();
> (‘find’ is a method inherited from HTML::Element, which is far more convenient than the W3C
> DOM equivalent.)
> That still doesn’t directly address the issue you brought up. Do you have any ideas as to how
> the interface should work (for accessing buttons)?

Most websites use standard forms only, which are completely supported by WWW::Mechanize. There are some tricky ones, like an input button without a form, or a simple push-me image with an onClick event, and I think finding them using the DOM tree is fine. I guess other users might ask you in the future to add support for firing onmouseover/onmouseout events.

Best regards,
Imre Rad
On Sat Mar 07 05:39:10 2009, radimre@freemail.hu wrote:
> Hi,
>
> > CPAN/authors/id/S/SP/SPROUT/WWW-Mechanize-Plugin-JavaScript-0.009a.tar.gz
>
> I upgraded to 0.009a (and also JE to 0.31), but the latest test about
> the parent variable is still failing:
> # perl teszt5.pl
> ReferenceError: The variable parent has not been declared at line 1.

Oops! I forgot to add it to the list of methods that JS can access. I have just uploaded a new version:
CPAN/authors/id/S/SP/SPROUT/WWW-Mechanize-Plugin-JavaScript-0.009c.tar.gz

> Also, Dreamweaver's image swapper function is still failing:
> # perl teszt3.pl
> ReferenceError: The variable Image has not been declared at
> http://localhost/je/teszt2.htm, line 9.
>
> Are you going to add support for Image objects to JE?

I plan to. Right now I’m not sure where it fits in with the rest of the modules, but for now you can use this workaround (untested):

$mech->use_plugin('JavaScript' => init => sub {
    shift->eval('
        function Image() {
            var i = document.createElement("img");
            0 in arguments && i.setAttribute("width", arguments[0]);
            1 in arguments && i.setAttribute("height", arguments[1]);
            return i;
        }
    ');
});
Subject: Re: [rt.cpan.org #43582] several mechanize + javascript bugs
Date: Fri, 27 Mar 2009 13:02:03 +0100
To: Father Chrysostomos via RT <bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org>
From: Imre Rad <radimre [...] freemail.hu>
hello, thanks for the fixes. Here is the next issue with 0.009c:

Error: Undefined subroutine &WWW::Mechanize::Plugin::JavaScript::JE::_unescape called at C:/Perl/site/lib/JE/Object/Function.pm line 433.

It is caused by the default Google Analytics code:
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>

I also got some other undefined subroutine issues, but I didn't trace the source JS:
Error: Undefined subroutine &WWW::Mechanize::Plugin::JavaScript::JE::_encodeURIComponent called at C:/Perl/site/lib/JE/Object/Function.pm line 433.

Best regards,
imre rad
On Fri Mar 27 08:02:40 2009, radimre@freemail.hu wrote:
> hello, thanks for the fixes. Here is the next issue with 0.009c:
>
> Error: Undefined subroutine
> &WWW::Mechanize::Plugin::JavaScript::JE::_unescape called at
> C:/Perl/site/lib/JE/Object/Function.pm line 433.

Oh no, this is coming back to haunt me! I ran into this problem before, but I couldn’t reproduce it the last time I tried. It has to do with which package ‘require’ causes the code to run in. Different perl versions behave differently. I’ll try to fix this soon....
On Sat Mar 28 14:22:41 2009, SPROUT wrote:
> On Fri Mar 27 08:02:40 2009, radimre@freemail.hu wrote:
> > hello, thanks for the fixes. Here is the next issue with 0.009c:
> >
> > Error: Undefined subroutine
> > &WWW::Mechanize::Plugin::JavaScript::JE::_unescape called at
> > C:/Perl/site/lib/JE/Object/Function.pm line 433.
>
> Oh no, this is coming back to haunt me! I ran into this problem
> before, but I couldn’t reproduce it the last time I tried. It has to
> do with which package ‘require’ causes the code to run in.
> Different perl versions behave differently.
>
> I’ll try to fix this soon....

CPAN/authors/id/S/SP/SPROUT/JE-0.032.tar.gz

Actually, this was a problem with subclassing (and overriding the ‘prop’ method, which the JE backend does) and JE’s autoload feature, which tries to run code in the ‘calling package’. The calling package turned out to be the subclass, rather than the code calling the method to begin with. What I’ve done to the escape functions is a workaround. I’ve also put a caveat in JE::Object’s docs. I don’t have a clean solution for it right now.
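A toy illustration of the pitfall in plain Perl, not taken from JE's source: when a subclass overrides a method and delegates to the parent through SUPER::, the parent's caller() reports the subclass package rather than the package of the code that originally made the call. The package names Base and Sub are made up for the example.

```perl
use strict;
use warnings;

package Base;
sub new  { my $class = shift; return bless {}, $class }
# Reports the package that called this method, which is how an autoloader
# might decide which package to run (or compile) code in.
sub prop { my $calling_package = caller; return $calling_package }

package Sub;
our @ISA = ('Base');
# Overrides prop and delegates to the parent; the parent now sees 'Sub'.
sub prop { my $self = shift; return $self->SUPER::prop(@_) }

package main;

print Base->new->prop, "\n";   # main
print Sub->new->prop,  "\n";   # Sub
```

Any code that relies on caller() to pick a package therefore ends up in the subclass once a method is overridden, which matches the behaviour described above.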
Subject: Re: [rt.cpan.org #43582] several mechanize + javascript bugs
Date: Sat, 4 Apr 2009 14:40:08 +0200
To: Father Chrysostomos via RT <bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org>
From: Imre Rad <radimre [...] freemail.hu>
> Actually, this was a problem with subclassing (and overriding the ‘prop’ method, which the JE
> backend does) and JE’s autoload feature, which tries to run code in the ‘calling package’. The
> calling package turned out to be the subclass, rather than the code calling the method to
> begin with.
>
> What I’ve done to the escape functions is a workaround. I’ve also put a caveat in JE::Object’s
> docs. I don’t have a clean solution for it right now.

Nice work! I got an easy-to-fix bug this time: it seems you are trying to parse LWP's error message as JavaScript if fetching the remote script fails:

couldn't get script http://www.google-analytics.com/ga.js: 500 Can't connect to www.google-analytics.com:80 (connect: timeout) at C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM.pm line 140
SyntaxError: Expected semicolon, '}' or end of line but found ' Can't con' at http://www.google-analytics.com/ga.js, line 1.

I also got another real life shit, about comments again. The following JavaScript snippet with incorrect syntax is parsed by browsers, but not by JE:
<script type="text/javascript">
var win2;
<!--
function other()
{
window.alert("foobar");
}
//-->
</script>

The error is:
SyntaxError: Expected statement or function declaration but found '<!-- func' at http://localhost/je/teszt9.html, line 3.

It would be nice of you if your parser was a bit less strict in this case; e.g. you could turn this error into a warning and continue parsing, to make it more robust.

This time I also have a feature request: it would be nice if one could filter the external (not embedded) JavaScripts to load, for example with a callback. Then I could blacklist Google Analytics, which throws some random errors, like

TypeError: undefined has no properties, not even one named length at http://www.google-analytics.com/ga.js, line 5.
Argument "undefined" isn't numeric in delete at C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM/Window.pm line 106.

and at other times it works like a charm. Filtering out unnecessary scripts would also make the script faster.

WWW::Mechanize has a follow_link method. According to its docs: "Returns the result of the GET method (an HTTP::Response object) if a link was found. If the page has no links, or the specified link couldn't be found, returns undef."

If I call follow_link on a href="javascript:..." link, it returns undef, though it executed the script correctly. Do you think you could return a simple true value? Most people use this method for checking the existence of the link and clicking it at the same time anyway, like this:

if(!$mech->follow_link(url_regex => qr/javascript:write.../)) {
    print STDERR "Link not found!\n";
    exit;
}

The site I am trying to write the Mech script for uses something called "WICK: Web Input Completion Kit". It has the following JavaScript lines at the end:

for (x=0;x<collection.length;x++) {
collection[x] = collection[x].replace(/\,/gi,'');
}//

I have no idea what this collection is, but my Firefox doesn't report an error in its console, unlike JE:
ReferenceError: The variable collection has not been declared at http://localhost/je/wick.js, line 483.

Thanks for all your help.

Best regards,
imre
On Sat Apr 04 08:40:32 2009, radimre@freemail.hu wrote:
> > Actually, this was a problem with subclassing (and overriding the
> > ‘prop’ method, which the JE backend does) and JE’s autoload feature,
> > which tries to run code in the ‘calling package’. The calling package
> > turned out to be the subclass, rather than the code calling the
> > method to begin with.
>
> > What I’ve done to the escape functions is a workaround. I’ve also
> > put a caveat in JE::Object’s docs. I don’t have a clean solution for
> > it right now.
>
> Nice work! I got an easy-to-fix bug this time: it seems you are trying
> to parse LWP's error message as JavaScript if fetching the remote
> script fails:
>
> couldn't get script http://www.google-analytics.com/ga.js: 500 Can't
> connect to www.google-analytics.com:80 (connect: timeout) at
> C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM.pm line 140

That’s a Windows file path. I didn’t know JE worked on Windows.

> SyntaxError: Expected semicolon, '}' or end of line but found ' Can't
> con' at http://www.google-analytics.com/ga.js, line 1.

What do real browsers do in such cases?

> I also got another real life shit, about comments again. The following
> JavaScript snippet with incorrect syntax is parsed by browsers, but
> not by JE:
> <script type="text/javascript">
> var win2;
> <!--
> function other()
> {
> window.alert("foobar");
> }
> //-->
> </script>
>
> The error is:
> SyntaxError: Expected statement or function declaration but found
> '<!-- func' at http://localhost/je/teszt9.html, line 3.
>
> It would be nice of you if your parser was a bit less strict in this
> case; e.g. you could turn this error into a warning and continue
> parsing, to make it more robust.

This is going to be a tough one, but I’m willing to tackle it. It is difficult because ‘x <!-- y’ is perfectly valid, as is ‘x --> y’, so it’s hard to filter them out. Unfortunately, ECMA provides no guidelines in this regard.

> This time I also have a feature request: it would be nice if one could
> filter the external (not embedded) JavaScripts to load, for example
> with a callback. Then I could blacklist Google Analytics, which throws
> some random errors, like
>
> TypeError: undefined has no properties, not even one named length at
> http://www.google-analytics.com/ga.js, line 5.
> Argument "undefined" isn't numeric in delete at
> C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM/Window.pm line 106.
>
> and at other times it works like a charm. Filtering out unnecessary
> scripts would also make the script faster.

I think you can already use LWP handlers (untested):

$mech->set_my_handler(
    request_send => sub { return HTTP::Response->new(200) },
    m_host       => 'www.google-analytics.com',
);
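The same blacklist idea can be kept as a plain Perl predicate, which is one possible shape for the callback requested above. This is a hypothetical sketch: the name script_allowed and the host list are illustrative only, and neither the plugin nor LWP calls such a function by itself; it would have to be invoked from a handler like the one above, for example to decide whether to short-circuit the request.

```perl
use strict;
use warnings;

# Hypothetical blacklist of script hosts that should never be fetched.
my @blocked_hosts = ('www.google-analytics.com', 'ssl.google-analytics.com');

# Returns true if the script at $url may be loaded, false if its host is
# blacklisted. Uses a simple regex instead of a full URI parser, so URLs
# with user info (user@host) are not handled.
sub script_allowed {
    my ($url) = @_;
    my ($host) = $url =~ m{^[a-z][a-z0-9+.-]*://([^/:?#]+)}i
        or return 1;    # relative or unusual URL: allow it
    $host = lc $host;
    return !grep { $host eq $_ } @blocked_hosts;
}

for my $url ('http://www.google-analytics.com/ga.js',
             'http://localhost/je/teszt.js') {
    printf "%s => %s\n", $url, script_allowed($url) ? 'load' : 'skip';
}
```

Matching exact host names keeps the rule predictable; a suffix match on "google-analytics.com" would be the obvious extension if more subdomains need blocking.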
> WWW::Mechanize has a follow_link method. According to its docs:
> "Returns the result of the GET method (an HTTP::Response object) if a
> link was found. If the page has no links, or the specified link
> couldn't be found, returns undef."
>
> If I call follow_link on a href="javascript:..." link, it returns
> undef, though it executed the script correctly. Do you think you could
> return a simple true value? Most people use this method for checking
> the existence of the link and clicking it at the same time anyway,
> like this:
> if(!$mech->follow_link(url_regex => qr/javascript:write.../)) {
>     print STDERR "Link not found!\n";
>     exit;
> }

I really would like to deprecate this experimental Mechanize and its plugins. Right now I’m working on getting WWW::Scripter (the replacement) to pass the tests. I hope to have it done in a few days. I’ll make WWW::Scripter work like that.
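In the meantime, one way to get the requested behaviour is a small wrapper that checks for the link itself before following it. This is a sketch under assumptions: follow_link_ok is a hypothetical helper, and it relies only on WWW::Mechanize's find_link and follow_link accepting the same search criteria. It returns true whenever a matching link exists, regardless of what follow_link returns for a javascript: URL.

```perl
use strict;
use warnings;

# Hypothetical helper: follow a link and return true if the link exists,
# even when follow_link itself returns undef (as it did for javascript:
# links with WWW::Mechanize::Plugin::JavaScript).
sub follow_link_ok {
    my ($mech, %criteria) = @_;
    # find_link returns undef when nothing matches the criteria
    $mech->find_link(%criteria) or return 0;
    $mech->follow_link(%criteria);
    return 1;
}

# Usage, assuming $mech is a WWW::Mechanize object with the JavaScript plugin:
#   follow_link_ok($mech, url_regex => qr/^javascript:/)
#       or die "Link not found!\n";
```

The cost is searching the link list twice, which is negligible next to the network and script work that following a link triggers.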
> The site I am trying to write the Mech script for uses something
> called "WICK: Web Input Completion Kit". It has the following
> JavaScript lines at the end:
>
> for (x=0;x<collection.length;x++) {
> collection[x] = collection[x].replace(/\,/gi,'');
> }//
>
> I have no idea what this collection is, but my Firefox doesn't report
> an error in its console, unlike JE:
> ReferenceError: The variable collection has not been declared at
> http://localhost/je/wick.js, line 483.

I’m afraid I’m clueless about this one.
On Sat Apr 04 13:01:07 2009, SPROUT wrote:
> On Sat Apr 04 08:40:32 2009, radimre@freemail.hu wrote:
> > I also got another real life [...], about comments again. The
> > following JavaScript snippet with incorrect syntax is parsed by
> > browsers, but not by JE:
> > <script type="text/javascript">
> > var win2;
> > <!--
> > function other()
> > {
> > window.alert("foobar");
> > }
> > //-->
> > </script>
> >
> > The error is:
> > SyntaxError: Expected statement or function declaration but found
> > '<!-- func' at http://localhost/je/teszt9.html, line 3.
> >
> > It would be nice of you if your parser was a bit less strict in this
> > case; e.g. you could turn this error into a warning and continue
> > parsing, to make it more robust.
>
> This is going to be a tough one, but I’m willing to tackle it. It is
> difficult because ‘x <!-- y’ is perfectly valid, as is ‘x --> y’, so
> it’s hard to filter them out. Unfortunately, ECMA provides no
> guidelines in this regard.

OK, I tried the following snippet in a couple of browsers, and they both show 0 when they should be showing ‘true’:

<script>
a = 0;
b = 1
c = a<!--b
document.write(c)
</script>

That means they are not standards-compliant. In your case, the script should definitely be fixed. If that is not possible, I suggest a workaround:

$mech->set_my_handler(
    response_done => sub {
        my $response = shift;
        ${ $response->content_ref } =~ s/<!--|-->//g;
    },
    m_path_match => qr/\.js\z/,
);
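The blanket s/<!--|-->//g above also rewrites legitimate code such as ‘x-->y’. A more targeted sketch mimics what the browsers in this test do: ‘<!--’ starts a comment running to the end of the line, and ‘-->’ counts as a comment only at the start of a line. Later ECMAScript editions standardize essentially this browser behaviour as "HTML-like comments" in Annex B. The helper name strip_html_like_comments is hypothetical, and the sketch is naive: it does not recognise string literals, regular-expression literals or /* */ comments, so a ‘<!--’ inside a string would be cut off.

```perl
use strict;
use warnings;

# Hypothetical helper: blank out HTML-like comments the way browsers treat
# them. Naive: ignores strings, regex literals and /* */ comments.
sub strip_html_like_comments {
    my ($src) = @_;
    my @lines = split /\n/, $src, -1;   # -1 keeps trailing empty lines
    for my $line (@lines) {             # $line aliases each array element
        if ($line =~ /^\s*-->/) {
            # '-->' at the start of a line comments out the whole line
            $line = '';
        }
        else {
            # '<!--' anywhere comments out the rest of the line
            $line =~ s/<!--.*//;
        }
    }
    return join "\n", @lines;
}

print strip_html_like_comments("a = 0;\nb = 1\nc = a<!--b\ndocument.write(c)\n");
```

Run on the test snippet, this yields "c = a", i.e. the browsers' answer of 0 rather than the standard ‘true’, while a mid-line ‘-->’ such as "while (x --> 0)" is kept.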
Subject: Re: [rt.cpan.org #43582] several mechanize + javascript bugs
Date: Sun, 5 Apr 2009 12:03:08 +0200
To: Father Chrysostomos via RT <bug-WWW-Mechanize-Plugin-JavaScript [...] rt.cpan.org>
From: Imre Rad <radimre [...] freemail.hu>
>> couldn't get script http://www.google-analytics.com/ga.js: 500 Can't
>> connect to www.google-analytics.com:80 (connect: timeout) at
>> C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM.pm line 140
> That’s a Windows file path. I didn’t know JE worked on Windows.

Yes it is, and yes it does. Installing it is not even painful.

>> SyntaxError: Expected semicolon, '}' or end of line but found ' Can't
>> con' at http://www.google-analytics.com/ga.js, line 1.
> What do real browsers do in such cases?

I think there is nothing they can do; shit happens, some files cannot be loaded sometimes. I recommend checking the response code: if it's not 200, then emit a warning and skip parsing the error text :)

>> for (x=0;x<collection.length;x++) {
>> collection[x] = collection[x].replace(/\,/gi,'');
>> }//
> I’m afraid I’m clueless about this one.

You are right, I hurried off reporting this one; it seems the situation is more complex, and I need to investigate further.

The site I am trying to script has one of its JavaScript files obfuscated, I guess by a commercial obfuscator. It would be nice if you could take a look at the code, since your JE engine interprets it completely wrong. I changed the eval($) at the end to window.alert, so I can see correct JavaScript code in the browser, while the JE output is something really weird:

justifycenter border="0" cellpadding="0" cellspac src="images/content/corner_lt=""></td>';strx+='<isRichText= border="0" cellpadding="0" cellspac src="images/content/corner_lt=""></td>';strx+='<f border="0" cellpadding="0" cellspac src="images/content/corner_lt=""></td>';strx+='<a border="0" cellpadding="0" cellspac src="images/content/corner_lt=""></td>';strx+='<l border="0" cellpadding="0" cellspac src="images/content/corner_lt=""></

I think tracing this down might make JE more robust. If you want to deal with this, I'll send you the script.

Best regards,
Imre
I’ve gone and released WWW::Scripter very hastily:
CPAN/authors/id/S/SP/SPROUT/WWW-Scripter-0.001.tar.gz
CPAN/authors/id/S/SP/SPROUT/WWW-Scripter-Plugin-JavaScript-0.001.tar.gz
CPAN/authors/id/S/SP/SPROUT/WWW-Scripter-Plugin-Ajax-0.01.tar.gz

Concerning the Image object, your request regarding follow_link, and the parsing of error responses as scripts, these are supposedly fixed in WWW::Scripter. I’ve not tested them yet.

Concerning triggering mouse events upon click(), there are too many variations as to how it could be done. I’ll leave it to the user to call trigger_event himself in whichever order he prefers. (Also, I use mouse events to evade robots.)

I am going to mark this ticket as resolved. I also plan to release a new WMPJS with deprecation notices all over the place. (WWW::Scripter is *so* much cleaner than the tangled mess I’ve created with Mechanize!) Please switch over to Scripter and see whether it works for you.

On Sun Apr 05 06:03:41 2009, radimre@freemail.hu wrote:
> >> couldn't get script http://www.google-analytics.com/ga.js: 500 Can't
> >> connect to www.google-analytics.com:80 (connect: timeout) at
> >> C:/Perl/site/lib/WWW/Mechanize/Plugin/DOM.pm line 140
> >
> > That’s a Windows file path. I didn’t know JE worked on Windows.
>
> Yes it is, and yes it does. Installing it is not even painful.

I’m not so sure that’s a good thing to hear. I’m always looking for every excuse to get people to stop using Windoze. :-)

> >> SyntaxError: Expected semicolon, '}' or end of line but found ' Can't
> >> con' at http://www.google-analytics.com/ga.js, line 1.
> > What do real browsers do in such cases?
>
> I think there is nothing they can do; shit happens, some files cannot
> be loaded sometimes.
> I recommend checking the response code: if it's not 200, then emit a
> warning and skip parsing the error text :)

I checked Safari. It adds a warning to the console and then pretends the file is empty, which is more or less equivalent to what you suggest.
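A minimal Perl sketch of that Safari-like fallback, assuming the fetch result is available as a status code and a body. The function name script_source is hypothetical and does not come from WWW::Scripter; treating any 2xx status as success is a choice made here, slightly looser than the "not 200" rule suggested above.

```perl
use strict;
use warnings;

# Hypothetical helper: decide what source text to hand to the JavaScript
# engine for an external script. On a non-2xx status, warn and pretend
# the file is empty (as Safari does), instead of parsing the error page.
sub script_source {
    my ($url, $status, $body) = @_;
    if ($status < 200 || $status > 299) {
        warn "couldn't get script $url: $status; treating it as empty\n";
        return '';
    }
    return $body;
}

my $src = script_source('http://www.google-analytics.com/ga.js', 500,
                        "Can't connect to www.google-analytics.com:80");
print length($src), "\n";   # 0
```

Returning an empty string rather than undef lets the caller run the script through the normal path, so the page behaves as if the script simply did nothing.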
>
> >> for (x=0;x<collection.length;x++) {
> >> collection[x] = collection[x].replace(/\,/gi,'');
> >> }//
> > I’m afraid I’m clueless about this one.
>
> You are right, I hurried off reporting this one; it seems the situation
> is more complex, and I need to investigate further.
>
> The site I am trying to script has one of its JavaScript files
> obfuscated, I guess by a commercial obfuscator. It would be nice if you
> could take a look at the code, since your JE engine interprets it
> completely wrong. I changed the eval($) at the end to window.alert, so I
> can see correct JavaScript code in the browser, while the JE output is
> something really weird:
> justifycenter border="0" cellpadding="0" cellspac
> src="images/content/corner_lt=""></td>';strx+='<isRichText= border="0"
> cellpadding="0" cellspac
> src="images/content/corner_lt=""></td>';strx+='<f border="0"
> cellpadding="0" cellspac
> src="images/content/corner_lt=""></td>';strx+='<a border="0"
> cellpadding="0" cellspac
> src="images/content/corner_lt=""></td>';strx+='<l border="0"
> cellpadding="0" cellspac src="images/content/corner_lt=""></
>
> I think tracing this down might make JE more robust. If you want to
> deal with this, I'll send you the script.

I really don’t think I’ll have time to sift through obfuscated code. I was only able to release WWW::Scripter because I need it for a project I’m working on (part of real life).