
This queue is for tickets about the WWW-Scripter CPAN distribution.

Report information
The Basics
Id: 84472
Status: resolved
Priority: 0/
Queue: WWW-Scripter

People
Owner: Nobody in particular
Requestors: lxp [...] cpan.org
Cc:
AdminCc: sprout [...] cpan.org

Bug Information
Severity: (no value)
Broken in: 0.029
Fixed in: 0.030



Subject: content() not returning content from most recent response
I'm running the following code to attempt to log into a banking website. (The website is excessively complex; almost every request results in an HTTP redirect or some JavaScript to advance to another resource.)

    use WWW::Scripter ();

    my $ua = WWW::Scripter->new;
    $ua->use_plugin('JavaScript');
    $ua->get('https://www.ubank.com.au/NAGAuthn/ubank.secgate.action');
    $ua->submit_form(
        with_fields => {
            f_username => 'user@example.com',
            password   => cipher('LetMeIn123'),
        },
    );
    print substr($ua->content, 0, 70), "...\n";

Running the above code with the following options yields a log of every request and response (including response content), attached as http-log.txt.

    $ LOG_SHOW_CATEGORY=1 TRACE=1 perl -MLog::Any::App -MLog::Any::For::LWP=-log_response_body,1

$ua->content returns content for the response shown at line 2394--even though WWW::Scripter correctly parses the JavaScript in that response and GETs the next resource, shown at line 2606. I expect the latter response to be made available via $ua->content rather than the former.

Peculiarly, the response content at line 2606 seems incomplete. I don't know whether this is due to the server or the logging. It may explain why $ua->content doesn't hold the expected content though.

Any assistance in resolving this issue will be very gratefully received.
Subject: http-log.txt

Message body is not shown because it is too large.

On Sun Apr 07 01:49:50 2013, LXP wrote:
> Peculiarly, the response content at line 2606 seems incomplete. I
> don't know whether this is due to the server or the logging. It
> may explain why $ua->content doesn't hold the expected content
> though.
It seems that the response referenced above is in fact incomplete, but seemingly not because of the server or the logging. I've just noticed that the response, requested via HTTPS, uses "chunked" content transfer encoding. It looks like only the first chunk might be getting returned in the response object, but I'll need to work out how to debug that in greater detail. I recognise that this alone is not a WWW::Scripter issue. But could this explain why the response is not getting returned by the "content" method of WWW::Scripter?
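For context, a body sent with chunked transfer encoding arrives as a series of size-prefixed pieces, and a client has to join all of them to recover the full content. The following is a minimal pure-Perl sketch of that decoding, not the code LWP uses; the function name decode_chunked and the sample body are made up for illustration, and trailers and chunk extensions are ignored.

```perl
use strict;
use warnings;

# Hypothetical helper: decode an HTTP/1.1 "chunked" message body.
# Each chunk is "<hex size>[;extensions]\r\n<data>\r\n"; a size of 0 ends the body.
sub decode_chunked {
    my ($raw) = @_;
    my $body = '';
    while ($raw =~ /\G([0-9A-Fa-f]+)[^\r\n]*\r\n/gc) {
        my $size = hex $1;
        last if $size == 0;                       # terminating chunk
        $body .= substr $raw, pos($raw), $size;   # append this chunk's data
        pos($raw) += $size + 2;                   # skip the data and its trailing CRLF
    }
    return $body;
}

# Made-up sample: two chunks followed by the terminating zero-size chunk.
my $raw = "5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n";
print decode_chunked($raw), "\n";
```

A client that stopped after the first chunk would see only "Hello" here, which is the kind of truncation suspected above.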
On Sun Apr 14 16:05:02 2013, LXP wrote:
> I've just noticed that the response, requested via HTTPS, uses "chunked"
> content transfer encoding. It looks like only the first chunk might be
> getting returned in the response object, but I'll need to work out how
> to debug that in greater detail. I recognise that this alone is not a
> WWW::Scripter issue.
The incomplete response was a red herring caused by an "unexpected" User-Agent string being sent in the request. Pretending to be a certain mainstream Windows browser solves that problem. In any event, it still seems that the malformed response is what should have been getting returned by content(), not the one before it.
On Mon Apr 15 09:59:20 2013, LXP wrote:
> In any event, it still seems that the malformed response is what should
> have been getting returned by content(), not the one before it.
I've determined the cause of the problem: At the time that the desired request is being handled by the "_update_page" method, other calls to _update_page lower in the stack are waiting to finish. Attached is the stack trace in my case when the content that I want is being handled; as there are two more _update_page calls waiting to finish, my desired content is overwritten twice as the stack unwinds. Not evident in the stack trace, but possibly related, is the fact that subsequent GETs occur by WWW::Scripter in my case after it stops processing the desired page--presumably because content from prior pages is still waiting to be fetched lower in the stack.
Subject: www-scripter-stack-trace.txt
WWW::Scripter::_update_page('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x61d6ef8)', 'HTTP::Response=HASH(0x62cd2f0)') called at .../WWW/Scripter.pm line 266
WWW::Scripter::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x61d6ef8)') called at .../LWP/UserAgent.pm line 418
LWP::UserAgent::get('WWW::Scripter=HASH(0x3487c90)', 'https://www.ubank.com.au/ubank/web/my-money/overview;PORTSESS...') called at .../WWW/Mechanize.pm line 407
WWW::Mechanize::get('WWW::Scripter=HASH(0x3487c90)', 'JE::String=ARRAY(0x61d4ff0)') called at .../WWW/Scripter.pm line 271
WWW::Scripter::get('WWW::Scripter=HASH(0x3487c90)', 'JE::String=ARRAY(0x61d4ff0)') called at .../WWW/Scripter.pm line 1601
WWW::Scripter::Location::replace('WWW::Scripter::Location=REF(0x2faede0)', 'JE::String=ARRAY(0x61d4ff0)') called at .../JE.pm line 1537
JE::__ANON__('JE::Object::Proxy=REF(0x61d4af8)', 'JE::String=ARRAY(0x61d4ff0)') called at .../JE/Object/Function.pm line 466
JE::Object::Function::apply('JE::Object::Function=REF(0x618d3a0)', 'JE::Object::Proxy=REF(0x61d4af8)', 'JE::String=ARRAY(0x61d4ff0)') called at .../JE/LValue.pm line 114
JE::LValue::call('JE::LValue=ARRAY(0x61d5668)', 'JE::String=ARRAY(0x61d4ff0)') called at .../JE/Code.pm line 1287
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x61cce48)') called at .../JE/Code.pm line 1377
JE::Code::Expression::_eval_term('JE::Code::Expression=ARRAY(0x61cce48)') called at .../JE/Code.pm line 1145
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x61cc938)') called at .../JE/Code.pm line 368
JE::Code::Statement::eval('JE::Code::Statement=ARRAY(0x61cc248)') called at .../JE/Code.pm line 349
JE::Code::Statement::eval('JE::Code::Statement=ARRAY(0x618ad98)') called at .../JE/Code.pm line 186
eval {...} called at .../JE/Code.pm line 157
JE::Code::execute('JE::Code=HASH(0x6195bd0)') called at .../JE.pm line 815
JE::eval('WWW::Scripter::Plugin::JavaScript::JE=REF(0x5f5b2b8)', '\x{a}/*\x{a}** Copyright (c) 2008, Oracle and/or its affiliates. All ...', 'URI::https=SCALAR(0x5f3f2e8)', 1) called at .../WWW/Scripter/Plugin/JavaScript.pm line 104
WWW::Scripter::Plugin::JavaScript::eval('WWW::Scripter::Plugin::JavaScript=ARRAY(0x3484d10)', 'WWW::Scripter=HASH(0x3487c90)', '\x{a}/*\x{a}** Copyright (c) 2008, Oracle and/or its affiliates. All ...', 'URI::https=SCALAR(0x5f3f2e8)', 1, 1) called at .../WWW/Scripter.pm line 448
WWW::Scripter::__ANON__('HTML::DOM=HASH(0x5f3e760)', 'HTML::DOM::Element::Script=HASH(0x5f0c0f8)') called at .../HTML/DOM.pm line 496
HTML::DOM::__ANON__('HTML::DOM::Element::Script=HASH(0x5f0c0f8)', 'script', 'HTML::DOM::Element::HTML=HASH(0x5f0b858)') called at .../HTML/DOM/_TreeBuilder.pm line 1118
HTML::DOM::_TreeBuilder::end('HTML::DOM::Element::HTML=HASH(0x5f0b858)', 'script', '</script>') called at .../HTML/DOM.pm line 383
HTML::DOM::Element::HTML::end('HTML::DOM::Element::HTML=HASH(0x5f0b858)', 'script', '</script>') called at .../HTML/DOM.pm line 726
eval {...} called at .../HTML/DOM.pm line 726
HTML::DOM::write('HTML::DOM=HASH(0x5f3e760)', '<html lang="en-AU"><head><script>\x{a}/*\x{a}** Copyright (c) 2008, O...') called at .../WWW/Scripter.pm line 543
WWW::Scripter::update_html('WWW::Scripter=HASH(0x3487c90)', '<html lang="en-AU"><head><script>\x{a}/*\x{a}** Copyright (c) 2008, O...') called at .../WWW/Scripter.pm line 311
WWW::Scripter::_update_page('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5f3e8c8)', 'HTTP::Response=HASH(0x5e4c090)') called at .../WWW/Scripter.pm line 266
WWW::Scripter::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5f3e8c8)', undef, undef, 'HTTP::Response=HASH(0x5e4c150)') called at .../LWP/UserAgent.pm line 349
LWP::UserAgent::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e421a8)', undef, undef, 'HTTP::Response=HASH(0x5e4b970)') called at .../WWW/Mechanize.pm line 2503
WWW::Mechanize::_make_request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e421a8)', undef, undef, 'HTTP::Response=HASH(0x5e4b970)') called at .../WWW/Scripter.pm line 247
WWW::Scripter::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e421a8)', undef, undef, 'HTTP::Response=HASH(0x5e4b970)') called at .../LWP/UserAgent.pm line 349
LWP::UserAgent::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e41fb0)') called at .../WWW/Mechanize.pm line 2503
WWW::Mechanize::_make_request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e41fb0)') called at .../WWW/Scripter.pm line 247
WWW::Scripter::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5e41fb0)') called at .../LWP/UserAgent.pm line 418
LWP::UserAgent::get('WWW::Scripter=HASH(0x3487c90)', 'https://www.ubank.com.au/obrar.cgi?encreply=wjmY5mbQrN11JvXp8...') called at .../WWW/Mechanize.pm line 407
WWW::Mechanize::get('WWW::Scripter=HASH(0x3487c90)', 'JE::String=ARRAY(0x5e30628)') called at .../WWW/Scripter.pm line 271
WWW::Scripter::get('WWW::Scripter=HASH(0x3487c90)', 'JE::String=ARRAY(0x5e30628)') called at .../WWW/Scripter.pm line 1538
WWW::Scripter::Location::href('WWW::Scripter::Location=REF(0x2faede0)', 'JE::String=ARRAY(0x5e30628)') called at .../JE.pm line 1727
JE::__ANON__('JE::Object::Proxy=REF(0x5e41b30)', 'JE::String=ARRAY(0x5e30628)', undef) called at .../JE/Object.pm line 374
JE::Object::prop('JE::Object::Proxy=REF(0x5e41b30)', 'href', 'JE::String=ARRAY(0x5e30628)') called at .../JE/LValue.pm line 100
JE::LValue::set('JE::LValue=ARRAY(0x5e41770)', 'JE::String=ARRAY(0x5e30628)') called at .../JE/Code.pm line 1200
eval {...} called at .../JE/Code.pm line 1200
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x5dafaa0)') called at .../JE/Code.pm line 1377
JE::Code::Expression::_eval_term('JE::Code::Expression=ARRAY(0x5dafaa0)') called at .../JE/Code.pm line 1145
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x5daf9e0)') called at .../JE/Code.pm line 349
JE::Code::Statement::eval('JE::Code::Statement=ARRAY(0x5daf8c0)') called at .../JE/Code.pm line 186
eval {...} called at .../JE/Code.pm line 157
JE::Code::execute('JE::Code=HASH(0x5e1fce0)', 'WWW::Scripter::Plugin::JavaScript::JE=REF(0x5aeed30)', 'JE::Scope=ARRAY(0x5e41a58)', 2) called at .../JE/Object/Function.pm line 486
JE::Object::Function::apply('JE::Object::Function=REF(0x5e1ffb0)', 'WWW::Scripter::Plugin::JavaScript::JE=REF(0x5aeed30)', 'JE::String=ARRAY(0x5e30628)') called at .../JE/LValue.pm line 114
JE::LValue::call('JE::LValue=ARRAY(0x5e2dd30)', 'JE::String=ARRAY(0x5e30628)') called at .../JE/Code.pm line 1287
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x5dcb9a0)') called at .../JE/Code.pm line 1377
JE::Code::Expression::_eval_term('JE::Code::Expression=ARRAY(0x5dcb9a0)') called at .../JE/Code.pm line 1145
JE::Code::Expression::eval('JE::Code::Expression=ARRAY(0x5dcb640)') called at .../JE/Code.pm line 349
JE::Code::Statement::eval('JE::Code::Statement=ARRAY(0x5dcb3a0)') called at .../JE/Code.pm line 186
eval {...} called at .../JE/Code.pm line 157
JE::Code::execute('JE::Code=HASH(0x5e3ee40)', 'WWW::Scripter::Plugin::JavaScript::JE=REF(0x5aeed30)', 'JE::Scope=ARRAY(0x5e3a9a8)', 2) called at .../JE/Object/Function.pm line 486
JE::Object::Function::apply('JE::Object::Function=REF(0x5e3f068)', 'WWW::Scripter::Plugin::JavaScript::JE=REF(0x5aeed30)', 'JE::Object::Proxy=REF(0x5a48a60)') called at .../JE/Object/Function.pm line 334
JE::Object::Function::call_with('JE::Object::Function=REF(0x5e3f068)', 'WWW::Scripter=HASH(0x3487c90)', 'HTML::DOM::Event=HASH(0x5a4f338)') called at .../HTML/DOM/EventTarget.pm line 217
HTML::DOM::EventTarget::__ANON__('HTML::DOM::Event=HASH(0x5a4f338)') called at .../HTML/DOM/EventTarget.pm line 356
eval {...} called at .../HTML/DOM/EventTarget.pm line 359
HTML::DOM::EventTarget::_dispatch_event('WWW::Scripter=HASH(0x3487c90)', 1, 'HTML::DOM::Event=HASH(0x5a4f338)') called at .../HTML/DOM/EventTarget.pm line 250
HTML::DOM::EventTarget::dispatchEvent('WWW::Scripter=HASH(0x3487c90)', 'HTML::DOM::Event=HASH(0x5a4f338)') called at .../HTML/DOM/EventTarget.pm line 492
HTML::DOM::EventTarget::trigger_event('WWW::Scripter=HASH(0x3487c90)', 'load', 'target', 'HTML::DOM=HASH(0x5a4b968)') called at .../WWW/Scripter.pm line 552
WWW::Scripter::update_html('WWW::Scripter=HASH(0x3487c90)', '\x{d}\x{a}\x{d}\x{a}\x{d}\x{a}\x{d}\x{a}\x{d}\x{a}\x{d}\x{a}\x{d}\x{a}\x{d}\x{a} <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML ...') called at .../WWW/Scripter.pm line 311
WWW::Scripter::_update_page('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5a3f868)', 'HTTP::Response=HASH(0x5a4b3b0)') called at .../WWW/Scripter.pm line 266
WWW::Scripter::request('WWW::Scripter=HASH(0x3487c90)', 'HTTP::Request=HASH(0x5a3f868)') called at .../WWW/Scripter.pm line 650
WWW::Scripter::submit('WWW::Scripter=HASH(0x3487c90)') called at .../WWW/Mechanize.pm line 1920
WWW::Mechanize::submit_form('WWW::Scripter=HASH(0x3487c90)', 'with_fields', 'HASH(0x35df8a0)') called at test.pl line 97
On Mon Apr 15 12:09:58 2013, LXP wrote:
> I've determined the cause of the problem:
>
> At the time that the desired request is being handled by the
> "_update_page" method, other calls to _update_page lower in the stack
> are waiting to finish. Attached is the stack trace in my case when the
> content that I want is being handled; as there are two more _update_page
> calls waiting to finish, my desired content is overwritten twice as the
> stack unwinds.
After further investigation I can explain the problem more "simply" (or at least as simply as the website I'm dealing with allows):

1. Request resource A via POST.
2. Server responds with HTML containing JavaScript: window.top.location.href = B;
3. _update_page is started with request A and response A, and does not complete yet.
4. Request resource B via GET.
5. Server responds with 302 redirect to resource C.
5. Request resource C via GET.
6. Server responds with 302 redirect to resource D.
7. Request resource D via GET.
8. Server responds with HTML containing JavaScript: document.location.replace(E);
9. Another _update_page call is added to the stack with request D and response D, and does not complete yet. There is one other pending _update_page run on the stack.
10. Request resource E via GET.
11. Server responds with desired HTML.
12. Another _update_page call is added to the stack with request E and response E, and does not complete yet. There are two other pending _update_page runs on the stack.
13. _update_page is called for each of response E's many useless JavaScript dependencies.
14. The _update_page call in step 12 completes with the WWW::Scripter instance's "content" attribute holding the content of response E. There are still two other pending _update_page runs on the stack.
15. The _update_page call in step 9 completes with the WWW::Scripter instance's "content" attribute still holding the content of response E. There is still one other pending _update_page run on the stack.
16. A new _update_page call is made with request C and response D, and completes with the WWW::Scripter instance's "content" attribute now holding the content of response D. There is still one other pending _update_page run on the stack.
17. Another new _update_page call is made with request B and response D, and completes with the WWW::Scripter instance's "content" attribute still holding the content of response D. There is still one other pending _update_page run on the stack.
18. The _update_page call in step 3 completes with the WWW::Scripter instance's "content" attribute still holding the content of response D.
19. $w->content erroneously returns response D's content instead of response E's content.

Looking at the above sequence of events, the following things stand out for me:

* Document dependencies such as external .js files seem to cause recursive calls to _update_page, as do JavaScript redirects.
* The 302 responses in steps 5 and 6 don't seem to trigger the above-mentioned recursion.
* It seems a little strange that redirections would be treated differently based on whether they're initiated through server response headers or client scripts.
* In steps 16 and 17, _update_page is called with mismatching request/response pairs. Further, these calls involve requests that don't have corresponding "end point" responses. This is almost certainly a bug.

I suppose that the next step for me in diagnosing this bug would be to look very carefully at how WWW::Scripter interacts with WWW::Mechanize (or LWP::UserAgent) when redirects are occurring.
On Wed May 01 23:23:53 2013, LXP wrote:
> * In steps 16 and 17, _update_page is called with mismatching request/
>   response pairs. Further, these calls involve requests that don't
>   have corresponding "end point" responses. This is almost certainly
>   a bug.
WWW::Scripter's "request" method is responsible for triggering the _update_page calls mentioned above, and is probably the culprit.
On Wed May 01 23:23:53 2013, LXP wrote:
> 4. Request resource B via GET.
> 5. Server responds with 302 redirect to resource C.
> 5. Request resource C via GET.
> 6. Server responds with 302 redirect to resource D.
A test can be written for something like this by writing some custom LWP::UserAgent handlers to emulate responses with 302 status codes for certain request types. It would be really good to distill all of this into a Perl test.
On Wed May 01 23:34:00 2013, LXP wrote:
> On Wed May 01 23:23:53 2013, LXP wrote:
> > 4. Request resource B via GET.
> > 5. Server responds with 302 redirect to resource C.
> > 5. Request resource C via GET.
> > 6. Server responds with 302 redirect to resource D.
>
> A test can be written for something like this by writing some custom
> LWP::UserAgent handlers to emulate responses with 302 status codes for
> certain request types. It would be really good to distill all of this
> into a Perl test.
The attached Perl test conveys the problem more simply again:

* Requesting Resource 1 causes a 302 redirect to Resource 2.
* Resource 2 is HTML with a JavaScript redirect to Resource 3.
* Resource 3 holds the desired content.

WWW::Scripter's "content" method returns the desired content when Resources 3 and 2 are requested, but not when Resource 1 is requested.

Sadly this test requires WWW::Scripter::Plugin::JavaScript, but after further delving it should be possible to produce another test to convey the problem without an external dependency.
Subject: rt-84472.t
#!/usr/bin/env perl

use strict;
use warnings;

use lib 't';
use HTTP::Response ();
use WWW::Scripter ();
use WWW::Scripter::Plugin::JavaScript ();

my $w = WWW::Scripter->new( autocheck => 1 );
$w->use_plugin('JavaScript');

for (
    [ '/R1' => \'fake:///R2' ],
    [ '/R2' => '<script> document.location.replace("fake:///R3") </script>' ],
    [ '/R3' => 'DESIRED CONTENT' ],
) {
    my ($req_path, $res_content) = @$_;
    my $is_redir = ref $res_content;
    $w->set_my_handler(
        'request_send',
        sub {
            my $req = shift;
            my $res = HTTP::Response->new( $is_redir ? 302 : 200 );
            $res->request($req);
            if ($is_redir) {
                $res->header( 'Location' => $$res_content );
            }
            else {
                $res->content($res_content);

                # Ensure <script>s get executed.
                $res->header( 'Content-Type' => 'text/html' )
                    if $res_content =~ /script/;
            }
            return $res;
        },
        ( m_scheme => 'fake', m_path => $req_path ),
    );
}

use tests 3;
for (qw{ /R3 /R2 /R1 }) {
    $w->get("fake://$_");
    is $w->content, 'DESIRED CONTENT',
        "'content' method must return correct response for request $_";
}
(I've explicitly added your unmunged email address to this ticket now, because looking at your default RT email address I suspect that RT has just been shooting all of the past correspondence into the aether.)
On Wed May 01 23:28:13 2013, LXP wrote:
> On Wed May 01 23:23:53 2013, LXP wrote:
> > * In steps 16 and 17, _update_page is called with mismatching request/
> >   response pairs. Further, these calls involve requests that don't
> >   have corresponding "end point" responses. This is almost certainly
> >   a bug.
>
> WWW::Scripter's "request" method is responsible for triggering the
> _update_page calls mentioned above, and is probably the culprit.
The way "_make_request" is used in "request" is the problem. "_make_request" is provided by WWW::Mechanize and just proxies to "request" in LWP::UserAgent.

When running the supplied test:

* _make_request takes a request for Resource 3 (the desired content) and returns it.
* _make_request takes a request for Resource 2 (the page with a JS redirect to Resource 3) and returns it.
* _make_request takes a request for Resource 1 (the 302 redirect) but returns Resource 2. As far as LWP::UserAgent would be concerned, this is correct behaviour as it has no knowledge of JavaScript.
On Thu May 02 14:19:48 2013, LXP wrote:
> The way "_make_request" is used in "request" is the problem.
I retract that, and instead suggest that the problem is caused by the "_update_page" method assuming that the "update_html" method won't cause the response object to change.

A naive fix would be to make the "update_html" method return the current response object (it currently returns nothing) and make "_update_page" return whatever response "update_html" returns instead of the response object passed into itself.

This fix causes the test attached to this ticket to pass, but sadly it also causes test "credentials.t" to die.

On Thu May 02 13:07:26 2013, LXP wrote:
> Sadly this test requires WWW::Scripter::Plugin::JavaScript, but after
> further delving it should be possible to produce another test to
> convey the problem without an external dependency.
It's possible to cause a client-side redirect in a browser using the "meta refresh" technique, which involves placing a particular "meta" tag in the document's "head." As JavaScript is not required for this type of redirect to occur, the JavaScript plugin would not be required for the test. Unfortunately, WWW::Scripter at this time does not appear to handle pages using the "meta refresh" technique.
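As an illustration of the technique, the sketch below shows what such a "meta" tag looks like and how a client might pull the delay and target out of it. It is a naive regex-based example under stated assumptions: the helper name meta_refresh_target is hypothetical, the fake:///R3 URL is simply borrowed from the test attached to this ticket, and a real client would use an HTML parser rather than a pattern match.

```perl
use strict;
use warnings;

# Sample document using the "meta refresh" technique (made up for illustration).
my $html = <<'HTML';
<html>
  <head>
    <meta http-equiv="refresh" content="0; url=fake:///R3">
  </head>
  <body>Redirecting...</body>
</html>
HTML

# Hypothetical helper: return (delay in seconds, target URL), or an empty
# list if the document has no meta refresh tag in this simple form.
sub meta_refresh_target {
    my ($doc) = @_;
    return unless $doc =~ m{
        <meta \s+ http-equiv=["']refresh["'] \s+
        content=["'] \s* (\d+) \s* ; \s* url=([^"']+) ["']
    }xi;
    return ($1, $2);
}

my ($delay, $url) = meta_refresh_target($html);
print "redirect to $url after $delay second(s)\n";
```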
Subject: patch: make content() return correct content when request chain contains both 302 and JavaScript redirects
On Sat May 04 16:20:38 2013, LXP wrote:
> A naive fix would be to make the "update_html" method return the
> current response object (it currently returns nothing) and make
> "_update_page" return whatever response "update_html" returns instead
> of the response object passed into itself.
>
> This fix causes the test attached to this ticket to pass, but sadly it
> also causes test "credentials.t" to die.
It caused errors in other tests because I forgot to adjust one of the "returns." The attached patches provide new tests and a solution for this problem. It's a bit of a shame that so many hours of effort on this problem resulted in a fix of only three lines, but such is life! The attached test still requires the JavaScript plugin to be installed; I couldn't see any obvious way to test the behaviour without it. It will probably be necessary to skip the test if it can't be rewritten, and if that plugin is not available on the end user's system.
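The effect of the change can be illustrated with a toy model in plain Perl. This is a sketch, not WWW::Scripter's code: the ToyAgent package, its methods, the "JS-REDIRECT" body convention and the return_current switch are all invented for illustration. The model only mirrors the shape of the problem: a 302 is followed by recursing into request(), a client-side redirect is followed from inside update_page(), and update_page() either returns the response it was given (old behaviour) or whatever response is current once it finishes (patched behaviour).

```perl
use strict;
use warnings;

package ToyAgent;

# Hypothetical stand-in for a scripting user agent.
sub new {
    my ($class, %args) = @_;
    return bless {
        pages          => $args{pages},
        return_current => $args{return_current},   # 1 = patched behaviour
        current        => undef,                   # what content() reports
        parsed         => {},                      # responses already processed
    }, $class;
}

# Like a redirect-following user agent: a 302 recurses into request().
sub fetch {
    my ($self, $url) = @_;
    my $page = $self->{pages}{$url} or die "no such page: $url";
    return $self->request($page->{location}) if $page->{status} == 302;
    return { url => $url, body => $page->{body} };
}

sub request {
    my ($self, $url) = @_;
    return $self->update_page($self->fetch($url));
}

sub update_page {
    my ($self, $res) = @_;
    $self->{current} = $res;
    # Run "scripts" only the first time a given response is seen.
    if (!$self->{parsed}{$res}++) {
        my ($target) = $res->{body} =~ /^JS-REDIRECT (\S+)$/;
        $self->request($target) if defined $target;   # may replace {current}
    }
    return $self->{return_current} ? $self->{current} : $res;
}

sub content { $_[0]{current}{body} }

package main;

my %pages = (
    '/R1' => { status => 302, location => '/R2' },
    '/R2' => { status => 200, body => 'JS-REDIRECT /R3' },
    '/R3' => { status => 200, body => 'DESIRED CONTENT' },
);

for my $patched (0, 1) {
    my $agent = ToyAgent->new(pages => \%pages, return_current => $patched);
    $agent->request('/R1');
    printf "return_current=%d: %s\n", $patched, $agent->content;
}
```

With return_current off, the stale /R2 response travels back up through the 302 handling and is installed again after /R3 has already been loaded, so content() reports the redirect page. With it on, the response that is handed back up is the one that is actually current, and content() reports the desired content.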
Subject: 0001-failing-test-for-incorrect-content-after-302-JS.patch
From 802c57b46abfaae065f1063567b35dcb7f982d86 Mon Sep 17 00:00:00 2001
From: Alex Peters <lxp@cpan.org>
Date: Thu, 2 May 2013 13:23:53 +1000
Subject: [PATCH 1/2] failing test for incorrect content after 302/JS

The "content" method returns the wrong content when the request chain
contains both 302 and JavaScript redirects.

See [rt.cpan.org #84472].
---
 t/rt-84472.t | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)
 create mode 100644 t/rt-84472.t

diff --git a/t/rt-84472.t b/t/rt-84472.t
new file mode 100644
index 0000000..0280336
--- /dev/null
+++ b/t/rt-84472.t
@@ -0,0 +1,45 @@
+#!/usr/bin/env perl
+
+use strict;
+use warnings;
+
+use lib 't';
+use HTTP::Response ();
+use WWW::Scripter ();
+use WWW::Scripter::Plugin::JavaScript ();
+
+my $w = WWW::Scripter->new( autocheck => 1 );
+$w->use_plugin('JavaScript');
+
+for (
+    [ '/R1' => \'fake:///R2' ],
+    [ '/R2' => '<script> document.location.replace("fake:///R3") </script>' ],
+    [ '/R3' => 'DESIRED CONTENT' ],
+) {
+    my ($req_path, $res_content) = @$_;
+    my $is_redir = ref $res_content;
+    $w->set_my_handler(
+        'request_send',
+        sub {
+            my $req = shift;
+            my $res = HTTP::Response->new( $is_redir ? 302 : 200 );
+            $res->request($req);
+            if ($is_redir) {
+                $res->header( 'Location' => $$res_content );
+            }
+            else {
+                $res->header( 'Content-Type' => 'text/html' );
+                $res->content($res_content);
+            }
+            return $res;
+        },
+        ( m_scheme => 'fake', m_path => $req_path ),
+    );
+}
+
+use tests 3;
+for (qw{ /R3 /R2 /R1 }) {
+    $w->get("fake://$_");
+    like $w->content, qr/DESIRED CONTENT/,
+        "'content' method must return correct response for request $_";
+}
-- 
1.7.10.4
Subject: 0002-make-update_html-return-the-current-response.patch
From 16c13cdd0e844057f2bd79ffe7e8077340219bc2 Mon Sep 17 00:00:00 2001
From: Alex Peters <lxp@cpan.org>
Date: Tue, 14 May 2013 15:47:50 +1000
Subject: [PATCH 2/2] make update_html return the current response

This is necessary because the current response might change as a result
of client-side redirects.

Fixes [rt.cpan.org #84472].
---
 lib/WWW/Scripter.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/WWW/Scripter.pm b/lib/WWW/Scripter.pm
index 71f335f..6002ead 100644
--- a/lib/WWW/Scripter.pm
+++ b/lib/WWW/Scripter.pm
@@ -312,7 +312,7 @@ sub _update_page {
      !defined $$self{Scripter_dumb} || $$self{Scripter_dumb}
      and $self->is_html
     ) {
-        $self->update_html($content);
+        $res = $self->update_html($content);
     }
     else {
         $self->{content} = $content;
@@ -349,7 +349,7 @@ sub update_html {
     if(my $doc = $document{$res}) {
         $self->document($doc);
         $self->{form} = ($self->{forms} = $doc->forms)->[0];
-        return;
+        return $res;
     }
 
     my $life_raft = $self;
@@ -554,7 +554,7 @@ sub update_html {
     # banana
     $self->{form} ||= $self->{forms}[0];
 
-    return;
+    return $self->{res};
 }
 
 # Not an override, but used by update_html
-- 
1.7.10.4
On Sat May 04 16:20:38 2013, LXP wrote:
> It's possible to cause a client-side redirect in a browser using the
> "meta refresh" technique, which involves placing a particular "meta"
> tag in the document's "head." As JavaScript is not required for this
> type of redirect to occur, the JavaScript plugin would not be required
> for the test.
>
> Unfortunately, WWW::Scripter at this time does not appear to handle
> pages using the "meta refresh" technique.
This concern has been moved to [rt.cpan.org #85673]. The functionality problems described in this ticket have been addressed by the release of WWW-Scripter v0.030.