This queue is for tickets about the WWW-Mechanize-Shell CPAN distribution.

Report information
The Basics
Id: 2703
Status: resolved
Priority: 0/
Queue: WWW-Mechanize-Shell

People
Owner: corion [...] cpan.org
Requestors: mark [...] summersault.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.22
Fixed in: (no value)



Subject: patch for <base href> related bugs.
The soon-to-be attached patch fixes two "BASE HREF" related bugs in ::Shell 0.22:

1. It appears that if any <BASE tag was present, adding a BASE HREF was skipped. This does the wrong thing when a <BASE tag is used for something other than HREF.

2. The <BASE tag was added at the end of the <HEAD> area. This means that any CSS links declared in the HEAD are not handled properly. A test case for this is my homepage: http://mark.stosberg.com/

To address both of these, I used HTML::TokeParser::Simple to manipulate the HTML, which should do a better job than simple regexp matching. It adds the BASE HREF to the very top of the HEAD area, and only skips adding it if a BASE tag with an HREF attribute is found. This module will need to be added to the PREREQ_PM list.

The patch also fixes what I consider a documentation glitch. I was wondering why my .mechanizerc file wasn't being read. It was because the example script uses "rcfile => undef". To me, it seems a better default to have the .mechanizerc file be read.

In testing, I noticed that if sync_browser dies, the shell dies. Maybe some additional "evals" are needed somewhere. I didn't try to fix that. :)
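
The two BASE problems described in this report can be reproduced in isolation with the substitution that the patch removes. This is a minimal sketch, not code from the distribution: the helper name old_add_base, the sample pages, and the example.com URL are all made up for illustration.

```perl
use strict;
use warnings;

# The pre-patch logic from Shell.pm, wrapped in a hypothetical helper
# (old_add_base is not a name used by WWW::Mechanize::Shell).
sub old_add_base {
    my ($html, $location) = @_;
    $html =~ s!(</head>)!<base href="$location" />$1!i
        unless ($html =~ /<BASE/i);
    return $html;
}

my $location = 'http://example.com/dir/';    # made-up URL

# Bug 1: a <base> tag without an href suppresses the insertion entirely.
my $with_target = '<html><head><base target="_top"></head><body></body></html>';
my $out1 = old_add_base($with_target, $location);
print "bug 1: ", ($out1 =~ /<base href=/i ? "href added" : "no href added"), "\n";

# Bug 2: the tag is placed just before </head>, after the stylesheet link.
my $with_css = '<html><head><link rel="stylesheet" href="style.css"></head><body></body></html>';
my $out2 = old_add_base($with_css, $location);
print "bug 2: ",
    (index($out2, '<base') > index($out2, '<link') ? "base after link" : "base before link"),
    "\n";
```

With these inputs the script reports "no href added" for the first page and "base after link" for the second, which matches the two cases listed in the report.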
From: mark [...] summersault.com
patch attached.
--- Shell.pm.orig	Fri May 30 17:18:48 2003
+++ Shell.pm	Fri May 30 17:45:44 2003
@@ -31,7 +31,7 @@
   use strict;
   use WWW::Mechanize::Shell;
 
-  my $shell = WWW::Mechanize::Shell->new("shell", rcfile => undef );
+  my $shell = WWW::Mechanize::Shell->new("shell");
 
   if (@ARGV) {
     $shell->source_file( @ARGV );
@@ -260,8 +260,23 @@
     # Prepare the HTML for local display :
     my $html = $self->agent->res->content;
     my $location = $self->agent->{uri};
-    $html =~ s!(</head>)!<base href="$location" />$1!i
-      unless ($html =~ /<BASE/i);
+
+    require HTML::TokeParser::Simple;
+    my $p = HTML::TokeParser::Simple->new(\$html) || die 'could not create HTML::TokeParser::Simple object';
+my $new_html;
+while (my $token = $p->get_token) {
+    if ( $token->is_start_tag('head') ) {
+        $new_html .= $token->as_is. qq!<base href="$location" />!;
+    }
+    # If they already have a <base href>, give up
+    elsif ( $token->is_start_tag('base') and $token->return_attr->{href} ) {
+        $new_html = $html;
+        last;
+    }
+    else {
+        $new_html .= $token->as_is;
+    }
+}
 
     my $browser;
     $browser = $self->browser;
@@ -269,12 +284,12 @@
       # We can push the HTML into a IE browser window
       my $document = $browser->{Document};
       $document->open("text/html","replace");
-      $document->write($html);
+      $document->write($new_html);
     } else {
       # We need to use a temp file for communication
       require File::Temp;
       my($tempfh, $tempfile) = File::Temp::tempfile(undef, UNLINK => 1);
-      print $tempfh $html;
+      print $tempfh $new_html;
       my $cmdline = sprintf($self->option('browsercmd'), $tempfile);
       system( $cmdline ) == 0
         or warn "Couldn't launch '$cmdline' : $?";
Subject: revised base href patch
From: mark [...] summersault.com
I noticed today that my previous patch sometimes failed: if $location = 'http://domain.com/file.html', the BASE HREF would become 'http://domain.com/file.html'. I'm not sure why this wouldn't have been a problem before. This modified patch trims the URL back to the last directory, since BASE HREF expects a directory name and not a file name.
--- Shell.pm.orig	Fri May 30 17:18:48 2003
+++ Shell.pm	Mon Jun 2 10:17:51 2003
@@ -31,7 +31,7 @@
   use strict;
   use WWW::Mechanize::Shell;
 
-  my $shell = WWW::Mechanize::Shell->new("shell", rcfile => undef );
+  my $shell = WWW::Mechanize::Shell->new("shell");
 
   if (@ARGV) {
     $shell->source_file( @ARGV );
@@ -260,8 +260,27 @@
     # Prepare the HTML for local display :
     my $html = $self->agent->res->content;
     my $location = $self->agent->{uri};
-    $html =~ s!(</head>)!<base href="$location" />$1!i
-      unless ($html =~ /<BASE/i);
+
+    # trim to directory create BASE HREF
+    # We are carefull to not trim if we just have http://domain.com
+    $location =~ s%(?<!/)/[^/]*$%/%;
+
+    require HTML::TokeParser::Simple;
+    my $p = HTML::TokeParser::Simple->new(\$html) || die 'could not create HTML::TokeParser::Simple object';
+my $new_html;
+while (my $token = $p->get_token) {
+    if ( $token->is_start_tag('head') ) {
+        $new_html .= $token->as_is. qq!<base href="$location" />!;
+    }
+    # If they already have a <base href>, give up
+    elsif ( $token->is_start_tag('base') and $token->return_attr->{href} ) {
+        $new_html = $html;
+        last;
+    }
+    else {
+        $new_html .= $token->as_is;
+    }
+}
 
     my $browser;
     $browser = $self->browser;
@@ -269,12 +288,12 @@
       # We can push the HTML into a IE browser window
       my $document = $browser->{Document};
       $document->open("text/html","replace");
-      $document->write($html);
+      $document->write($new_html);
     } else {
       # We need to use a temp file for communication
       require File::Temp;
       my($tempfh, $tempfile) = File::Temp::tempfile(undef, UNLINK => 1);
-      print $tempfh $html;
+      print $tempfh $new_html;
       my $cmdline = sprintf($self->option('browsercmd'), $tempfile);
       system( $cmdline ) == 0
         or warn "Couldn't launch '$cmdline' : $?";
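
The effect of the trimming substitution in the revised patch can be checked on its own. This is a small sketch: the helper name trim_to_directory is hypothetical, and the URLs are the placeholder domain from the message above plus a made-up subdirectory.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the revised patch.
sub trim_to_directory {
    my ($location) = @_;
    # Replace the last "/" and everything after it with "/", but only if
    # that "/" is not preceded by another "/" (so "http://" stays intact).
    $location =~ s%(?<!/)/[^/]*$%/%;
    return $location;
}

for my $url (
    'http://domain.com/file.html',
    'http://domain.com/dir/file.html',
    'http://domain.com/dir/',
    'http://domain.com',
) {
    print "$url -> ", trim_to_directory($url), "\n";
}
```

The first two URLs are cut back to 'http://domain.com/' and 'http://domain.com/dir/'. The last two come back unchanged: the negative lookbehind keeps the slashes of "http://" from matching, and a URL that already ends in a slash is replaced with itself.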
From: "Max Maischein" <corion [...] corion.net>
To: <bug-WWW-Mechanize-Shell [...] rt.cpan.org>
Subject: Re: [cpan #2703] revised base href patch
Date: Sat, 7 Jun 2003 13:59:46 +0200
RT-Send-Cc:
Hi Mark,

the patch still needed some more munging so that only a single BASE tag gets added and so that it works for (admittedly broken) HTML without a HEAD or a BASE tag. The patched version is in HTML::Display v0.02 (packaged with WWW::Mechanize::Shell). Tonight the prerelease version will be published at http://www.corion.net/perl-dev for those who can't wait until the CPAN release :-)

-max