Date: Fri, 2 Apr 2004 16:00:14 -0500
From: Todd Chapman <todd [...] chaka.net>
To: bug-www-mechanize [...] rt.cpan.org
Subject: Feature Request: Allow modification and reparsing of HTML
Sometimes we know an evil website is going to return bad
HTML, specifically missing end tags. It would be nice
if I didn't have to touch WWW::Mechanize internals to
make things work. This is what I am currently doing and
it seems to work:
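$mech->get( "http://myweb.com/alms/AdminEmpDir.asp" );
# get() returns a response object, so test success explicitly
die "GET failed: ", $mech->status unless $mech->success;
my $html = $mech->content;
# Insert the missing </select> between </option> and </td>
$html =~ s/<\/option>.?.?.?<\/td>/<\/option><\/select><\/td>/isg;
# Overwrite the stored content, then re-run the private parser
$mech->{content} = $html;
$mech->_parse_html;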
There must be a better way...
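The substitution itself can be checked in isolation, without WWW::Mechanize. The following is only a minimal sketch: the sub name close_unterminated_select and the sample markup are hypothetical, made up for illustration, and the real page may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: same substitution as above, written with s{}{}
# so the slashes need no escaping.
#   .?.?.?  allows up to three characters (e.g. a newline) before </td>
#   /s      lets . match newlines
#   /i      ignores tag case
#   /g      repairs every occurrence
sub close_unterminated_select {
    my ($html) = @_;
    $html =~ s{</option>.?.?.?</td>}{</option></select></td>}isg;
    return $html;
}

# Hypothetical sample of the broken markup: the <select> is never closed.
my $broken = <<'HTML';
<td><select name="emp">
<option value="1">Alice</option>
<option value="2">Bob</option>
</td>
HTML

print close_unterminated_select($broken);
```

Markup that already has the closing tag is left alone, because "</select>" is longer than the three characters the pattern allows between </option> and </td>.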
-Todd Chapman
----- Forwarded message from Andy Lester <andy@petdance.com> -----
Date: Thu, 1 Apr 2004 11:02:53 -0600
From: Andy Lester <andy@petdance.com>
To: Todd Chapman <todd@chaka.net>
Subject: Re: Mechanize question
In-Reply-To: <20040401160849.GG15337@chaka.net>
User-Agent: Mutt/1.4i
X-Spam-Status: No, hits=-1.5 required=3.0 tests=BAYES_01 autolearn=ham
version=2.63
Local-Archive: catchall/catchall-Apr-2004
> The dump prints the original HTML even though
> $mech->content prints the new HTML. How can I
> force a re-parse of the modified HTML?
I don't have a way to do it, but maybe I should. Submit it as a request
to bug-www-mechanize@rt.cpan.org.
For now, try calling $mech->_parse_html() after you fudge the content,
and let me know what happens. Better yet, if it DOES work, put that in
the request, and I'll see if it makes sense to have a formal method to
let you do that.
xoa
--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
----- End forwarded message -----