
This queue is for tickets about the HTML-Zoom CPAN distribution.

Report information
The Basics
Id: 73470
Status: resolved
Priority: 0
Queue: HTML-Zoom

People
Owner: cpan [...] papercreatures.com
Requestors: m-rt.cpan.org-98jw3v [...] lexoid.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: jfm+filecopy [...] lexoid.com
Subject: Bug in built-in HTML parser -- loses text
Date: Sat, 24 Dec 2011 13:47:44 -0600
To: bug-HTML-Zoom [...] rt.cpan.org
From: Jim Miner <m-rt.cpan.org-98jw3v [...] lexoid.com>
Bug report for HTML-Zoom-0.009006 / perl v5.8.8 / (OSX 10.5 & RHEL 4)

The built-in HTML parser causes input text to be lost by HTML::Zoom
under some conditions.

- The entire input is lost if the input contains no tag.
- Leading text (before the first tag) is lost if the first tag is
  modified, e.g., by replace_attribute.

This is a problem when, e.g., replacing content with an HTML fragment.

Below find:
- script demonstrating the bug.
- output of the script.
- patch.

-------------- script --------------
#!/usr/bin/perl

use strictures 1;
use HTML::Zoom;

my @data = (
    'text',
    'text<tag a="x">',
);

print "--- pass-through ---\n";
foreach my $in ( @data ) {
    my $out = HTML::Zoom->from_html($in)->to_html;
    print "in: $in\n", "out: $out\n", "\n";
}

print "--- remove_attribute('a') ---\n";
foreach my $in ( @data ) {
    my $z = HTML::Zoom->from_html($in);
    $z = $z->select('tag')->remove_attribute('a');
    my $out = $z->to_html;
    print "in: $in\n", "out: $out\n", "\n";
}

-------------- script output --------------
--- pass-through ---
in: text
out:

in: text<tag a="x">
out: text<tag a="x">

--- remove_attribute('a') ---
in: text
out:

in: text<tag a="x">
out: <tag>

-------------- patch --------------
*** HTML-Zoom-0.009006/lib/HTML/Zoom/Parser/BuiltIn.pm	2011-03-27 09:23:14.000000000 -0500
--- HTML-Zoom-0.009006/lib/HTML/Zoom/Parser/BuiltIn-PATCHED.pm	2011-12-16 00:26:18.000000000 -0600
***************
*** 18,23 ****
--- 18,27 ----
  
  sub _hacky_tag_parser {
    my ($text, $handler) = @_;
+   $text =~ m{^([^<]*)}g;
+   if ( length $1 ) { # leading PCDATA
+       $handler->({ type => 'TEXT', raw => $1 });
+   }
    while (
      $text =~ m{
        (
***************
*** 109,111 ****
--- 113,122 ----
  sub html_unescape { _simple_unescape($_[1]) }
  
  1;
+ 
+ __END__
+ 
+ Modification 2011-12-15 by Jim Miner
+ Don't throw away leading PCDATA in $text, in _hacky_tag_parser().
+ This is important so we can use from_html and replace_content to
+ insert fragments with or without markup into templates.
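
For readers who want to try the idea behind the patch without installing HTML::Zoom, here is a minimal standalone sketch. It is not the real _hacky_tag_parser: the tokenize function, its simplified tag regex, and the TAG event name are assumptions made up for illustration. Only the leading-text match and the TEXT event shape are taken from the patch. The sketch relies on the fact that a scalar-context match with /g leaves pos($text) just after the leading text, so a later /g match can continue from there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, heavily simplified tokenizer; NOT HTML::Zoom's parser.
sub tokenize {
    my ($text) = @_;
    my @events;

    # Same match as in the patch: in scalar context with /g it consumes
    # the leading run of non-'<' characters and sets pos($text) after it.
    $text =~ m{^([^<]*)}g;
    push @events, { type => 'TEXT', raw => $1 } if length $1;

    # Continue from pos($text). Each iteration takes one tag plus the
    # text that follows it (an assumed, much cruder pattern than the
    # one in BuiltIn.pm).
    while ( $text =~ m{\G(<[^>]*>)([^<]*)}gc ) {
        push @events, { type => 'TAG',  raw => $1 };
        push @events, { type => 'TEXT', raw => $2 } if length $2;
    }
    return @events;
}

# Same two inputs as the script in the report.
for my $in ( 'text', 'text<tag a="x">' ) {
    my $out = join ' | ', map { "$_->{type}:$_->{raw}" } tokenize($in);
    print "in: $in\nevents: $out\n\n";
}
```

With these two inputs, the leading "text" shows up as its own TEXT event in both cases, so it no longer depends on what later happens to the first tag.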