CC: | mst [...] shadowcat.co.uk |
Subject: | (PATCH) HTML-Zoom with plain-text |
With the built-in parser (HTML::Zoom::Parser::BuiltIn), plain-text input is dropped:
$zoom->from_html("plain text")->to_html; # returns ""
I (personally) would expect it to return "plain text", which is what HTML::Zoom::Parser::HTML::Parser does.
Attached are a patch and a test file. An identical test file was submitted to
HTML::Zoom::Parser::HTML::Parser ( https://github.com/mphill22/HTML-Zoom-Parser-HTML-Parser/pull/1 )
I was afraid (or too lazy?) to touch the regexp in the parser, so this patch is kind of brutish: it leaves the regexp alone and handles input without any tags as a special case after the loop.
There may well be a cleaner solution.
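To illustrate why the loop never emits anything for plain text, here is a minimal, self-contained sketch. It is not the actual HTML::Zoom code: the regexp is a much simplified stand-in that, like the real one, can only match when the input contains a "<", and parse_events is a hypothetical helper that collects events into a list instead of calling a handler.

```perl
use strict;
use warnings;

# Simplified stand-in for the parser's tag regexp (an assumption, not the
# real one). Like the original, it cannot match without a '<' in the input.
my $tag_match = qr{ < (/)? ([^/!<>\s"'=]+) [^<>]* > ([^<]*) }sx;

# Hypothetical helper: returns a list of event hashes.
sub parse_events {
    my ($text) = @_;
    my @events;

    # Each iteration consumes one tag plus the text that follows it.
    while ($text =~ /$tag_match/g) {
        my ($is_close, $tag_name, $content) = ($1, $2, $3);
        push @events, { type => ($is_close ? 'CLOSE' : 'OPEN'), name => $tag_name };
        push @events, { type => 'TEXT', raw => $content } if length $content;
    }

    # Fallback in the spirit of the patch: the input contains no tag at all,
    # so the loop body never ran; emit the whole input as one TEXT event.
    if ($text !~ $tag_match && length $text) {
        push @events, { type => 'TEXT', raw => $text };
    }

    return @events;
}

my @plain  = parse_events('Hello, World!');   # one TEXT event
my @tagged = parse_events('<p>Hi</p>');       # OPEN, TEXT, CLOSE
```

Note that the fallback fires only when the input contains no tag at all, which is exactly the case the attached test covers.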
Subject: | html_zoom_plaintextfix.patch |
diff -crB HTML-Zoom-0.009006/lib/HTML/Zoom/Parser/BuiltIn.pm HTML-Zoom-PlainTextFix/lib/HTML/Zoom/Parser/BuiltIn.pm
*** HTML-Zoom-0.009006/lib/HTML/Zoom/Parser/BuiltIn.pm 2011-03-27 09:23:14.000000000 -0500
--- HTML-Zoom-PlainTextFix/lib/HTML/Zoom/Parser/BuiltIn.pm 2012-02-13 13:18:46.000000000 -0600
***************
*** 18,36 ****
sub _hacky_tag_parser {
my ($text, $handler) = @_;
! while (
! $text =~ m{
! (
! (?:[^<]*) < (?:
! ( / )? ( [^/!<>\s"'=]+ )
! ( (?:"[^"]*"|'[^']*'|[^/"'<>])+? )?
! |
! (!-- .*? -- | ![^\-] .*? )
! ) (\s*/\s*)? >
! )
! ([^<]*)
! }sxg
! ) {
my ($whole, $is_close, $tag_name, $attributes, $is_special,
$in_place_close, $content)
= ($1, $2, $3, $4, $5, $6, $7, $8);
--- 18,35 ----
sub _hacky_tag_parser {
my ($text, $handler) = @_;
! my $tag_match = qr{
! (
! (?:[^<]*) < (?:
! ( / )? ( [^/!<>\s"'=]+ )
! ( (?:"[^"]*"|'[^']*'|[^/"'<>])+? )?
! |
! (!-- .*? -- | ![^\-] .*? )
! ) (\s*/\s*)? >
! )
! ([^<]*)
! }sx;
! while ($text =~ /$tag_match/g) {
my ($whole, $is_close, $tag_name, $attributes, $is_special,
$in_place_close, $content)
= ($1, $2, $3, $4, $5, $6, $7, $8);
***************
*** 62,67 ****
--- 61,70 ----
$handler->({ type => 'TEXT', raw => $content });
}
}
+ # Special case where you have plain-text (e.g. $text is 'Hello world')
+ if ($text !~ $tag_match && length $text) {
+ $handler->({ type => 'TEXT', raw => $text });
+ }
}
sub _hacky_attribute_parser {
Only in HTML-Zoom-PlainTextFix/t: plain_text.t
Subject: | plain_text.t |
use strictures 1;
use Test::More qw(no_plan);
use HTML::Zoom;
my $zoom = HTML::Zoom->new;
my $plain_text = 'Hello, World!';
is($zoom->from_html($plain_text)->to_html, $plain_text, 'Parser preserves plain-text input');