Subject: return plain html not xhtml
Date: Sun, 13 Jan 2013 10:06:23 +1100
To: bug-HTML-ExtractMain [...] rt.cpan.org
From: Kevin Ryde <user42 [...] zip.com.au>
It'd be good if there was an option to get back plain html rather than
xhtml. The differences are small, but for example xhtml has &apos; which
is not an html entity (though some browsers allow it).
Perhaps something like

    extract_main_html($html,
                      output_type => 'html');

which could default to "xhtml", and allow "html". Maybe even allow
"treebuilder" to return a crunched HTML::TreeBuilder object, which the
caller can then ask for any of its various output styles. (Or is "tree"
for an HTML::Tree object better?) Key/value options might also help
with future expansion, for instance if there's ever a need to tune the
main-ness of some inputs etc.
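
A rough sketch of how such an option could dispatch, in case it helps.
The render_element() helper and its "output_type" values are
hypothetical and not part of HTML::ExtractMain; only the
HTML::TreeBuilder / HTML::Element calls (new_from_content, look_down,
as_HTML, as_XML, delete) are existing API. The '<>&' argument to
as_HTML is one possible choice, so that only those three characters
get escaped.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (an assumption, not HTML::ExtractMain code):
# serialise an HTML::Element according to a key/value option.
sub render_element {
    my ($elem, %opt) = @_;
    my $type = defined $opt{output_type} ? $opt{output_type} : 'xhtml';

    return $elem->as_XML         if $type eq 'xhtml';  # XML-style output
    return $elem->as_HTML('<>&') if $type eq 'html';   # escape only < > &
    return $elem                 if $type eq 'tree';   # caller picks a style
    die "unknown output_type '$type'\n";
}

my $tree = HTML::TreeBuilder->new_from_content("<p>It's here</p>");
my ($p)  = $tree->look_down(_tag => 'p');

print render_element($p, output_type => 'html'), "\n";
print render_element($p), "\n";    # defaults to "xhtml"

$tree->delete;    # free the tree when finished with it
```

With a "tree" return the caller would be responsible for calling
delete() on the tree once done with it.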