Subject: can we get an option to HTML::TreeBuilder to not decode entities?
Date: Wed, 14 Feb 2007 16:41:40 +0000
To: bug-html-tree [...] rt.cpan.org
From: Mark Blackman <mark [...] blackmans.org>
Hi,
As far as I can tell HTML::TreeBuilder will *always* decode HTML
entities in the _content attribute if it's not being ignored and
isn't CDATA.
    HTML::Entities::decode($text)
      unless $ignore_text || $is_cdata
        || $HTML::Tagset::isCDATA_Parent{$pos->{'_tag'}};
I've got a requirement to read HTML as written rather than decoded,
so an option to *not* decode, such as $never_decode, might be appropriate.
As I believe the patch is trivial, I've not included it, but if it
helps I'm happy to submit one.
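For illustration only, the kind of option requested above can be sketched
as a small standalone function. This is not the actual TreeBuilder code and
not a patch against it: maybe_decode and the never_decode key are
hypothetical names, and the other two flags simply stand in for the
variables in the quoted condition.

```perl
use strict;
use warnings;
use HTML::Entities ();

# Hypothetical helper mirroring the quoted condition, with one extra
# flag (never_decode) standing in for the proposed option.
sub maybe_decode {
    my ($text, %opt) = @_;

    # In void context HTML::Entities::decode modifies its argument in place.
    HTML::Entities::decode($text)
        unless $opt{never_decode} || $opt{ignore_text} || $opt{is_cdata};

    return $text;
}

# Default behaviour: entities are decoded.
print maybe_decode('Fish &amp; chips'), "\n";

# With the hypothetical option set, the text is returned as written.
print maybe_decode('Fish &amp; chips', never_decode => 1), "\n";
```

The first call prints the decoded text and the second leaves the entity
untouched, which is the behaviour being asked for.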
If I've misread the docs and there is some way to suspend decoding
for all text _content items then I'd be grateful for a pointer.
Cheers,
Mark Blackman