Subject: errors on processing HTML with unknown tags
Date: Tue, 15 Apr 2014 17:59:04 +0200
To: bug-HTML-HTML5-ToText [...] rt.cpan.org
From: Christian Loos <cloos [...] netcologne.de>
Hi,
If the HTML contains tags that are not defined in the BEGIN block [1], you
get errors like this:
    Can't locate object method "O:P" via package "MooseX::Traits::__ANON__::SERIAL::1" at /usr/local/share/perl/5.14.2/HTML/HTML5/ToText.pm line 108.
I know it is not your fault if I try to process invalid HTML.
Still, perhaps this module should expect unknown tags and simply ignore them.
In my case the invalid HTML is generated by Microsoft Outlook 2003 when
the "use Word for E-Mail writing" option is enabled.
Microsoft seems to find it funny to include <o:p> tags in the HTML
source, which in my test cases are simply empty.
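A minimal sketch of the suggested behavior, in Perl. It assumes, based only on the error message and not on the module's source, that each tag is dispatched to a method named after the upper-cased tag name. The package SketchToText and its process_tag, P and BR methods are hypothetical and not part of HTML::HTML5::ToText. The idea is to check with can() before calling, so an unknown tag such as <o:p> falls back to passing its text through instead of dying.

```perl
use strict;
use warnings;

# Hypothetical, simplified converter (NOT the real HTML::HTML5::ToText):
# one method per known tag, named after the upper-cased tag name.
package SketchToText;

sub new { return bless {}, shift }

# Known tags: a paragraph ends with a blank line, <br> is a newline.
sub P  { my ($self, $text) = @_; return "$text\n\n" }
sub BR { return "\n" }

# Dispatch on the tag name. If no method exists for the tag (e.g.
# Outlook's <o:p>), ignore the markup and keep its text, rather than
# letting Perl die with "Can't locate object method".
sub process_tag {
    my ($self, $tag, $text) = @_;
    my $method = uc $tag;
    if (my $code = $self->can($method)) {
        return $self->$code($text);
    }
    return $text;    # unknown tag: contents pass through unchanged
}

package main;

my $conv = SketchToText->new;
print $conv->process_tag('p', 'Hello');    # known tag
print $conv->process_tag('o:p', '');       # unknown tag, no exception
```

Passing the contents through, rather than dropping them, keeps any text that an unknown wrapper tag might contain; for the empty <o:p> tags described above, either choice produces the same output.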
Chris
[1] https://metacpan.org/source/TOBYINK/HTML-HTML5-ToText-0.004/lib/HTML/HTML5/ToText.pm#L22