VVELOX writes:
> I am confused as to why I have this ticket. AFAIK KENTNL is
> responsible for this module.
Yes, this should've been raised in the HTML-Tree queue, rather than the
Tree-Builder queue. I don't know who has permission to change that. (I
don't; maybe you do, as maintainer of the distribution whose queue it's
currently in?)
MJD writes:
> Where did my SOURCE element go?
Discarded as not a valid HTML4 element, it seems.
HTML::TreeBuilder delegates determining what is a valid HTML element to
HTML::Tagset:
https://rt.cpan.org/Ticket/Display.html?id=84526#txn-1659371
And the maintainer of HTML::Tagset has decreed that HTML::Tagset needs
to continue to reject new HTML5 elements as invalid, because that's what
they were in HTML4, and existing users of the module may be relying on
those tags being invalid, and so skipped:
https://rt.cpan.org/Ticket/Display.html?id=67299#txn-1725341
However, if you don't need the ‘ignore unknown tags’ feature and are
happy for the tree to contain all the elements that you pass to it, then
you can get the <source> element by de-activating the ignore_unknown
option:
% perl -MHTML::TreeBuilder -e '$z = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>};
HTML::TreeBuilder->new(ignore_unknown => 0)->parse($z)->eof->elementify()->dump(\*STDERR)'
Yields the desired:
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
  <source src="/media/horseshoe-curve-small.mp4" type="video/mp4"> @0.2
(Passing options to ->new like that isn't documented, but has worked
since at least 1996:
https://metacpan.org/source/GAAS/HTML-Tree-0.50/lib/HTML/TreeBuilder.pm#L134
The docs were patched to acknowledge that behaviour in 2012, for
upcoming version 5.9:
https://github.com/kentfredric/HTML-Tree/commit/2f2fabb8ce1dbcef416be06d5ed5734c9da4944b
5.9 hasn't been released yet, but now there's intent to document this
useful and long-standing behaviour, it's probably safe to rely on it.)
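If you'd rather not rely on constructor arguments at all, the same option can be set through the documented ignore_unknown accessor after construction. This is only a minimal sketch: it assumes HTML::TreeBuilder is installed, the variable names are mine, and the look_down check is just one way to confirm the element survived parsing.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same input as in the one-liner above.
my $html = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>};

my $tree = HTML::TreeBuilder->new;
$tree->ignore_unknown(0);    # keep tags that HTML::Tagset doesn't recognise
$tree->parse($html);
$tree->eof;

# Collect any <source> elements that made it into the tree.
my @sources = $tree->look_down(_tag => 'source');
print "found ", scalar(@sources), " source element(s)\n";
print "src: ", $sources[0]->attr('src'), "\n" if @sources;
```

Call $tree->delete when you're finished with the tree if it's going to live in a long-running process.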
Smylers