
This queue is for tickets about the Tree-Builder CPAN distribution.

Report information
The Basics
Id: 127503
Status: open
Priority: 0/
Queue: Tree-Builder

People
Owner: Nobody in particular
Requestors: MJD [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: HTML::TreeBuilder discards SOURCE element
% perl -MHTML::TreeBuilder -e '$z = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>}; HTML::TreeBuilder->new->parse($z)->eof->elementify()->dump(\*STDERR)'

This produces:

<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)

Where did my SOURCE element go?

HTML::TreeBuilder->VERSION says 5.07.
Perl is 5.26.1.

I am confused as to why I have this ticket. AFAIK KENTNL is responsible for this module.

On Sun Oct 28 14:42:54 2018, MJD wrote:
> % perl -MHTML::TreeBuilder -e '$z = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>}; HTML::TreeBuilder->new->parse($z)->eof->elementify()->dump(\*STDERR)'
>
> This produces:
>
> <html> @0 (IMPLICIT)
>   <head> @0.0 (IMPLICIT)
>   <body> @0.1 (IMPLICIT)
>
> Where did my SOURCE element go?
>
> HTML::TreeBuilder->VERSION says 5.07.
> Perl is 5.26.1.

VVELOX writes:
> I am confused as to why I have this ticket. AFAIK KENTNL is
> responsible for this module.
Yes, this should've been raised in the HTML-Tree queue, rather than the Tree-Builder queue. I don't know who has permission to change that. (I don't; maybe you do, as maintainer of the distribution whose queue it's currently in?)

MJD writes:
> Where did my SOURCE element go?
Discarded as not a valid HTML4 element, it seems. HTML::TreeBuilder delegates determining what is a valid HTML element to HTML::Tagset:

https://rt.cpan.org/Ticket/Display.html?id=84526#txn-1659371

And the maintainer of HTML::Tagset has decreed that HTML::Tagset needs to continue to reject new HTML5 elements as invalid, because that's what they were in HTML4, and existing users of the module may be relying on those tags being invalid, and so skipped:

https://rt.cpan.org/Ticket/Display.html?id=67299#txn-1725341

However, if you don't need the ‘ignore unknown tags’ feature and are happy for the tree to contain all the elements that you pass to it, then you can get the <source> element by de-activating the ignore_unknown option:

% perl -MHTML::TreeBuilder -e '$z = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>}; HTML::TreeBuilder->new(ignore_unknown => 0)->parse($z)->eof->elementify()->dump(\*STDERR)'

Yields the desired:

<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
  <source src="/media/horseshoe-curve-small.mp4" type="video/mp4"> @0.2

(Passing options to ->new like that isn't documented, but has worked since at least 1996:

https://metacpan.org/source/GAAS/HTML-Tree-0.50/lib/HTML/TreeBuilder.pm#L134

The docs were patched to acknowledge that behaviour in 2012, for upcoming version 5.9:

https://github.com/kentfredric/HTML-Tree/commit/2f2fabb8ce1dbcef416be06d5ed5734c9da4944b

5.9 hasn't been released yet, but now there's intent to document this useful and long-standing behaviour, it's probably safe to rely on it.)

Smylers
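
For anyone who would rather not depend on passing options to ->new, here is a minimal sketch of the same workaround as a standalone script. It sets the option through the ignore_unknown accessor method instead of the constructor. The variable names ($html, $tree, @sources) are illustrative only, and the script assumes an HTML::Tree 5.x install like the one in the report; it has not been checked against other versions.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Same input as in the original report.
my $html = q{<source src="/media/horseshoe-curve-small.mp4" type="video/mp4"/>};

my $tree = HTML::TreeBuilder->new;

# Keep elements that HTML::Tagset does not list as known HTML elements,
# instead of silently dropping them (dropping is the default).
$tree->ignore_unknown(0);

$tree->parse($html);
$tree->eof;

# Search the whole tree for <source> elements and report what was kept.
my @sources = $tree->look_down(_tag => 'source');
printf "found %d <source> element(s)\n", scalar @sources;
print "src: ", $_->attr('src'), "\n" for @sources;
```

With the ignore_unknown(0) line removed, the look_down call would be expected to find nothing, matching the behaviour in the original report.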