This queue is for tickets about the HTML-TreeBuilder-XPath CPAN distribution.

Report information
The Basics
Id: 90164
Status: open
Priority: 0/
Queue: HTML-TreeBuilder-XPath

People
Owner: Nobody in particular
Requestors: ambrus [...] math.bme.hu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: as_XML_indented omits contents of script tag
Date: Fri, 8 Nov 2013 22:27:40 +0100
To: bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From: Zsbán Ambrus <ambrus [...] math.bme.hu>
Hi Mirod.

It appears that the as_XML_indented method of HTML::TreeBuilder::XPath can omit the text contents of a script tag. This happens when you parse an HTML document that uses a plain script tag with JavaScript contents as inline text (not commented out).

Below I'll show a Perl command that reproduces this problem for me. The version numbers of the modules I used are printed at the end of its output.

Ambrus

    $ perl -we 'use warnings; use 5.016; use HTML::TreeBuilder::XPath;
        my $tb = HTML::TreeBuilder::XPath->new;
        $tb->parse(qq(<script>\nfunction cold() {}\n</script><p>hot)); $tb->eof;
        say $tb->as_XML; say $tb->as_XML_compact; say $tb->as_XML_indented;
        for (qw(HTML::TreeBuilder::XPath HTML::TreeBuilder XML::XPathEngine HTML::Entities HTML::Tagset)) {
            say "$_ ",$_->VERSION; }'
    <html><head><script>
    function cold() {}
    </script></head><body><p>hot</p></body></html>
    <html><head><script>
    function cold() {}
    </script></head><body><p>hot</p></body></html>
    <html>
      <head>
        <script></script>
      </head>
      <body>
        <p>hot</p>
      </body>
    </html>
    HTML::TreeBuilder::XPath 0.14
    HTML::TreeBuilder 5.02
    XML::XPathEngine 0.13
    HTML::Entities 3.69
    HTML::Tagset 3.20
    $ perl -V
    Summary of my perl5 (revision 5 version 16 subversion 3) configuration:
      Platform:
        osname=linux, osvers=2.6.37, archname=x86_64-linux
        uname='linux king 2.6.37 #6 smp sun mar 13 20:15:05 cet 2011 x86_64 gnulinux '
        config_args='-der'
        hint=previous, useposix=true, d_sigaction=define
        useithreads=undef, usemultiplicity=undef
        useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
        use64bitint=define, use64bitall=define, uselongdouble=undef
        usemymalloc=n, bincompat5005=undef
      Compiler:
        cc='cc', ccflags ='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
        optimize='-O2',
        cppflags='-fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
        ccversion='', gccversion='4.7.1', gccosandvers=''
        intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
        d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
        ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
        alignbytes=8, prototype=define
      Linker and Libraries:
        ld='cc', ldflags =' -fstack-protector -L/usr/local/lib'
        libpth=/usr/local/lib /lib/../lib /usr/lib/../lib /lib /usr/lib /lib64 /usr/lib64 /usr/local/lib64
        libs=-lnsl -ldl -lm -lcrypt -lutil -lc
        perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
        libc=/lib/libc-2.11.3.so, so=so, useshrplib=false, libperl=libperl.a
        gnulibc_version='2.11.3'
      Dynamic Linking:
        dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
        cccdlflags='-fPIC', lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector'

    Characteristics of this binary (from libperl):
      Compile-time options: HAS_TIMES PERLIO_LAYERS PERL_DONT_CREATE_GVSV
                            PERL_MALLOC_WRAP PERL_PRESERVE_IVUV USE_64_BIT_ALL
                            USE_64_BIT_INT USE_LARGE_FILES USE_LOCALE
                            USE_LOCALE_COLLATE USE_LOCALE_CTYPE
                            USE_LOCALE_NUMERIC USE_PERLIO USE_PERL_ATOF
      Built under linux
      Compiled at May 12 2013 14:39:37
      @INC:
        /usr/local/perl5.16/lib/perl5/site_perl/5.16.3/x86_64-linux
        /usr/local/perl5.16/lib/perl5/site_perl/5.16.3
        /usr/local/perl5.16/lib/perl5/5.16.3/x86_64-linux
        /usr/local/perl5.16/lib/perl5/5.16.3
        /usr/local/perl5.16/lib/perl5/site_perl/5.16.1
        /usr/local/perl5.16/lib/perl5/site_perl/5.16.1/x86_64-linux
        /usr/local/perl5.16/lib/perl5/site_perl
        .
    $
Subject: [rt.cpan.org #90164]
Date: Tue, 10 Dec 2013 17:47:15 +0100
To: bug-HTML-TreeBuilder-XPath [...] rt.cpan.org
From: Christophe Staïesse <chastai [...] skynet.be>
Hi,

Here is a patch to fix this bug:

    --- a/lib/HTML/TreeBuilder/XPath.pm
    +++ b/lib/HTML/TreeBuilder/XPath.pm
    @@ -288,7 +288,7 @@ sub toString { return shift->as_XML( @_); }
         my $content='';
         if( $HTML::Tagset::isCDATA_Parent{$lc_name})
    -      { my $content= $node->{_content} || '';
    +      { $content= $node->{_content} || '';
             if( ref $content && (ref $content eq 'ARRAY' || $content->isa( 'ARRAY') ))
               { $content= _xml_escape_cdata( join( '', @$content), $opt); }
           }

In sub as_XML_indented, $content was redeclared with "my" inside the if block. The assignment therefore went to a new inner variable, and the outer $content, which the rest of the sub uses, stayed empty.

To detect this kind of problem, use:

    perlcritic -s Variables::ProhibitReusedNames lib/HTML/TreeBuilder/XPath.pm

Regression test:

    use Test::More tests => 1;
    use HTML::TreeBuilder::XPath;
    my $html= HTML::TreeBuilder::XPath->new_from_content( '<script>content</script>');
    like($html->as_XML_indented, qr{<script>\s*content\s*</script>}, 'test');

Christophe.
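For readers unfamiliar with this class of bug, the shadowing described above can be reproduced without the module at all. The following is a minimal sketch in core Perl only; render_buggy and render_fixed are hypothetical names made up for illustration and are not part of HTML::TreeBuilder::XPath.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the serializer: the inner "my" declares a
# second, block-scoped $content, so the outer one is never updated.
sub render_buggy {
    my ($text) = @_;
    my $content = '';
    if (defined $text) {
        my $content = $text;    # shadows the outer $content
    }
    return "<script>$content</script>";
}

# Same logic with the inner "my" removed, mirroring the patch.
sub render_fixed {
    my ($text) = @_;
    my $content = '';
    if (defined $text) {
        $content = $text;       # assigns to the outer $content
    }
    return "<script>$content</script>";
}

print render_buggy('function cold() {}'), "\n";   # prints <script></script>
print render_fixed('function cold() {}'), "\n";   # prints <script>function cold() {}</script>
```

Perl's own "masks earlier declaration" warning only covers redeclaration within the same scope, so "use warnings" stays silent here, which is why a perlcritic policy such as Variables::ProhibitReusedNames is useful for catching it.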