
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 90526
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: qj1020 [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: A potential bug on parsing html doc
Date: Tue, 19 Nov 2013 07:27:33 -0800 (PST)
To: "bug-XML-LibXML [...] rt.cpan.org" <bug-XML-LibXML [...] rt.cpan.org>
From: Jin Qian <qj1020 [...] yahoo.com>
Dear maintainer,

When I ran the following code snippet, it gave me lots of errors:

HTML parser error : Unexpected end tag : span
pan id="turn-off-highlight-control"><span class="highlight">&nbsp;X&nbsp;</span>
                                                                                ^
HTML parser error : Unexpected end tag : a
avascript:void(0)" onclick="Nabble.turnOffHighlight()">Turn off highlighting</a>
................lots of it...........

My guess is that it was trying to parse a piece of code in a script section:

if (Nabble.hasHighlightedTerms && !hasTurnOff) {
    var turnOffLink = '<span id="turn-off-highlight-control"><span class="highlight">&nbsp;X&nbsp;</span> ';
    turnOffLink += '<a href="javascript:void(0)" onclick="Nabble.turnOffHighlight()">Turn off highlighting</a></span>';
    $('#topics-controls-right').prepend(turnOffLink);
    hasTurnOff = true;
}

Can someone confirm and point out a solution?

Thanks in advance,
Jin

Perl: 5.12.4
XML::LibXML: XML::LibXML::LIBXML_VERSION=20623

====================code snippet==========================
use XML::LibXML;
my $doc = XML::LibXML->load_html(location => "http://jmeter.512774.n5.nabble.com/When-to-use-the-option-quot-Retrieve-All-Embedded-Resources-from-HTML-Files-quot-td529219.html");
Subject: Re: [rt.cpan.org #90526] A potential bug on parsing html doc
Date: Tue, 19 Nov 2013 17:46:22 +0100
To: bug-XML-LibXML [...] rt.cpan.org
From: Slaven Rezic <slaven [...] rezic.de>
Hi Jin,

libxml2 is somewhat loud on html content with errors. Please look at the recover, suppress_errors, and suppress_warnings configuration options. If you set everything, then parsing will be successful without any noise on the console.

(Please resolve the ticket if this works for you)

Regards,
    Slaven

--
Slaven Rezic - slaven <at> rezic <dot> de
Berlin Perl Mongers - http://berlin.pm.org
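For illustration, the three options mentioned above could be combined roughly as follows. This is only a minimal sketch: the inline HTML string is an assumed stand-in for the Nabble page (so the example does not need network access), and the file name app.js is hypothetical.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed sample input: a script block that embeds markup inside
# JavaScript string literals, similar to the page in the report.
my $html = <<'HTML';
<html><head>
<script>
var turnOffLink = '<span id="x"><span class="highlight">X</span> ';
turnOffLink += '<a href="javascript:void(0)">Turn off highlighting</a></span>';
</script>
<script src="app.js"></script>
</head><body><p>Hello</p></body></html>
HTML

# recover lets the parser continue past markup errors;
# suppress_errors / suppress_warnings keep the console quiet.
my $doc = XML::LibXML->load_html(
    string            => $html,
    recover           => 1,
    suppress_errors   => 1,
    suppress_warnings => 1,
);

# Collect the src attribute of every script element that has one.
my @srcs = map { $_->getValue } $doc->findnodes('//script/@src');
print "$_\n" for @srcs;
```

To parse a remote page instead, the string argument can be swapped for location => $url, as in the original snippet.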
Subject: Re: [rt.cpan.org #90526] A potential bug on parsing html doc
Date: Tue, 19 Nov 2013 09:01:59 -0800 (PST)
To: "bug-XML-LibXML [...] rt.cpan.org" <bug-XML-LibXML [...] rt.cpan.org>
From: Jin Qian <qj1020 [...] yahoo.com>
Thanks Slaven,

With the recover option, it works fine now. I appreciate your quick response!

You can set the ticket to closed. Unfortunately I can't log in, even after registering an account with www.bitcard.org.

Thanks,
Jin

===============new code snippet (working)================
use XML::LibXML;

my $doc;
my $parser = XML::LibXML->new(recover => 2);
$doc = $parser->load_html(location => "<URL..>");
for my $e ( $doc->findnodes('//script/@src')->get_nodelist ) {
    print $e->string_value(), "\n";
}