
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 87089
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: NGLENN [...] cpan.org
Cc: garfieldnate [...] gmail.com
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.0019
Fixed in: (no value)



CC: garfieldnate [...] gmail.com
Subject: HTML doctype differs for string/scalar input
I wanted to output HTML with an HTML5 doctype. I discovered I could do that by creating a document with that doctype already there. However, if I create an HTML document with a string pointer instead of a string, the doctype is changed to another really really long doctype:

    use strict;
    use warnings;
    use XML::LibXML;

    my $parser = XML::LibXML->new();
    my $from_scalar = $parser->load_html(string => '<!DOCTYPE html><html>');
    my $from_ref = $parser->load_html(string => \'<!DOCTYPE html><html>');
    print $from_scalar->toStringHTML;
    print $from_ref->toStringHTML;

Adding a quick check and dereference after line 1086 of LibXML.pm fixed the problem, but I don't know if that's desirable, since large strings might be copied. I'm sorry I don't have the expertise to delve into the XS code.
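Until the parser itself handles both forms, a caller-side workaround along the lines of the "quick check and dereference" mentioned above might look like the sketch below. The helper name plain_string is hypothetical and is not part of XML::LibXML; the sketch only normalizes the input and does not call the parser.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of XML::LibXML: accept either a string or a
# reference to a string and always hand back the plain string.
sub plain_string {
    my ($input) = @_;

    # ref() returns 'SCALAR' only for an unblessed scalar reference.
    return ref($input) eq 'SCALAR' ? ${$input} : $input;
}

my $html = '<!DOCTYPE html><html>';

# Both call styles now yield the same string to pass on to the parser.
print plain_string($html) eq plain_string(\$html) ? "same\n" : "different\n";
```

The result could then be passed as load_html(string => plain_string($input)). Note that returning the dereferenced value copies the string, which is the cost the report raises for large inputs.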
On Sat Jul 20 04:45:35 2013, NGLENN wrote:
> I wanted to output HTML with an HTML5 doctype. I discovered I could do
> that by creating a document with that doctype already there.
> However, if I create an HTML document with a string pointer instead
> of a string, the doctype is changed to another really really long
> doctype:
>
> use strict;
> use warnings;
> use XML::LibXML;
>
> my $parser = XML::LibXML->new();
> my $from_scalar = $parser->load_html(string => '<!DOCTYPE html><html>');
> my $from_ref = $parser->load_html(string => \'<!DOCTYPE html><html>');
> print $from_scalar->toStringHTML;
> print $from_ref->toStringHTML;
>
> Adding a quick check and dereference after line 1086 of LibXML.pm
> fixed the problem, but I don't know if that's desirable, since
> large strings might be copied. I'm sorry I don't have the expertise
> to delve into the XS code.

Hi, thanks for the report. I'll try to investigate. Regards, -- Shlomi Fish
Hi! I found the cause of this bug and suggest a patch to fix it. -- Yuriy
Subject: 87089.patch
diff -Nura XML-LibXML-2.0019-orig/LibXML.xs XML-LibXML-2.0019-fixed/LibXML.xs
--- XML-LibXML-2.0019-orig/LibXML.xs 2012-12-04 17:37:37.000000000 +0200
+++ XML-LibXML-2.0019-fixed/LibXML.xs 2013-08-14 17:58:39.381497210 +0300
@@ -2054,6 +2054,12 @@
         int recover = 0;
         PREINIT_SAVED_ERROR
     INIT:
+        /* If string is a reference to a string - dereference it.
+         * See: https://rt.cpan.org/Ticket/Display.html?id=64051 (broke it)
+         *      https://rt.cpan.org/Ticket/Display.html?id=77864 (fixed it) */
+        if (SvROK(string) && !SvOBJECT(SvRV(string))) {
+            string = SvRV(string);
+        }
         ptr = SvPV(string, len);
         if (len <= 0) {
             croak("Empty string\n");
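For readers less familiar with XS, the condition in the patch can be mirrored at the Perl level roughly as below. This is only an illustrative sketch: should_deref and My::Markup are hypothetical names, and the reading that blessed objects are skipped so that they can still stringify (for example via overloading) is an assumption, not something stated in the patch.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Perl-level mirror of the XS condition
#   SvROK(string) && !SvOBJECT(SvRV(string))
# i.e. "is a reference, and the referent is not blessed".
# should_deref() is a hypothetical name used only for this sketch.
sub should_deref {
    my ($string) = @_;
    return ( ref($string) && !blessed($string) ) ? 1 : 0;
}

# Hypothetical class with overloaded stringification, to show the blessed case.
package My::Markup {
    use overload '""' => sub { ${ $_[0] } }, fallback => 1;
    sub new { my ($class, $text) = @_; return bless \$text, $class }
}

my $html = '<!DOCTYPE html><html>';

print should_deref($html), "\n";                   # plain string: 0
print should_deref(\$html), "\n";                  # unblessed reference: 1
print should_deref(My::Markup->new($html)), "\n";  # blessed object: 0
```

Only the second case would be dereferenced under this check; the plain string and the object are passed through unchanged.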
Hi Yuriy,

On Wed Aug 14 13:12:11 2013, yoreek wrote:
> Hi!
>
> I found the cause of this bug and suggest a patch to fix it.

thanks for the patch, but we also need to add regression tests. Can you write them or do you want me to write them? Regards, -- Shlomi Fish
On Wed Aug 14 13:36:25 2013, SHLOMIF wrote:
> thanks for the patch, but we also need to add regression tests. Can
> you write them or do you want me to write them?

I have tried to write a test; please check that it is correct. -- Yuriy
Subject: 51_parse_html_string_as_scalar_ref_rt87089.t
use strict;
use warnings;

=head1 DESCRIPTION

Getting wrong result when parsing HTML string as a scalar reference.

See L<https://rt.cpan.org/Ticket/Display.html?id=87089> .

=cut

use Test::More tests => 2;
use XML::LibXML;

my $parser = XML::LibXML->new();

# Parse HTML string as scalar
{
    my $dom = $parser->load_html(string => '<!DOCTYPE html><html>');
    is ($dom->toStringHTML, "<!DOCTYPE html>\n<html></html>\n", "Parse HTML string as scalar");
}

# Parse HTML string as scalar reference
{
    my $dom = $parser->load_html(string => \'<!DOCTYPE html><html>');
    is ($dom->toStringHTML, "<!DOCTYPE html>\n<html></html>\n", "Parse HTML string as scalar reference");
}
Hi Yuriy,

On Wed Aug 14 16:05:21 2013, yoreek wrote:
> I have tried to write a test; please check that it is correct.

your test seems mostly fine, but it lacks Test::Count "# TEST" annotations and its filename is too long. See the "HACKING" file inside the distribution for more information. However, I'll apply it as it is with some tweaks. Thanks! Regards, -- Shlomi Fish
This should be fixed in XML-LibXML-2.0101 which was just uploaded to CPAN. Thanks to NGLENN for the report, and to Yuriy for the tests and fix. Regards, -- Shlomi Fish
Resolving - please file a new bug if there's a similar symptom (or reopen this bug). Regards, -- Shlomi Fish