Skip Menu |

This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 41353
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: wellnhofer [...] aevum.de
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: (no value)



Subject: XML_LIBXML_KEEP_BLANKS option not documented
I just spent some hours trying to figure out why XML::LibXSLT wouldn't indent my XSLT result documents, whereas xsltproc would. It turned out to be caused by the undocumented option XML_LIBXML_KEEP_BLANKS of the XML::LibXML constructor that defaults to 1. Setting it to 0 makes the indenting work. It would be nice if this option could be documented. I also wonder if the default shouldn't be 0.
Dne po 01.pro.2008 10:42:17, NWELLNHOF napsal(a): Show quoted text
> I just spent some hours trying to figure out why XML::LibXSLT
wouldn't Show quoted text
> indent my XSLT result documents, whereas xsltproc would. It turned
out Would you care to provide a sample where xsltproc behaves differently than XML::LibXSLT? Did you have to turn it off for parsing the stylesheet or the XML input? Show quoted text
> to be caused by the undocumented option XML_LIBXML_KEEP_BLANKS of the > XML::LibXML constructor that defaults to 1. Setting it to 0 makes the > indenting work.
The option and the default value *are* documented in XML::LibXML::Parser. Show quoted text
> It would be nice if this option could be documented. > > I also wonder if the default shouldn't be 0.
I don't think so, libxml2's default is 1, XML standard says whitespace is significant. -- Petr
I see, the option is documented. Sorry about that. But I had a closer look into the indenting issue and the behavior is definitely buggy. Just see the attached test case. Test 1 is a simple transform using transform_file(). Indenting works as expected. Test 2 is the same transform. But before that I parse a completely unrelated XML file. Suddenly, the indenting doesn't work. Test 3 is also the same transform. Before that I parse an unrelated XML file, but this time with keep_blanks(0). Now, the indenting works again. There seems to be a strange side effect. I think this only affects XHTML output. Maybe I dig deeper into this over the weekend. I should be familiar enough with the libxml/libxslt code base to fix this.

Message body not shown because it is not plain text.

It's a bug in libxml. See my post to the libxml mailing list. Maybe one could work around this in XML::LibXML, but I don't think it's very important.
Dne so 24.led.2009 08:19:20, NWELLNHOF napsal(a): Show quoted text
> It's a bug in libxml. See my post to the libxml mailing list. > > Maybe one could work around this in XML::LibXML, but I don't think it's > very important.
The current SVN does not call xmlKeepBlanksDefault() at all, which fixes this as well. Thanks, -- Petr
Why do you change REPORT_ERROR(recover ? recover : 1); to REPORT_ERROR(1); ? recover can be also 2... -- Petr Dne st 30.zář.2009 10:00:30, PAJAS napsal(a): Show quoted text
> Dne so 24.led.2009 08:19:20, NWELLNHOF napsal(a):
> > It's a bug in libxml. See my post to the libxml mailing list. > > > > Maybe one could work around this in XML::LibXML, but I don't think it's > > very important.
> > The current SVN does not call xmlKeepBlanksDefault() at all, which fixes > this as well. > > Thanks, > > -- Petr
Subject: Re: [rt.cpan.org #41353] XML_LIBXML_KEEP_BLANKS option not documented
Date: Thu, 01 Oct 2009 15:00:11 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Nick Wellnhofer <wellnhofer [...] aevum.de>
Petr Pajas via RT wrote: Show quoted text
> <URL: https://rt.cpan.org/Ticket/Display.html?id=41353 > > > Why do you change > > REPORT_ERROR(recover ? recover : 1); > to > REPORT_ERROR(1); ? > > recover can be also 2...
At this point recover is always 0 because LibXML_get_recover hasn't been called yet. If the parser context can't be created it is an internal error and the code dies anyway. So it doesn't make much sense to use the recover value. Nick
Hallo Nick, thanks for the explanation (for the record: parts of the conversation went by private mail). Revision 807 contains most of the cleanup patch you sent via mail. I'm attaching your original patch so that it is recorded here as well, but it is not what I comitted. Most notably, I kept #if LIBXML_VERSION > 20600 /* dictionaries not support yet */ ctxt->dictNames = 0; #endif (don't know why you removed this; since we only support libxml2-2.6.x, I would understand removing the #if, though) and if ( recover || ( well_formed && ( !xmlDoValidityCheckingDefaultValue || ( valid || ( real_doc->intSubset == NULL && real_doc->extSubset == NULL ))))) ... where xmlDoValidityCheckingDefaultValue is now replaced by the value of ctxt->validate. Note that there is no need to NULL real_doc in the else clause (but in case somebody modifies the subsequent code and forgets about this I decided to NULL it anyway - committed in revision 808). I also dealt with the suppressed warnings in a different manner. For now I'm closing this ticket, but you are welcome to reopen it in order to provide feedback. Thanks for your help, -- Petr
Index: LibXML.xs =================================================================== --- LibXML.xs (revision 803) +++ LibXML.xs (working copy) @@ -225,6 +225,9 @@ const char * CLASS = "XML::LibXML::LibError"; SV* libErr; + /* TODO: warning should be emitted */ + if ( error->level == XML_ERR_WARNING ) return; + libErr = NEWSV(0,0); sv_setref_pv( libErr, CLASS, (void*)error ); LibXML_struct_error_callback( saved_error, libErr); @@ -861,8 +864,6 @@ /* calling xmlInitParser() here is definitly wrong! */ /* xmlInitParser(); */ - xmlGetWarningsDefaultValue = 0; - if ( self != NULL ) { /* first fetch the values from the hash */ real_obj = (HV *)SvRV(self); @@ -882,11 +883,9 @@ item = hv_fetch( real_obj, "XML_LIBXML_LINENUMBERS", 22, 0 ); if ( item != NULL && SvTRUE(*item) ) { if (ctxt) ctxt->linenumbers = 1; - else xmlLineNumbersDefault( 1 ); } else { if (ctxt) ctxt->linenumbers = 0; - else xmlLineNumbersDefault( 0 ); } item = hv_fetch(real_obj, "ext_ent_handler", 15, 0); @@ -908,14 +907,6 @@ void LibXML_cleanup_parser() { - xmlSubstituteEntitiesDefaultValue = 1; - xmlKeepBlanksDefaultValue = 1; - xmlGetWarningsDefaultValue = 0; - xmlLoadExtDtdDefaultValue = 5; - xmlPedanticParserDefaultValue = 0; - xmlLineNumbersDefault( 0 ); - xmlDoValidityCheckingDefaultValue = 0; - if (LibXML_old_ext_ent_loader != NULL ) { xmlSetExternalEntityLoader( (xmlExternalEntityLoader)LibXML_old_ext_ent_loader ); } @@ -1404,17 +1395,6 @@ LIBXML_TEST_VERSION xmlInitParser(); PmmSAXInitialize(aTHX); - - xmlSetGenericErrorFunc( NULL , - (xmlGenericErrorFunc)LibXML_error_handler_ctx); - xmlDoValidityCheckingDefaultValue = 0; - xmlSubstituteEntitiesDefaultValue = 1; - xmlGetWarningsDefaultValue = 0; - xmlKeepBlanksDefaultValue = 1; - xmlLoadExtDtdDefaultValue = 5; - xmlPedanticParserDefaultValue = 0; - xmlLineNumbersDefault(0); - xmlSetGenericErrorFunc(NULL, NULL); #ifdef LIBXML_CATALOG_ENABLED /* xmlCatalogSetDebug(10); */ xmlInitializeCatalog(); /* use catalog data */ @@ -1555,8 +1535,6 @@ STRLEN len; char * ptr; HV * real_obj; - int well_formed; - int valid; xmlDocPtr real_doc; int recover = 0; PREINIT_SAVED_ERROR @@ -1579,7 +1557,7 @@ xmlParserCtxtPtr ctxt = xmlCreateMemoryParserCtxt((const char*)ptr, len); if (ctxt == NULL) { CLEANUP_ERROR_HANDLER; - REPORT_ERROR(recover ? recover : 1); + REPORT_ERROR(1); croak("Could not create memory parser context!\n"); } xs_warn( "context created\n"); @@ -1603,15 +1581,10 @@ xs_warn( "context initialized\n" ); - { - /* TODO: make this into a macro: */ - xmlParseDocument(ctxt); - xs_warn( "document parsed \n"); - } + xmlParseDocument(ctxt); + xs_warn( "document parsed \n"); ctxt->directory = NULL; - well_formed = ctxt->wellFormed; - valid = ctxt->valid; real_doc = ctxt->myDoc; ctxt->myDoc = NULL; xmlFreeParserCtxt(ctxt); @@ -1628,15 +1601,8 @@ } else { real_doc->URL = xmlStrdup((const xmlChar*)directory); } - if ( ! LibXML_will_die_ctx(saved_error, recover) && - (recover || ( well_formed && - ( !xmlDoValidityCheckingDefaultValue - || ( valid || ( real_doc->intSubset == NULL - && real_doc->extSubset == NULL )))))) { - RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); - } else { - xmlFreeDoc(real_doc); - } + + RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); } LibXML_cleanup_parser(); @@ -1702,8 +1668,6 @@ STRLEN len; char * directory = NULL; HV * real_obj; - int well_formed; - int valid; xmlDocPtr real_doc; int recover = 0; PREINIT_SAVED_ERROR @@ -1732,16 +1696,12 @@ ctxt = xmlCreatePushParserCtxt(NULL, NULL, buffer, read_length, NULL); if (ctxt == NULL) { CLEANUP_ERROR_HANDLER; - REPORT_ERROR(recover ? recover : 1); + REPORT_ERROR(1); croak("Could not create xml push parser context!\n"); } xs_warn( "context created\n"); real_obj = LibXML_init_parser(self, ctxt); recover = LibXML_get_recover(real_obj); -#if LIBXML_VERSION > 20600 - /* dictionaries not support yet */ - ctxt->dictNames = 0; -#endif if ( directory != NULL ) { ctxt->directory = directory; } @@ -1760,8 +1720,6 @@ } ctxt->directory = NULL; - well_formed = ctxt->wellFormed; - valid = ctxt->valid; real_doc = ctxt->myDoc; ctxt->myDoc = NULL; xmlFreeParserCtxt(ctxt); @@ -1776,14 +1734,7 @@ real_doc->URL = xmlStrdup((const xmlChar*)directory); } - if ( recover || ( well_formed && - ( !xmlDoValidityCheckingDefaultValue - || ( valid || ( real_doc->intSubset == NULL - && real_doc->extSubset == NULL ))))) { - RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); - } else { - xmlFreeDoc(real_doc); - } + RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); } LibXML_cleanup_parser(); @@ -1872,8 +1823,6 @@ STRLEN len; char * filename; HV * real_obj; - int well_formed; - int valid; xmlDocPtr real_doc; int recover = 0; PREINIT_SAVED_ERROR @@ -1891,7 +1840,7 @@ xmlParserCtxtPtr ctxt = xmlCreateFileParserCtxt(filename); if (ctxt == NULL) { CLEANUP_ERROR_HANDLER; - REPORT_ERROR(recover ? recover : 1); + REPORT_ERROR(1); croak("Could not create file parser context for file \"%s\": %s\n", filename, strerror(errno)); } @@ -1902,28 +1851,17 @@ ctxt->_private = (void*)self; xs_warn( "context initialized\n" ); - { - xmlParseDocument(ctxt); - xs_warn( "document parsed \n"); - } - well_formed = ctxt->wellFormed; - valid = ctxt->valid; + xmlParseDocument(ctxt); + xs_warn( "document parsed \n"); + real_doc = ctxt->myDoc; ctxt->myDoc = NULL; xmlFreeParserCtxt(ctxt); } if ( real_doc != NULL ) { - - if ( recover || ( well_formed && - ( !xmlDoValidityCheckingDefaultValue - || ( valid || ( real_doc->intSubset == NULL - && real_doc->extSubset == NULL ))))) { - RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); - } else { - xmlFreeDoc(real_doc); - } + RETVAL = LibXML_NodeToSv( real_obj, INT2PTR(xmlNodePtr,real_doc) ); } LibXML_cleanup_parser();
Subject: Re: [rt.cpan.org #41353] XML_LIBXML_KEEP_BLANKS option not documented
Date: Mon, 05 Oct 2009 21:53:52 +0200
To: bug-XML-LibXML [...] rt.cpan.org
From: Nick Wellnhofer <wellnhofer [...] aevum.de>
Petr Pajas via RT wrote: Show quoted text
> Revision 807 contains most of the cleanup patch you sent via mail. I'm > attaching your original patch so that it is recorded here as well, but > it is not what I comitted. Most notably, I kept > > #if LIBXML_VERSION > 20600 > /* dictionaries not support yet */ > ctxt->dictNames = 0; > #endif
The parser options always include XML_PARSE_NODICT now, so this isn't necessary. Show quoted text
> (don't know why you removed this; since we only support libxml2-2.6.x, I > would understand removing the #if, though) > > and > > if ( recover || ( well_formed && > ( !xmlDoValidityCheckingDefaultValue > || ( valid || ( real_doc->intSubset == NULL > && real_doc->extSubset == NULL ))))) > ... > > where xmlDoValidityCheckingDefaultValue is now replaced by the value of > ctxt->validate.
I think the whole if/else is unnecessary because libxml does all that checking. The condition should always be true if we get a non-null real_doc. If recover is false we should always get a well formed document. If validity checking is enabled the document should always be valid. Otherwise real_doc returned from libxml is NULL. But I'm not 100% sure about this. When I remove the check the test cases still work. But I don't know if there are good test cases for not well formed and invalid documents. Also note that there is an additional check for LibXML_will_die_ctx in _parse_string which is missing from _parse_file and _parse_fh. At least, these three functions should be consistent. Show quoted text
> Note that there is no need to NULL real_doc in the else > clause
Yes, I was wrong about that. There's no need to null real_doc. Show quoted text
> (but in case somebody modifies the subsequent code and forgets > about this I decided to NULL it anyway - committed in revision 808). > > I also dealt with the suppressed warnings in a different manner.
Looks good. Personally, I would set $XML::LibXML::Error::WARNINGS to 1 by default although this wouldn't match the old behavior. Show quoted text
> For now I'm closing this ticket, but you are welcome to reopen it in > order to provide feedback.
The original bug is definitely resolved. There's also a related fix in libxml 2.7.4. Nick -- aevum gmbh rumfordstr. 4 80469 münchen germany tel: +49 89 3838 0653 http://aevum.de/
Thanks for the explanations, Nick. You are probably right in all points but since this may require adding more tests and checking againts all the supported libxml2 versions I decided to keep the code there for the 1.70 release, but will probably remove it in some future version. As you suggested, I made the error handling code consistent among all _parse_* functions. -- Petr Dne po 05.říj.2009 15:56:29, wellnhofer@aevum.de napsal(a): Show quoted text
> Petr Pajas via RT wrote:
> > Revision 807 contains most of the cleanup patch you sent via mail. I'm > > attaching your original patch so that it is recorded here as well, but > > it is not what I comitted. Most notably, I kept > > > > #if LIBXML_VERSION > 20600 > > /* dictionaries not support yet */ > > ctxt->dictNames = 0; > > #endif
> > The parser options always include XML_PARSE_NODICT now, so this isn't > necessary. >
> > (don't know why you removed this; since we only support libxml2-2.6.x, I > > would understand removing the #if, though) > > > > and > > > > if ( recover || ( well_formed && > > ( !xmlDoValidityCheckingDefaultValue > > || ( valid || ( real_doc->intSubset == NULL > > && real_doc->extSubset == NULL ))))) > > ... > > > > where xmlDoValidityCheckingDefaultValue is now replaced by the value of > > ctxt->validate.
> > I think the whole if/else is unnecessary because libxml does all that > checking. The condition should always be true if we get a non-null > real_doc. If recover is false we should always get a well formed > document. If validity checking is enabled the document should always be > valid. Otherwise real_doc returned from libxml is NULL. But I'm not 100% > sure about this. > > When I remove the check the test cases still work. But I don't know if > there are good test cases for not well formed and invalid documents. > > Also note that there is an additional check for LibXML_will_die_ctx in > _parse_string which is missing from _parse_file and _parse_fh. At least, > these three functions should be consistent. >
> > Note that there is no need to NULL real_doc in the else > > clause
> > Yes, I was wrong about that. There's no need to null real_doc. >
> > (but in case somebody modifies the subsequent code and forgets > > about this I decided to NULL it anyway - committed in revision 808). > > > > I also dealt with the suppressed warnings in a different manner.
> > Looks good. Personally, I would set $XML::LibXML::Error::WARNINGS to 1 > by default although this wouldn't match the old behavior. >
> > For now I'm closing this ticket, but you are welcome to reopen it in > > order to provide feedback.
> > The original bug is definitely resolved. There's also a related fix in > libxml 2.7.4. > > Nick > >