Bug #130035 for XML-DOM-Lite: Patch for text documents

Subject:	Patch for text documents
Date:	Wed, 10 Jul 2019 23:05:53 +0100
To:	bug-XML-DOM-Lite [...] rt.cpan.org
From:	Peter Heslin <pheslin [...] gmail.com>

I was really delighted to find XML::DOM::Lite, which is a fantastic module. I needed a DOM manipulation library without binary dependencies, and to my surprise it is not much slower than LibXML in my use case. I ran into a few problems, mainly because DOM::Lite seems to have been developed with data-centric XML in mind, whereas I am dealing with text documents. I've attached a patch that fixes some of these issues and adds a few small features that are useful for XML texts. Here's what the patch does:

* It fixes a bug in the parser's handling of Processing Instruction nodes, and it fixes the serializer so that these nodes are passed through into the output.
* It fixes a misfeature in the parser whereby whitespace was deleted even when whitespace=>'strip' was not in force (this resolves issue 60400).
* It fixes a bug where setting the nodeName of an element node did not change its tagName, and vice versa.
* It adds two Node methods that are part of the DOM spec: removeAttribute and textContent (this resolves issue 50136).
* It adds an option to the serializer (indent=>'none') that stops it from adding newlines and indentation before tags. The default behaviour gives nice output for data-centric XML, but it is bad news for documents with significant whitespace between tags. Provided that the parser options to strip or normalize whitespace are not used, this new option produces output with the whitespace preserved intact.
* It adds a new package, XML::DOM::Lite::Extras, which provides two convenience Node methods that are not part of the DOM spec: unbindNode and nextNonBlankSibling. Both are inspired by XML::LibXML.
* It clarifies the documentation to say that getElementsByTagName returns a new list of nodes, which is not live, whereas childNodes is a live list.
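The textContent and nextNonBlankSibling behaviour described above can be illustrated with a minimal Python sketch over a hypothetical toy node class. Node, text_content and next_non_blank_sibling are illustrative names only; this is not XML::DOM::Lite's API and not the code in the patch. It assumes the DOM meaning of textContent (all descendant text concatenated in document order) and assumes nextNonBlankSibling works as in XML::LibXML, i.e. like nextSibling but skipping text nodes that contain only whitespace.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Node type constants, using the DOM's numeric values.
ELEMENT_NODE = 1
TEXT_NODE = 3


@dataclass(eq=False)  # eq=False: nodes compare by identity, avoiding parent/child recursion
class Node:
    """Hypothetical toy DOM node, for illustration only."""
    node_type: int
    name: str = ""
    data: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def append(self, child: "Node") -> "Node":
        # Attach child as the last child of this node and return it.
        child.parent = self
        self.children.append(child)
        return child

    def text_content(self) -> str:
        # A text node yields its own data; an element yields the
        # concatenation of all descendant text, in document order.
        if self.node_type == TEXT_NODE:
            return self.data
        return "".join(child.text_content() for child in self.children)

    def next_non_blank_sibling(self) -> Optional["Node"]:
        # Like nextSibling, but skip text nodes made up only of whitespace.
        if self.parent is None:
            return None
        siblings = self.parent.children
        idx = next(i for i, s in enumerate(siblings) if s is self)
        for sib in siblings[idx + 1:]:
            if sib.node_type == TEXT_NODE and sib.data.strip() == "":
                continue
            return sib
        return None


# Mixed content: <p>Arma <em>virumque</em> cano</p>
p = Node(ELEMENT_NODE, "p")
p.append(Node(TEXT_NODE, data="Arma "))
em = p.append(Node(ELEMENT_NODE, "em"))
em.append(Node(TEXT_NODE, data="virumque"))
p.append(Node(TEXT_NODE, data=" cano"))
print(p.text_content())

# Two lines separated by an unstripped whitespace text node:
# <body><l/>\n  <l/></body>
body = Node(ELEMENT_NODE, "body")
first = body.append(Node(ELEMENT_NODE, "l"))
body.append(Node(TEXT_NODE, data="\n  "))
second = body.append(Node(ELEMENT_NODE, "l"))
print(first.next_non_blank_sibling() is second)
```

Keeping the whitespace-only text node in the tree, rather than deleting it at parse time, is what lets a serializer reproduce the original spacing; a helper of this kind lets code step past such nodes without removing them.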
