Subject: Patch for text documents
Date: Wed, 10 Jul 2019 23:05:53 +0100
To: bug-XML-DOM-Lite [...] rt.cpan.org
From: Peter Heslin <pheslin [...] gmail.com>

I was really delighted to find XML::DOM::Lite, which is a fantastic
module. I needed a DOM manipulation library without binary
dependencies, and to my surprise it's not much slower than
XML::LibXML for my use case.

I bumped into some problems, which are mainly because DOM::Lite seems
to have been developed with data-centric XML in mind, and I am dealing
with text documents. I've attached a patch, which fixes some of these
issues and adds some small features which are useful for XML texts.
Here's what the patch does:
* It fixes a bug in the parser's handling of processing instruction
nodes, and it fixes the serializer so that these nodes are passed
through to the output.
* It fixes a misfeature in the parser where whitespace was deleted
even when whitespace=>'strip' was not in force (this resolves issue
60400).
* It fixes a bug where setting the nodeName of an element node did not
change its tagName and vice versa.
* It adds two Node methods that are part of the DOM spec:
removeAttribute and textContent (this resolves issue 50136).
* It adds an option to the serializer (indent=>'none'), which inhibits
it from adding newlines and indentation before tags. The default
behaviour gives nice output for data-centric XML, but it is bad news
for documents with significant whitespace between tags. Provided
that the parser options to strip or normalize whitespace are not
used, this new option produces output that leaves whitespace
intact.
* It adds a new package, XML::DOM::Lite::Extras, which adds two
convenience Node methods, unbindNode and nextNonBlankSibling, which
are not part of the DOM spec. Both are inspired by XML::LibXML.
* It clarifies the documentation to say that getElementsByTagName
produces a new list of nodes, which is not live, but that childNodes
is a live list.
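To illustrate why indentation added by the serializer is a problem for text documents, here is a minimal sketch. It uses Python's standard xml.dom.minidom rather than XML::DOM::Lite, so none of the calls below are XML::DOM::Lite API. The helper `text_content` is hypothetical: it mirrors the DOM textContent getter for an element. The sketch parses a small mixed-content paragraph and then serializes it two ways, compactly and pretty-printed. It compares the text after re-parsing each version.

```python
from xml.dom import minidom


def text_content(element):
    """Hypothetical helper mirroring the DOM textContent getter for an element:
    the data of all descendant text (and CDATA) nodes, concatenated in document
    order. Comments and processing instructions contribute nothing."""
    parts = []
    for child in element.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(text_content(child))
    return "".join(parts)


# A tiny mixed-content "text document", including a processing instruction.
SOURCE = "<p>Arma <i>virumque</i> cano<?page 1?></p>"
root = minidom.parseString(SOURCE).documentElement
original = text_content(root)

# Compact serialization: no newlines or indentation are added between tags.
compact_xml = root.toxml()
compact_text = text_content(minidom.parseString(compact_xml).documentElement)

# Pretty-printing inserts newlines and indentation around child nodes,
# which become part of the text once the output is parsed again.
pretty_xml = root.toprettyxml(indent="  ")
pretty_text = text_content(minidom.parseString(pretty_xml).documentElement)

print(repr(original))
print(compact_text == original)
print(pretty_text == original)
```

Only whitespace differs between `pretty_text` and `original`, since both split into the same words. In a document where whitespace between tags is significant, however, that difference alters the text itself. The compact output also passes the processing instruction through unchanged.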