Subject: Implement support for space-preserved elements (xml:space="preserve")
I am processing DITA documents that contain various space-preserved elements, such as this preformatted block that contains a cube with a boldfaced asterisk inside it:
<topic><body><pre>+---+
| <b>*</b> |
+---+</pre></body></topic>
When I pretty-print a DITA document containing this element, XML::Twig's built-in heuristic detects the leading text and suppresses pretty-printing inside the element:
<topic>
<body>
<pre>+---+
| <b>*</b> |
+---+</pre>
</body>
</topic>
However, this heuristic fails when the content does not begin with text, for example when the first line begins with a tag:
<topic><body><pre><b>+---+</b>
<i>| <b>*</b> |</i>
<b>+---+</b></pre></body></topic>
In this case, the lines are pretty-printed as non-space-preserved XML:
<topic>
<body>
<pre>
<b>+---+</b>
<i>| <b>*</b> |</i>
<b>+---+</b>
</pre>
</body>
</topic>
XML::Twig should provide a per-element setting that does the following:
* Prints the entire element as space-preserved (no indenting or reformatting), whether its content begins with text or with another element.
* Exempts the element from a document-wide trim (see #125515: Provide a way to exclude tags from a twig-wide trim()).
This request seems to be analogous to the official XML mechanism provided by xml:space="preserve", so there is precedent for the behavior.
A testcase is included.
Subject: space-preserve.pl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $xml = <<EOF;
<topic><body><p>Good cube (spaces are preserved):</p><pre>+---+
| <b>*</b> |
+---+</pre>
<p>Bad cube (should be space-preserved):</p><pre><b>+---+</b>
<i>| <b>*</b> |</i>
<b>+---+</b></pre></body></topic>
EOF
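# Parse with default options, then print with indentation enabled.
my $twig = XML::Twig->new();
$twig->parse($xml);
# Disabled experiment, kept for reference:
#$_->set_pretty_print('none') for $twig->root->children('pre');
$twig->print(pretty_print => 'indented');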
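To tie the request to xml:space="preserve", here is a minimal sketch of a variant test case that marks each <pre> with that attribute during parsing. It uses only set_att and the keep_spaces_in constructor option. Whether the pretty-printer then leaves the element untouched is exactly the behavior being requested, so no expected output is claimed, and the choice of 'pre' as the only space-preserved tag is an assumption made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'EOF';
<topic><body><pre><b>+---+</b>
<i>| <b>*</b> |</i>
<b>+---+</b></pre></body></topic>
EOF

my $twig = XML::Twig->new(
    # Assumption: <pre> is the only space-preserved tag in this document.
    keep_spaces_in => ['pre'],
    twig_handlers  => {
        # Mark each <pre> the way the XML spec signals preserved whitespace.
        pre => sub { $_->set_att('xml:space' => 'preserve') },
    },
);
$twig->parse($xml);

# The requested behavior: <pre> content is emitted unchanged here.
$twig->print(pretty_print => 'indented');
```

If XML::Twig honored the attribute (or an equivalent per-element setting), the three lines inside <pre> would come out exactly as they went in.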