Subject: CDATA are no child nodes
Dear Ronan,
I have noticed that SVG does not treat PCDATA sections (e.g. inside a
text element) as child nodes. This leads to several problems.
1) According to the DOM (e.g. http://www.w3.org/TR/DOM-Level-3-
you do for example
use SVG;
my $svg = SVG->new(width => 200, height => 200);
my $text = $svg->text('x' => 20, 'y' => 100, id => 't01');
$text->tspan->cdata('foo');
$text->cdata('bar');
then $text->getChildNodes should give you the tspan and the cdata
nodes. XML parsers like libxml2 do this. If you feed the SVG document
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="200" height="200" xmlns="http://www.w3.org/2000/svg"
version="1.1">
<text x="20" y="100"><tspan>foo</tspan>bar</text>
</svg>
into the code
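use strict;
use warnings;
use XML::LibXML;

my $parser = XML::LibXML->new();
$parser->validation(0);   # do not validate against the DTD
$parser->no_network(1);   # forbid network access while parsing
$parser->recover(1);      # try to recover from parse errors
my $dom = $parser->parse_file('tspan_cdata.svg');

# Print the name of every child node of each text element
foreach my $text ($dom->getElementsByTagName('text')) {
    foreach my $child ($text->getChildNodes) {
        print $child->getName, "\n";
    }
}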
you get the output
tspan
#text
I am aware of the fact that SVG does not attempt to implement the full
DOM, but I think that the supported subset should follow the DOM
specification.
2) More importantly, the issue is not purely academic. The SVG DTD (see
http://www.w3.org/TR/SVG11/text.html#TextElement for the relevant
section) specifies the content of the text element as:
"( #PCDATA | %SVG.Description.class; | %SVG.Animation.class;
%SVG.TextContent.class; %SVG.Hyperlink.class;
%SVG.text.extra.content; )*"
This means that it is perfectly valid to mix multiple tspan (or tref
etc.) elements with multiple CDATA sections. However, the code
use SVG;
my $svg = SVG->new(width => 200, height => 200);
my $text = $svg->text('x' => 20, 'y' => 100, id => 't01');
$text->tspan->cdata('foo');
$text->cdata('bar');
$text->tspan->cdata('baz');
$text->cdata('qux');
print $svg->xmlify, "\n";
produces
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN"
"http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg height="200" width="200" xmlns="http://www.w3.org/2000/svg"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:xlink="http://www.w3.org/1999/xlink">
<text id="t01" x="20" y="100">
<tspan >foo</tspan>
<tspan >baz</tspan>qux</text>
<!--
Generated using the Perl SVG Module V2.50
by Ronan Oger
Info: http://www.roitsystems.com/
-->
</svg>
which contains only the second CDATA section ('qux'); the first one
('bar') is lost. Moreover, the CDATA section always shows up at the very
end of the text element, regardless of whether it was specified before
or after the tspan elements. The order also cannot be recovered by
inspecting $text->getChildNodes and $text->cdata. However, there are
situations where the order does matter. This issue could be resolved by
maintaining CDATA sections as child nodes.
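To make the suggestion concrete, here is a minimal sketch of what I mean.
It is not the SVG module's implementation: the MixedNode package and its
element, cdata, child_names and xmlify methods are made-up names for
illustration only, and attributes are left out. The point is simply that
each cdata call appends a text child instead of overwriting a single slot,
so the order is kept both in the child list and in the serialized output.

```perl
use strict;
use warnings;

# Hypothetical minimal node class for illustration; NOT part of the SVG module.
package MixedNode;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, children => [] }, $class;
}

# Append a child element and return it, so that calls can be chained.
sub element {
    my ($self, $name) = @_;
    my $child = MixedNode->new($name);
    push @{ $self->{children} }, $child;
    return $child;
}

# Append character data as a text child (an unblessed hash) instead of
# storing it in one per-element slot.
sub cdata {
    my ($self, $text) = @_;
    push @{ $self->{children} }, { text => $text };
    return $self;
}

# Names of the children in insertion order; text children show up as '#text'.
sub child_names {
    my ($self) = @_;
    return map { ref($_) eq 'HASH' ? '#text' : $_->{name} } @{ $self->{children} };
}

# Serialize the children in the order in which they were added.
sub xmlify {
    my ($self) = @_;
    my $body = join '', map {
        ref($_) eq 'HASH' ? _escape($_->{text}) : $_->xmlify
    } @{ $self->{children} };
    return "<$self->{name}>$body</$self->{name}>";
}

# Escape the characters that are special in XML character data.
sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

package main;

my $text = MixedNode->new('text');
$text->element('tspan')->cdata('foo');
$text->cdata('bar');
$text->element('tspan')->cdata('baz');
$text->cdata('qux');

print $text->xmlify, "\n";
# prints: <text><tspan>foo</tspan>bar<tspan>baz</tspan>qux</text>
print join(' ', $text->child_names), "\n";
# prints: tspan #text tspan #text
```

With such a structure, both 'bar' and 'qux' survive, and the interleaving
with the tspan elements can be read back from the child list.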
Best wishes,
Lutz