Subject: Serious problem with concurrency
It looks like multiple twigs with different options do not correctly coexist. When you run
the test code below, as soon as a new Twig is created with different options, &amp; turns
into &amp;amp; in the content parsed by the *other* twig.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# First twig: keep_encoding is set, so entities should stay as parsed.
my $gi_twig = XML::Twig->new(
    keep_encoding => 1,
    keep_spaces   => 1,
);
$gi_twig->parse("<?xml version='1.0' encoding='UTF-8'?>\n<!DOCTYPE data PUBLIC '-
Schema' '/path'>\n<data><p>&amp;</p></data>\n");
my $oldval = $gi_twig->root->sprint();

# Second twig with different options. It never parses anything.
my $decodetwig = XML::Twig->new(keep_spaces => 1);
my $newval = $gi_twig->root->sprint();

# The first twig was not touched, so both serializations should match.
if ($oldval ne $newval) {
    print STDERR "OUCH! The node changed on its own\n";
    print STDERR "WAS: $oldval\n";
    print STDERR "NOW: $newval\n";
}
Humorously, the whole reason my code was creating a second twig in the first place was
that the entity behavior of text()/set_text() is nonparallel. When keep_encoding is set, the
text() function returns the text with entities still encoded, but if you call set_text(), it
expects decoded data and proceeds to encode it, so calling
$node->set_text($node->text());
causes... yup, you guessed it: &amp;amp;. AAAAARGH!
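To make the double encoding concrete without depending on XML::Twig internals, here is a minimal pure-Perl sketch. The helpers xml_encode() and xml_decode() are hypothetical names written for this illustration only (they are not XML::Twig functions), and they handle just &, < and >. The assumption is the one described above: text() hands back already-encoded text and set_text() encodes whatever it is given.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of XML::Twig: escape and unescape
# only the characters &, < and >.
sub xml_encode {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first, or it would re-escape the others
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

sub xml_decode {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&amp;/&/g;    # must run last, so '&amp;lt;' becomes '&lt;', not '<'
    return $s;
}

# Assumed to be what text() returns when keep_encoding is set.
my $stored = '&amp;';

# Assumed set_text() behavior: encode the argument again.
my $round_trip = xml_encode($stored);

# Decoding before re-encoding keeps the round trip stable.
my $safe = xml_encode(xml_decode($stored));

print "naive: $round_trip\n";
print "safe:  $safe\n";
```

The naive round trip grows by one level of escaping each time it runs, which is why a decode step (or a matching pair of accessors) is needed.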
Ideally, text() and set_text() should both work with raw, unencoded text regardless of
keep_encoding, and there should be an encoded_text() and set_encoded_text() that both
work with encoded text regardless of keep_encoding. This would greatly simplify use of
the API.
Unfortunately, changing the behavior of text() now would probably break somebody's code.
So as an alternative, perhaps you could have:
text():
    behaves as it does now, marked deprecated in the docs; use get_text() instead.
get_text():
    returns the decoded text.
get_encoded_text():
    returns the encoded text.
set_text():
    sets the decoded text.
set_encoded_text():
    sets the encoded text.
prefix():
    adds a prefix using decoded text, as it does now.
encoded_prefix():
    adds a prefix using encoded text.
suffix():
    adds a suffix using decoded text, as it does now.
encoded_suffix():
    adds a suffix using encoded text.
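As a rough illustration of how such paired accessors could behave, here is a self-contained sketch. Sketch::Node is a hypothetical stand-in class, not XML::Twig::Elt, and its _encode()/_decode() helpers cover only &, < and >. The assumption is that the node stores decoded text internally and converts at the boundary of the *_encoded_* methods, so neither pair depends on keep_encoding.

```perl
use strict;
use warnings;

package Sketch::Node;    # hypothetical class for illustration only

sub _encode {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    return $s;
}

sub _decode {
    my ($s) = @_;
    $s =~ s/&lt;/</g;
    $s =~ s/&gt;/>/g;
    $s =~ s/&amp;/&/g;    # last, so '&amp;lt;' decodes to '&lt;'
    return $s;
}

sub new { my ($class) = @_; return bless { text => '' }, $class }

# Decoded accessors: no conversion at all.
sub get_text { return $_[0]{text} }
sub set_text { my ($self, $t) = @_; $self->{text} = $t; return $self }
sub prefix   { my ($self, $t) = @_; $self->{text} = $t . $self->{text}; return $self }
sub suffix   { my ($self, $t) = @_; $self->{text} .= $t; return $self }

# Encoded accessors: convert once, at the boundary.
sub get_encoded_text { return _encode($_[0]{text}) }
sub set_encoded_text { my ($self, $t) = @_; return $self->set_text(_decode($t)) }
sub encoded_prefix   { my ($self, $t) = @_; return $self->prefix(_decode($t)) }
sub encoded_suffix   { my ($self, $t) = @_; return $self->suffix(_decode($t)) }

package main;

my $node = Sketch::Node->new;
$node->set_encoded_text('&amp;');

# Both round trips leave the node unchanged.
$node->set_text($node->get_text);
$node->set_encoded_text($node->get_encoded_text);

$node->encoded_suffix(' &lt;b&gt;');

print $node->get_text, "\n";
print $node->get_encoded_text, "\n";
```

Because each getter has a setter that expects the same representation, $node->set_text($node->get_text) and its encoded counterpart are both no-ops, which is the property the current text()/set_text() pair lacks.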
For now, I'm going to hack this in my decodedText function (which is, in itself, a
workaround) by creating a bogus Twig at the end with keep_encoding set, but this is a
really NASTY bug, and even if you can't fix the bug quickly, at a minimum, the
documentation for text() and set_text() should be updated to mention that set_text()
expects decoded text and that text() currently returns decoded text if keep_encoding is not
set, else encoded text.... Ugh.
Environment: Mac OS X 10.6, XML::Twig version 2.35, XML::Parser version 2.36, libxml
version 20703.