This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 61576
Status: open
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: dgatwood [...] mac.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 3.35
Fixed in: (no value)



Subject: Serious problem with concurrency
It looks like multiple twigs with different options do not coexist correctly. When you run the test code below, as soon as a new twig is created with different options, &amp; turns into &amp;amp; in the content parsed by the *other* twig.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $gi_twig = XML::Twig->new(keep_encoding => 1, keep_spaces => 1);
    $gi_twig->parse("<?xml version='1.0' encoding='UTF-8'?>\n<!DOCTYPE data PUBLIC '- Schema' '/path'>\n<data><p>&amp;</p></data>\n");
    my $oldval = $gi_twig->root->sprint();

    # Creating a second twig with different options changes the first twig's output
    my $decodetwig = XML::Twig->new(keep_spaces => 1);
    my $newval = $gi_twig->root->sprint();

    if ($oldval ne $newval) {
        print STDERR "OUCH! The node changed on its own\n";
        print STDERR "WAS: $oldval\n";
        print STDERR "NOW: $newval\n";
    }

Humorously, the whole reason my code was creating a second twig in the first place is that text() and set_text() do not handle entities symmetrically. When keep_encoding is set, text() returns the text with entities still encoded, but set_text() expects decoded data and proceeds to encode it, so calling

    $node->set_text($node->text());

causes... yup, you guessed it: &amp;amp;. AAAAARGH!

Ideally, text() and set_text() should both work with raw, unencoded text regardless of keep_encoding, and there should be an encoded_text() and a set_encoded_text() that both work with encoded text regardless of keep_encoding. This would greatly simplify use of the API.

Unfortunately, changing the behavior of text() now would probably break somebody's code. So as an alternative, perhaps you could have:

- text(): behaves as it does now; marked deprecated in the docs in favor of get_text()
- get_text(): returns the decoded text
- get_encoded_text(): returns the encoded text
- set_text(): sets the decoded text
- set_encoded_text(): sets the encoded text
- prefix(): adds a prefix using decoded text, as it does now
- encoded_prefix(): adds a prefix using encoded text
- suffix(): adds a suffix using decoded text, as it does now
- encoded_suffix(): adds a suffix using encoded text

For now, I'm going to hack around this in my decodedText function (which is itself a workaround) by creating a bogus twig at the end with keep_encoding set. But this is a really NASTY bug, and even if you can't fix it quickly, at a minimum the documentation for text() and set_text() should be updated to say that set_text() expects decoded text, and that text() currently returns decoded text if keep_encoding is not set, and encoded text otherwise... Ugh.

Environment: Mac OS X 10.6, XML::Twig version 3.35, XML::Parser version 2.36, libxml version 20703.
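To make the proposed get_/set_ pairing concrete, here is a minimal pure-Perl sketch. It is a hypothetical model, not XML::Twig code: the class name ProposedText, storing the text decoded internally, and an entity table that only covers &amp;, &lt; and &gt; are all assumptions for illustration. What it shows is the property the proposal is after: each getter/setter pair agrees on the encoding, so a round trip through either pair leaves the content unchanged, with no global option involved.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the proposed API; this is NOT XML::Twig code.
package ProposedText;

# Assumption: a minimal entity table (&, <, >) is enough for illustration.
my %ENC = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');
my %DEC = reverse %ENC;

sub new {
    my ($class) = @_;
    # The text is always stored decoded; no option changes that.
    return bless { text => '' }, $class;
}

sub encode { my ($s) = @_; $s =~ s/([&<>])/$ENC{$1}/g; return $s }

# Single left-to-right pass, so "&amp;lt;" decodes to "&lt;", not "<".
sub decode { my ($s) = @_; $s =~ s/(&amp;|&lt;|&gt;)/$DEC{$1}/g; return $s }

# Decoded pair: both sides work with raw text.
sub get_text { my ($self) = @_; return $self->{text} }
sub set_text { my ($self, $text) = @_; $self->{text} = $text; return $self }

# Encoded pair: both sides work with entity-encoded text.
sub get_encoded_text { my ($self) = @_; return encode($self->{text}) }
sub set_encoded_text { my ($self, $text) = @_; $self->{text} = decode($text); return $self }

package main;

my $node = ProposedText->new;
$node->set_encoded_text('a &amp; b');

# Round trips through either pair leave the content unchanged.
$node->set_text($node->get_text);
$node->set_encoded_text($node->get_encoded_text);

print $node->get_text, "\n";            # a & b
print $node->get_encoded_text, "\n";    # a &amp; b
```

The prefix()/encoded_prefix() and suffix()/encoded_suffix() variants could follow the same pattern, with the encoded_ versions decoding their argument before storing it.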
From: dgatwood [...] mac.com
Perl v5.10.0, BTW.
Subject: Workaround doesn't work, either.
From: dgatwood [...] mac.com
Okay, now I really don't get it. Now that I've added that workaround, set_text() expects encoded text... *slams head repeatedly into a wall*
Subject: Re: [rt.cpan.org #61576] Serious problem with concurrency
Date: Thu, 23 Sep 2010 13:07:35 +0200
To: bug-XML-Twig [...] rt.cpan.org
From: mirod <xmltwig [...] gmail.com>
The problem comes from the fact that a number of options are not managed at the document level but are global: keep_encoding, pretty_print and the like are simply set by the options passed to the last XML::Twig->new call, or by individual method calls. This is mentioned in the docs; look for "Globals".

The reason for this can certainly be called "premature optimization": by not managing those settings at the twig level, the code can get away with not storing, or looking up, the twig an element belongs to before printing (or sprint-ing) it.

It would be possible to fix this, but I am not sure it's worth it. A workaround is to manage it yourself, using the *_global_state methods. In the case of your test code, simply adding a call to set_keep_encoding makes the code work, as shown below. Does it help?

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Twig;

    my $gi_twig = XML::Twig->new(keep_encoding => 1, keep_spaces => 1);
    $gi_twig->parse("<?xml version='1.0' encoding='UTF-8'?>\n<!DOCTYPE data PUBLIC '- Schema' '/path'>\n<data><p>&amp;</p></data>\n");
    my $oldval = $gi_twig->root->sprint();

    my $decodetwig = XML::Twig->new(keep_spaces => 1);
    $gi_twig->set_keep_encoding(1);    # could also be called on $decodetwig
    my $newval = $gi_twig->root->sprint();

    if ($oldval ne $newval) {
        print STDERR "OUCH! The node changed on its own\n";
        print STDERR "WAS: $oldval\n";
        print STDERR "NOW: $newval\n";
    }
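To see why creating a second twig changes the first twig's output, here is a deliberately simplified pure-Perl model of process-wide options. The package MiniTwig, its $keep_encoding variable and the save_state/restore_state helpers are hypothetical stand-ins that only mimic the pattern described above (options set by the last new() call, managed by hand with *_global_state-style methods); they are not XML::Twig's internals, and the model collapses twig and element into one object. With keep_encoding on, the parsed text is stored still encoded; sprint() then decides whether to escape & based on the flag at print time, which is exactly where the &amp;amp; comes from.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified model of process-wide options; NOT XML::Twig code.
package MiniTwig;

# One flag shared by every instance, like the "Globals" in the docs.
our $keep_encoding = 0;

sub new {
    my ($class, %opt) = @_;
    # Every constructor call overwrites the shared flag.
    $keep_encoding = $opt{keep_encoding} ? 1 : 0;
    return bless {}, $class;
}

sub parse {
    my ($self, $raw) = @_;
    my $text = $raw;
    # Without keep_encoding the entity is decoded at parse time.
    $text =~ s/&amp;/&/g unless $keep_encoding;
    $self->{text} = $text;
    return $self;
}

sub sprint {
    my ($self) = @_;
    my $out = $self->{text};
    # Escaping is decided by the flag at print time, not at parse time.
    $out =~ s/&/&amp;/g unless $keep_encoding;
    return "<p>$out</p>";
}

# Hypothetical snapshot/restore helpers for the shared flag.
sub save_state    { return { keep_encoding => $keep_encoding } }
sub restore_state { my (undef, $state) = @_; $keep_encoding = $state->{keep_encoding} }

package main;

# 1. Reproduce: a second constructor call changes the first instance's output.
my $gi_twig = MiniTwig->new(keep_encoding => 1);
$gi_twig->parse('&amp;');
my $oldval = $gi_twig->sprint;          # <p>&amp;</p>
my $decodetwig = MiniTwig->new;         # resets the shared flag to 0
my $newval = $gi_twig->sprint;          # <p>&amp;amp;</p>

# 2. Workaround: snapshot the shared state and restore it after new().
my $first = MiniTwig->new(keep_encoding => 1);
$first->parse('&amp;');
my $state = MiniTwig->save_state;       # keep_encoding => 1
my $second = MiniTwig->new;             # flag is now 0
MiniTwig->restore_state($state);        # flag is back to 1
my $restored = $first->sprint;          # <p>&amp;</p>

print "WAS: $oldval\nNOW: $newval\nRESTORED: $restored\n";
```

In this model, storing the flag on the object instead (for example in $self->{keep_encoding}, set in new) would stop instances from interfering. The catch, as noted above, is that the real library would then have to find the twig an element belongs to every time it prints that element.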