[ddascalescu@gmail.com - Thu Aug 4 01:56:34 2005]:
> XML::Twig seems to duplicate characters beyond the 1024'th in a CDATA
> section of a non-UTF8 XML, if it was parsed with keep_encoding => 1.
> Test case attached.
OK, it does indeed look very much like a bug in XML::Parser.
As far as I can tell, the data is returned in chunks of at most 1024
chars (at least for CDATA), and the Char handler is called once per chunk.
The bug is that the original_string method returns _all_ of the data the
first time it is called, then chunks of 1024 chars, so everything after
the 1024th char ends up in the output twice. I will file a bug for
XML::Parser, but I have little hope of it being fixed.
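To make the mechanism concrete, here is a small Python model of it (Python only for illustration; this is not XML::Parser or XML::Twig code, and `char_events`, `naive_reassemble` and `dedup_reassemble` are hypothetical names). It assumes exactly the behaviour described above: Char gets chunks of at most 1024 chars, and original_string returns the whole section on the first call and only the current chunk afterwards. Naively concatenating the original strings reproduces the duplication; keeping track of the current chunk's offset and appending only what is not already collected avoids it. That offset bookkeeping also assumes one char of original text per char of Char data, which only holds for single-byte encodings such as ISO-8859-1.

```python
from typing import Iterable, Iterator, Tuple

CHUNK = 1024  # max chars per Char handler call, as observed for CDATA


def char_events(data: str, buggy: bool = True,
                chunk: int = CHUNK) -> Iterator[Tuple[str, str]]:
    """Yield (char_data, original_string) for each Char handler call.

    Hypothetical model: with buggy=True the first original_string is the
    whole CDATA section; with buggy=False it is just the current chunk.
    """
    for start in range(0, len(data), chunk):
        piece = data[start:start + chunk]
        yield piece, (data if buggy and start == 0 else piece)


def naive_reassemble(events: Iterable[Tuple[str, str]]) -> str:
    """Concatenate every original_string (duplicates chars past chunk 1)."""
    return "".join(orig for _, orig in events)


def dedup_reassemble(events: Iterable[Tuple[str, str]]) -> str:
    """Append only the part of each original_string not collected yet.

    Assumes each original_string starts at the current chunk's offset and
    that one original char maps to one char of Char data (true for a
    single-byte encoding such as ISO-8859-1, not in general).
    """
    buf = ""
    offset = 0  # position of the current chunk within the CDATA section
    for char_data, orig in events:
        already = len(buf) - offset  # chars of this chunk already in buf
        if already < len(orig):
            buf += orig[already:]
        offset += len(char_data)
    return buf


if __name__ == "__main__":
    data = "".join(chr(ord("a") + i % 26) for i in range(2500))
    print(len(naive_reassemble(char_events(data))))  # 3976: 1476 chars twice
    print(dedup_reassemble(char_events(data)) == data)  # True
```

If original_string were ever fixed to return only the current chunk, each chunk would start exactly at the end of the buffer, so the same bookkeeping would still produce the right result.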
The good news: I have a preliminary fix in the development version
of XML::Twig (http://www.xmltwig.com/xmltwig/). It passes your test
case, but I have to test it some more: longer CDATA (from my tests with
XML::Parser it looks like it should not be a problem), spaces around the
1024 mark, and maybe other encodings.
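Inputs for those extra cases could be generated mechanically. A minimal Python sketch, assuming a hypothetical helper `make_cdata_doc` (not part of XML::Twig): it builds an ISO-8859-1 document holding a single CDATA section of a given length, with a non-ASCII char (é) so the document is not valid UTF-8, and optional spaces placed around the 1024 mark. The round-trip through Python's expat-based ElementTree only checks that the generated document is well-formed and decodes to the expected text; it says nothing about how XML::Twig handles it.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple


def make_cdata_doc(length: int, encoding: str = "ISO-8859-1",
                   spaces_at: Iterable[int] = ()) -> Tuple[bytes, str]:
    """Return (document_bytes, expected_text) for a one-CDATA document."""
    alphabet = "abcdefghij\u00e9"  # é exists in Latin-1 but is not ASCII
    chars = [alphabet[i % len(alphabet)] for i in range(length)]
    for pos in spaces_at:
        if 0 <= pos < length:  # ignore positions past the end
            chars[pos] = " "
    text = "".join(chars)
    doc = (f'<?xml version="1.0" encoding="{encoding}"?>\n'
           f"<doc><![CDATA[{text}]]></doc>")
    return doc.encode(encoding), text


if __name__ == "__main__":
    # Lengths just below, at, just above and well past the 1024 mark.
    for n in (1023, 1024, 1025, 5000):
        doc, expected = make_cdata_doc(n, spaces_at=(1023, 1024, 1025))
        print(n, ET.fromstring(doc).text == expected)
```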
Let me know if this works for you.
Oh, and thanks for the test case, it really made things easier for me.
__
mirod