
This queue is for tickets about the XML-Twig CPAN distribution.

Report information
The Basics
Id: 14008
Status: resolved
Priority: 0/
Queue: XML-Twig

People
Owner: Nobody in particular
Requestors: ddascalescu [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 3.19



Date: Wed, 3 Aug 2005 22:56:13 -0700
From: Dan Dascalescu <ddascalescu [...] gmail.com>
To: bug-XML-Twig [...] rt.cpan.org
Subject: Buffer overflow on non-UTF8 CDATA sections > 1024 chars with keep_encoding => 1
XML::Twig seems to duplicate characters beyond the 1024'th in a CDATA section of a non-UTF8 XML, if it was parsed with keep_encoding => 1. Test case attached. Hope that helps, Dan Dascalescu


Hi, The bug is actually in XML::Parser, as shown by the attached test. I will try investigating it further. __ mirod
[ddascalescu@gmail.com - Thu Aug 4 01:56:34 2005]:
> XML::Twig seems to duplicate characters beyond the 1024'th in a CDATA
> section of a non-UTF8 XML, if it was parsed with keep_encoding => 1.
> Test case attached.
OK, it does indeed look very much like a bug in XML::Parser. As far as I can tell, the data is returned in chunks of at most 1024 chars (at least for CDATA), and the Char handler is called with each chunk. The bug is that the original_string method returns _all_ of the data the first time it is called, then chunks of 1024 chars. I will file a bug for XML::Parser, but I have little hope of it being fixed.

The good news now: I have a preliminary fix in the development version of XML::Twig at http://www.xmltwig.com/xmltwig/. It passes your test case, but I have to test it some more: longer CDATA (from my tests with XML::Parser it looks like it should not be a problem), spaces around the 1024 mark, maybe other encodings. Let me know if this works for you.

Oh, and thanks for the test case, it really made things easier for me.

__ mirod
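
For readers following along, here is a minimal sketch of the behaviour described above. It is written in Python so that it runs without XML::Parser installed, and it only models the description in this ticket: the first handler call sees all of the data, later calls see their own chunk. Every name in it (simulated_original_strings, naive_join, first_piece_only) is hypothetical. The 1024 chunk size is simply the figure reported here, and the guard shown is one possible workaround, not the fix that went into XML::Twig.

```python
CHUNK = 1024  # chunk size reported in the ticket; may be configuration dependent


def simulated_original_strings(data, chunk=CHUNK):
    """Hypothetical model of what each Char handler call would see.

    Per the description: the first call gets all of the data, and each
    later call gets only its own chunk of at most `chunk` characters.
    """
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    return [data] + chunks[1:] if chunks else []


def naive_join(pieces):
    """Concatenate every piece, as a handler that trusts each call would."""
    return "".join(pieces)


def first_piece_only(pieces):
    """One possible guard (an assumption, not the actual XML::Twig fix)."""
    return pieces[0] if pieces else ""


cdata = "a" * 1024 + "b" * 976  # 2000 characters, crossing the 1024 mark
pieces = simulated_original_strings(cdata)

# Naive accumulation repeats everything past the first chunk.
print(len(cdata), len(naive_join(pieces)), len(first_piece_only(pieces)))
# prints: 2000 2976 2000
```

In this model the naive result is the original text followed by a second copy of everything after character 1024, which matches the duplication reported in the ticket.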
The current development version now has a better (and simpler!) fix. The tests are at the beginning of t/test_bugs_3.18.t. The line that's commented out ("00- lotsa...") shows you the biggest CDATA section I tested it with. That limit might be configuration dependent, and a look at the sources for expat or XML::Parser might show the max buffer size. I might check it tomorrow. Let me know if this fixes your problem. -- mirod