Seth Viebrock via RT wrote:
> Queue: XML-Twig
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=35672 >
>
> ...which ultimately ends up calling the following code in XML::Expat,
> and hangs. A cold call to XML::Parser->parse does not yield this error,
> so it seems related to the arguments that Twig is ultimately passing to
> Expat.
>
> eval {
> $result = $expat->parse($arg);
> };
Hi,
The problem is that the simple call to expat doesn't include any
handlers. XML::Twig, OTOH, builds the tree for the XML, so it kinda
needs to set handlers on the various events.
In this case the character handler is called for each line of the data,
actually twice for each line, once for the data and once for the line
return. So it ends up being called over 120 000 times for your example.
That's always going to be slower than not calling the handler at all!
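
To see how often the character handler fires, here is a minimal sketch
using plain XML::Parser rather than XML::Twig. The sample document and
the $calls counter are made up for illustration; this is not the code
from the ticket.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample: one element holding 5 lines of text.
my $xml = "<doc>" . join("\n", map { "line $_" } 1 .. 5) . "\n</doc>";

my $calls = 0;    # illustrative counter, not part of any module

my $parser = XML::Parser->new(
    Handlers => {
        # Called by expat for each chunk of character data it reports.
        Char => sub {
            my ($expat, $string) = @_;
            $calls++;
            return;    # explicit empty return, nothing handed back
        },
    },
);

$parser->parse($xml);
print "Char handler called $calls times for 5 lines of text\n";
```

The exact count depends on how expat splits the text, so treat the
number it prints as an observation rather than a guarantee.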
The good news is that I made a mistake in that handler. I did not
provide an explicit return: the returned value is not used in any way,
so why bother? Because, as it was written, the handler returned the
partial content of the element. So it ended up passing 120 000 * 4MB/2
(the average size of the text content of the element), or roughly 240G
of data to be allocated, copied, and de-allocated (one hopes!). I added an explicit
empty return and voilà! Processing time went from 581s down to 2s.
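
For anyone curious why a missing return matters, here is a small sketch
of the effect. It is not the actual XML::Twig handler: $text,
chars_implicit and chars_explicit are hypothetical names used only to
show that a Perl sub without an explicit return hands back the value of
its last statement.

```perl
use strict;
use warnings;

my $text = '';    # stands in for the element's accumulated text content

# Hypothetical handler without an explicit return: the sub's value is
# the value of its last statement, i.e. the whole accumulated string.
sub chars_implicit {
    my ($chunk) = @_;
    $text .= $chunk;
}

# Same handler with an explicit empty return: the caller gets nothing.
sub chars_explicit {
    my ($chunk) = @_;
    $text .= $chunk;
    return;
}

my $leaked  = chars_implicit("line 1\n");
my $nothing = chars_explicit("line 2\n");

printf "implicit: %d characters returned\n", length $leaked;
print  "explicit: ", (defined $nothing ? "defined" : "undef"), "\n";
```

With a multi-megabyte element and a call for every chunk of text, the
implicit version returns an ever-growing string each time, which is
where the cost comes from.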
The new version is at the usual place:
http://xmltwig.com/xmltwig/
Thanks a lot for the bug report; this improvement should benefit most
users (including me!).
--
mirod