
This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 52434
Status: new
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: triddle [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: New feature standardization: non-blocking IO
Hello,

I am the author of Parse::MediaWikiDump, which frequently has to deal with the English Wikipedia dump files, which currently sit at 22 gigabytes. I'm on a never-ending quest to make that module process them as fast as possible. Because Parse::MediaWikiDump exposes a pull-style API, I need non-blocking IO from the XML parser, which currently limits me to XML::Parser.

XML::Parser works well; however, XML::SAX::ExpatXS is considerably faster. I would like to switch to ExpatXS to gain the speed, but I need to retain the non-blocking IO feature of XML::Parser. I've looked into XML::SAX::ExpatXS and there are existing, seemingly unused hooks for parsing a document one part at a time. I'm confident I can enable this feature fairly easily. However, after searching for non-blocking Perl SAX, I see a lot of people who are unhappy that non-blocking IO is not part of the standard. I propose that it become part of it.

Here is my proposal:

* Add a new feature called http://xml.org/sax/features/non-blocking
* Use the following methods for the non-blocking API:
  - parse_start() - set up the parser instance and get it ready to accept data; no return value
  - parse_more($data) - parse a piece of the document and invoke any callbacks required; returns true if everything is OK, false if not
  - parse_done() - signal that there is no more of the document; returns true if everything is OK, false otherwise

This follows the API expressed by XML::SAX::Expat::Incremental, which is unfortunately built on top of XML::Parser, so it won't give me the speed increase I need.

I think that standardizing non-blocking IO is a worthwhile endeavor.
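
To make the proposed calling convention concrete, here is a minimal sketch of how a caller might drive parse_start(), parse_more($data) and parse_done(). ToyIncrementalParser and its start_element callback are hypothetical names invented for illustration. The class is not a real XML parser and not part of XML::SAX or XML::SAX::ExpatXS; it only recognizes start tags, which is enough to show that state has to survive between chunks.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SAX driver that implements the proposed
# non-blocking methods. It is NOT a real XML parser: it only recognizes
# start tags such as <page>, to show how state carries across chunks.
package ToyIncrementalParser;

sub new {
    my ($class, %args) = @_;
    # start_element is an illustrative callback name, loosely modeled on SAX.
    return bless { start_element => $args{start_element}, buf => undef }, $class;
}

# parse_start(): reset state and get ready to accept data; no return value.
sub parse_start {
    my ($self) = @_;
    $self->{buf} = '';
    return;
}

# parse_more($data): consume one chunk, fire the callback for every complete
# start tag seen so far, and keep any trailing partial tag for the next call.
sub parse_more {
    my ($self, $data) = @_;
    return 0 unless defined $self->{buf};    # parse_start() was never called
    $self->{buf} .= $data;
    while ($self->{buf} =~ s/\A[^<]*<([^>]*)>//) {
        my $tag = $1;
        next if $tag =~ m{\A[/?!]};          # skip end tags, PIs, declarations
        my ($name) = $tag =~ /\A([^\s\/]+)/;
        $self->{start_element}->($name) if defined $name;
    }
    $self->{buf} =~ s/\A[^<]+//;             # drop text that is outside any tag
    return 1;
}

# parse_done(): no more input; true only if no partial tag is left over.
sub parse_done {
    my ($self) = @_;
    return 0 unless defined $self->{buf};
    my $ok = $self->{buf} eq '' ? 1 : 0;
    $self->{buf} = undef;
    return $ok;
}

package main;

my @seen;
my $parser = ToyIncrementalParser->new(start_element => sub { push @seen, $_[0] });

# Chunk boundaries deliberately fall in the middle of tags.
$parser->parse_start();
for my $chunk ('<mediawiki><pa', 'ge><title>A</ti', 'tle></page></mediawiki>') {
    $parser->parse_more($chunk) or die "parse failed";
}
my $finished = $parser->parse_done();
print join(',', @seen), "\n";
```

Because the caller decides when to hand over the next chunk, it can stop feeding data as soon as it has a complete record and resume later, which is what a pull-style API needs. The leftover buffer is also why parse_done() is a separate call: only the caller knows when the input has really ended.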