This queue is for tickets about the XML-SAX CPAN distribution.

Report information
The Basics
Id: 52563
Status: new
Priority: 0/
Queue: XML-SAX

People
Owner: Nobody in particular
Requestors: mca+cpanrt [...] sanger.ac.uk
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: PurePerl parser rejects DTD with unhelpful error message
Date: Tue, 8 Dec 2009 15:43:42 +0000
To: bug-XML-SAX [...] rt.cpan.org
From: mca+cpanrt [...] sanger.ac.uk
I had this problem with XML-SAX-0.96 .

"perl-5.8.8 -v" reports

  This is perl, v5.8.8 built for x86_64-linux-thread-multi [...]

"uname -a" reports

  Linux seq1a 2.6.22.19-lustre-1.6.7.1 #2 SMP Fri Apr 17 17:52:49 BST 2009 x86_64 GNU/Linux

It's a Debian 4.0 machine but the Perl is built and installed by our
systems group & extra modules provided by the pathogen analysis group.

I believe I can demonstrate the problem cleanly... but the issue is
muddied at our end by having two XML::SAX installations with
independent ParserDetails.ini files, one of which didn't show the
problem because it defaulted to XML::LibXML::SAX.

The problem is provoked in the PurePerl parser by DTD such as

  <!ELEMENT superscaffold (scaffold, (superbridge+,scaffold)*) >

The error message is

  choice/seq contains no opening bracket [Ln: 12, Col: 125326784]

which we found unhelpful outside the context of parsing the DTD of an
XML document. Also, the column number doesn't make any sense to me; I
didn't investigate that any further.

A workaround is to re-write it as

  <!ELEMENT superscaffold (scaffold, ((superbridge)+,scaffold)*) >

Quick summary of versions, all using XML::SAX::ParserFactory v1.01

  XML::SAX::PurePerl v0.96 fails; v0.90 and v0.92 work OK

  XML::LibXML::SAX v1.69, the W3C validator and some Java XML parser
  all agree that the original document is valid

I'm sorry I haven't found reference to the relevant piece of DTD
(E)BNF, or worked out a patch.

I have included a short example that provokes the problem, inline
below. To fix my own code, I merely insist on using XML::LibXML.

I hope the bug report is useful,
-- 
Matthew


#! /software/bin/perl
#
# (That's the non-OS Perl instance supported by our sysads;
# /usr/bin/perl has no XML parser, DBI etc. installed.  Local software
# is thus decoupled from OS upgrades.)

# This is a minimal SAX handler class, it never sees action
package NulHandl;
use base 'XML::SAX::Base';

package main;
use strict;
use warnings;

use YAML 'Dump';
use XML::SAX::ParserFactory;

sub main {
    # Dictate the parser
    $XML::SAX::ParserPackage = "XML::SAX::PurePerl";
#    $XML::SAX::ParserPackage = "XML::LibXML::SAX";

    # Set up
    my $xml = join "", <DATA>;
    my $xh = NulHandl->new;
    my $sax = XML::SAX::ParserFactory->parser(Handler => $xh);

    # It chokes on "+" in ChoiceOrSeq.  We can fix it either of these
    # ways,
#    $xml =~ s{\b(\w+)\+}{($1)+}g; # bracket the element
#    $xml =~ s{\b(\w+)\+}{$1}g; # remove the plus

    # Show some info
    my %info = (xml_length => length($xml),
                '%INC' => \%INC,
                versions => { "XML::SAX" => XML::SAX->VERSION,
                              "XML::SAX::PurePerl" => XML::SAX::PurePerl->VERSION,
                              "XML::LibXML" => XML::LibXML->VERSION,
                              perl => $] },
                '$sax' => $sax);
    print Dump(\%info);

    # Make it go BANG
    my $eod = $sax->parse_string($xml);

    print "\n** finished without error **\n";
}

main();

__DATA__
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE assembly [

<!ELEMENT assembly (superscaffold*) >
<!ATTLIST assembly
    instance CDATA #REQUIRED
    organism CDATA #REQUIRED
    date CDATA #REQUIRED
>

<!ELEMENT superscaffold (scaffold, (superbridge+,scaffold)*) >
<!ATTLIST superscaffold
    id CDATA #REQUIRED
    size CDATA #REQUIRED
>

<!ELEMENT scaffold (contig, (gap,contig)*) >
<!ATTLIST scaffold
    id CDATA #REQUIRED
    sense (F|R) #REQUIRED
>

<!ELEMENT contig EMPTY>
<!ATTLIST contig
    id CDATA #REQUIRED
    name CDATA #IMPLIED
    size CDATA #REQUIRED
    project CDATA #REQUIRED
    sense (F|R) #REQUIRED
>

<!ELEMENT gap (bridge+)>
<!ATTLIST gap
    size CDATA #REQUIRED
>

<!ELEMENT bridge (link+)>
<!ATTLIST bridge
    template CDATA #REQUIRED
    name CDATA #IMPLIED
    silow CDATA #REQUIRED
    sihigh CDATA #REQUIRED
    gapsize CDATA #REQUIRED
>

<!ELEMENT superbridge (link+)>
<!ATTLIST superbridge
    template CDATA #REQUIRED
    name CDATA #IMPLIED
    silow CDATA #REQUIRED
    sihigh CDATA #REQUIRED
>

<!ELEMENT link EMPTY>
<!ATTLIST link
    contig CDATA #REQUIRED
    read CDATA #REQUIRED
    cstart CDATA #REQUIRED
    cfinish CDATA #REQUIRED
    sense (F|R) #REQUIRED
>

]>
<!-- this data is truncated and redacted because it is not the cause of the problem -->
<assembly instance="pathogen" organism="FOO" date="2009-09-23 14:38:27" >
  <superscaffold id="1" size="1538161" >
    <scaffold id="1" sense="F" >
      <contig id="1480" size="1223160" project="1" sense="F" />
    </scaffold>
  </superscaffold>
</assembly>


-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
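
The bracketing workaround from the report can also be tried in isolation, without loading any XML parser. What follows is only a sketch. The helper name bracket_repeated_names is made up for illustration and is not part of XML::SAX; the substitution itself is the one that appears commented out in the example script. It is a plain text rewrite that knows nothing about DTD syntax, so it is probably best applied to the DTD text only. Because \w does not match "-", "." or ":", element names containing those characters would be bracketed incorrectly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::SAX): wrap every bare name that is
# directly followed by "+" in its own brackets, so "name+" becomes "(name)+".
sub bracket_repeated_names {
    my ($dtd) = @_;
    $dtd =~ s{\b(\w+)\+}{($1)+}g;
    return $dtd;
}

# The declaration that provokes the error in the report.
my $decl = '<!ELEMENT superscaffold (scaffold, (superbridge+,scaffold)*) >';

# Prints the rewritten declaration given as the workaround in the report.
print bracket_repeated_names($decl), "\n";
```

Declarations without a bare "name+" particle, such as (gap,contig)*, pass through unchanged.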