
This queue is for tickets about the bioperl CPAN distribution.

Report information
The Basics
Id: 59665
Status: rejected
Priority: 0/
Queue: bioperl

People
Owner: Nobody in particular
Requestors: y-bushmanova [...] northwestern.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Bio::Biblio::IO copyright parsing bug
Date: Fri, 23 Jul 2010 14:34:25 -0500
To: bug-bioperl [...] rt.cpan.org
From: Yulia Bushmanova <y-bushmanova [...] northwestern.edu>
Parsing the attached PubMed XML file produces the following error (Bio::Biblio::IO v.1.006001, XML::Parser v. 2.36):

not well-formed (invalid token) at line 32, column 48, byte 3095 at /usr/lib/perl5/XML/Parser.pm line 187

It looks like the parser is complaining about the copyright sign; if I remove it from the file, parsing goes fine.

* Perl version: perl, v5.8.7 built for i486-linux-gnu-thread-multi
* Operating System vendor and version: Linux cgm_oracle 2.6.15-28-386 #1 PREEMPT Wed Jul 18 22:50:32 UTC 2007 i686 GNU/Linux

Thanks,
Yulia


On Fri Jul 23 15:34:39 2010, y-bushmanova@northwestern.edu wrote:
> Parsing Pubmed xml (attached) produces following error (Bio::Biblio::IO
> v.1.006001, XML::Parser v. 2.36)
>
> not well-formed (invalid token) at line 32, column 48, byte 3095 at
> /usr/lib/perl5/XML/Parser.pm line 187
The XML file's encoding is not set to UTF-8 (required for the copyright symbol). Once it is set, the file is parsed correctly.
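The same class of failure can be reproduced outside BioPerl with a short Python sketch, since Python's `xml.etree.ElementTree` and Perl's XML::Parser both sit on top of the expat library. Because the attached file is not shown, this assumes the copyright sign was stored as the single Latin-1 byte 0xA9 with no encoding declaration; the record text is made up for illustration. Expat then reads the bytes as UTF-8 and rejects 0xA9 with the same "not well-formed (invalid token)" message, while either re-encoding the bytes as UTF-8 or declaring the encoding they really use lets the document parse.

```python
import xml.etree.ElementTree as ET

# Hypothetical record text; the real attached PubMed file is not reproduced here.
RECORD = "<CopyrightInformation>\u00a9 2010 Example</CopyrightInformation>"

# Assumption: the copyright sign was saved as the single Latin-1 byte 0xA9.
latin1_bytes = RECORD.encode("latin-1")


def try_parse(data: bytes) -> str:
    """Return the element text, or the expat error message if parsing fails."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"error: {err}"


# No encoding declaration, so expat assumes UTF-8 and rejects byte 0xA9.
print(try_parse(latin1_bytes))

# Fix 1: re-encode the same text as UTF-8 (the sign becomes bytes 0xC2 0xA9).
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(try_parse(utf8_bytes))

# Fix 2: keep the Latin-1 bytes but declare the encoding they actually use.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes
print(try_parse(declared))
```

The same two options (convert the bytes to UTF-8, or add an accurate `encoding` declaration) would apply to the original file before it is handed to Bio::Biblio::IO.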