
This queue is for tickets about the bioperl CPAN distribution.

Report information
The Basics
Id: 98876
Status: resolved
Priority: 0/
Queue: bioperl

People
Owner: Nobody in particular
Requestors: brooknong [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Bio::SeqIO::fastq has a bug
Date: Sun, 14 Sep 2014 23:49:18 +0800
To: bug-bioperl [...] rt.cpan.org
From: Brook Nong <brooknong [...] gmail.com>
I found a bug when using the Bio::SeqIO module to read files in FASTQ format. Most files were processed successfully, but not all. Files containing any sequence whose quality line starts with an '@' fail to read. For example:

@Illumina_SRR125365.38 s_5_1_0001_qseq_37 length=76
CCGCCATTTCTTCAAATCTTTTCTTTTCTTTAGGAGTCATCAATTTCCATTTCTCTGCACATTTCTTTGAAAATTA
+Illumina_SRR125365.38 s_5_1_0001_qseq_37 length=76
@CCCCCCCCBCCCCCCCCCCCAACCCCCCCCC?CCCCCCCCCCCCCAACCCCCCCCCCCCCCCCCCCCB??<BC>#

The failure output is shown below:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Unknown symbol with ASCII value 62 outside of quality range
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.14.2/Bio/Root/Root.pm:449
STACK: Bio::SeqIO::fastq::next_dataset /usr/local/share/perl/5.14.2/Bio/SeqIO/fastq.pm:132
STACK: Bio::SeqIO::fastq::next_seq /usr/local/share/perl/5.14.2/Bio/SeqIO/fastq.pm:51
STACK: pair_fix.pl:50
-----------------------------------------------------------

When I deleted these sequences, it worked perfectly again.

Distribution name and version: Bio::SeqIO, 1.006924
Perl version: perl 5, version 14, subversion 2 (v5.14.2) built for x86_64-linux-gnu-thread-multi
Operating System vendor and version: 81~precise1-Ubuntu SMP Tue Jul 15 04:02:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Hi Brook,

I think you may have the Illumina variant set incorrectly. All data from SRA should be converted to Sanger-based FASTQ, which is the default. If the format is set to 'fastq-sanger' it works fine, but if it's set to 'fastq-illumina' it fails.

By the way, both the 'illumina' and 'solexa' FASTQ variants should be a thing of the past, unless you are digging up very old data.

Just in case, I've added this to the tests, and they will be in the latest bioperl release. Apologies for the wait.

chris

On Sun Sep 14 10:49:26 2014, brooknong@gmail.com wrote:
> I find a bug.
> when I use module Bio::SeqIO to read some files in fastq format. Most
> files were successfully processed, but not all.
> Files which contain any sequences quality line start with an '@' will
> failed to read. Like this:
>
> @Illumina_SRR125365.38 s_5_1_0001_qseq_37 length=76
> CCGCCATTTCTTCAAATCTTTTCTTTTCTTTAGGAGTCATCAATTTCCATTTCTCTGCACATTTCTTTGAAAATTA
> +Illumina_SRR125365.38 s_5_1_0001_qseq_37 length=76
> @CCCCCCCCBCCCCCCCCCCCAACCCCCCCCC?CCCCCCCCCCCCCAACCCCCCCCCCCCCCCCCCCCB??<BC>#
>
> and the failure information show below:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: Unknown symbol with ASCII value 62 outside of quality range
> STACK: Error::throw
> STACK: Bio::Root::Root::throw
> /usr/local/share/perl/5.14.2/Bio/Root/Root.pm:449
> STACK: Bio::SeqIO::fastq::next_dataset
> /usr/local/share/perl/5.14.2/Bio/SeqIO/fastq.pm:132
> STACK: Bio::SeqIO::fastq::next_seq /usr/local/share/perl/5.14.2/Bio/SeqIO/
> fastq.pm:51
> STACK: pair_fix.pl:50
> -----------------------------------------------------------
>
> when i deleted these sequences, it can work perfectly again.
> Distribution name and version: Bio::SeqIO, 1.006924
> Perl version: perl 5, version 14, subversion 2 (v5.14.2) built for
> x86_64-linux-gnu-thread-multi
> Operating System vendor and version: 81~precise1-Ubuntu SMP Tue Jul 15
> 04:02:22 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
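
For anyone landing on this ticket later, here is a rough sketch of why the variant setting matters for the record quoted above. It is written in Python only to stay dependency-free and is not BioPerl's code. The names VARIANTS and decode_quality are made up for illustration, and the offsets and score ranges are the commonly documented ones for the three FASTQ variants (Sanger: offset 33; Solexa and Illumina 1.3+: offset 64), assumed here rather than taken from fastq.pm. The quality string contains characters below ASCII 64 ('?', '<', '>', '#'), so it can only be decoded with the Sanger offset.

```python
# Illustrative only: (ASCII offset, min score, max score) per FASTQ variant.
# These are the commonly documented ranges, assumed here, not read from BioPerl.
VARIANTS = {
    "sanger": (33, 0, 93),
    "solexa": (64, -5, 62),
    "illumina": (64, 0, 62),
}


def decode_quality(qual, variant="sanger"):
    """Convert a FASTQ quality string to a list of integer scores.

    Raises ValueError if any character falls outside the variant's range.
    """
    offset, lowest, highest = VARIANTS[variant]
    scores = []
    for ch in qual:
        score = ord(ch) - offset
        if not lowest <= score <= highest:
            raise ValueError(
                f"symbol {ch!r} (ASCII {ord(ch)}) is outside the "
                f"{variant} quality range"
            )
        scores.append(score)
    return scores


# Quality line from the record in this ticket.
QUAL = (
    "@CCCCCCCCBCCCCCCCCCCCAACCCCCCCCC?CCCCCCCCCCCCCAACCCCCCCCCCCCCCCCCCCCB??<BC>#"
)

if __name__ == "__main__":
    # Decodes cleanly as Sanger; the leading '@' is just a score of 31.
    print(decode_quality(QUAL, "sanger")[:5])
    try:
        decode_quality(QUAL, "illumina")
    except ValueError as err:
        print("illumina variant rejected the record:", err)
```

The leading '@' on the quality line is therefore not the problem in itself: it is an ordinary quality character under every variant. The record is rejected only when the reader assumes an offset-64 variant.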