This queue is for tickets about the bioperl CPAN distribution.

Report information
The Basics
Id: 98374
Status: resolved
Priority: 0/
Queue: bioperl

People
Owner: Nobody in particular
Requestors: xzhuo [...] genetics.utah.edu
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: A bug in Bio::AlignIO::fasta
Date: Wed, 27 Aug 2014 23:42:29 +0000
To: "bug-bioperl [...] rt.cpan.org" <bug-bioperl [...] rt.cpan.org>
From: Xiaoyu Zhuo <xzhuo [...] genetics.utah.edu>
Hi there.

Kind of surprised to find a bug in such a popular module.

Bug description:
For a fasta alignment like:

>seq1/1-100
TTTT…..
>seq2/1-100
CCCC….
>seq3/1-100
AAAA….
>seq4/1-100
GGGG….

the display_id of each Bio::LocatableSeq is the full name (e.g. "seq1/1-100"), except for the last sequence, whose display_id is just "seq4".

Why:
In lines 92-93 of Bio/AlignIO/fasta.pm, the regular expression that extracts the seq name is:

    if ( $entry =~ s/^>\s*(\S+)\s*// ) {
        $tempname = $1;

However, for the last sequence, the seq name is extracted in lines 131-132:

    if ( $name =~ /(\S+)\/(\d+)-(\d+)$/ ) {
        $seqname = $1;

Here $1 captures only the part before the slash, so the "/1-100" suffix is dropped.

I suppose it can be fixed fairly easily, e.g. by changing line 131 to:

    if ( $name =~ /(\S+\/(\d+)-(\d+))$/ ) {

-Xiaoyu Zhuo
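For anyone who wants to see the difference without loading the module, here is a minimal standalone sketch. The three regular expressions are copied from the report; the helper names (name_from_header, name_original, name_proposed) are hypothetical and do not exist in Bio::AlignIO::fasta, and the header match uses m// rather than the s/// in the report so the input is left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: header regex from the report (lines 92-93),
# written as a plain match instead of a substitution.
sub name_from_header {
    my ($entry) = @_;
    return $entry =~ /^>\s*(\S+)\s*/ ? $1 : undef;
}

# Hypothetical helper: regex from line 131 as reported.
# $1 stops before the slash, so the "/start-end" suffix is lost.
sub name_original {
    my ($name) = @_;
    return $name =~ /(\S+)\/(\d+)-(\d+)$/ ? ( $1, $2, $3 ) : ();
}

# Hypothetical helper: regex proposed in the report.
# The outer group keeps the full name; $2 and $3 are still start and end.
sub name_proposed {
    my ($name) = @_;
    return $name =~ /(\S+\/(\d+)-(\d+))$/ ? ( $1, $2, $3 ) : ();
}

my $name = name_from_header('>seq4/1-100');
my @old  = name_original($name);
my @new  = name_proposed($name);

print "header:   $name\n";                      # header:   seq4/1-100
print "original: $old[0] ($old[1]-$old[2])\n";  # original: seq4 (1-100)
print "proposed: $new[0] ($new[1]-$new[2])\n";  # proposed: seq4/1-100 (1-100)
```

Because the proposed pattern only adds an outer group, the start and end positions stay in $2 and $3, so code that reads those captures after the match should not need to change. That is an inference from the regexes alone, not something checked against the rest of fasta.pm.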
Yep, verified. I also noticed that the test suite has a bad test for this; had it been written correctly, it would have caught the bug. I have added your suggested change and modified the test for the next release. Thanks!