Subject: A bug in Bio::AlignIO::fasta
Date: Wed, 27 Aug 2014 23:42:29 +0000
To: "bug-bioperl [...] rt.cpan.org" <bug-bioperl [...] rt.cpan.org>
From: Xiaoyu Zhuo <xzhuo [...] genetics.utah.edu>
Hi there.
I'm kind of surprised to find a bug in such a popular module.
Bug description:
For a FASTA alignment like:
>seq1/1-100
TTTT…..
>seq2/1-100
CCCC….
>seq3/1-100
AAAA….
>seq4/1-100
GGGG….
The display_id of each Bio::LocatableSeq keeps the full form (e.g. "seq1/1-100"), except for the last sequence.
The display_id of the last sequence is just "seq4".
Why:
This is because, in lines 92-93 of Bio/AlignIO/fasta.pm, the regular expression that extracts the sequence name is:
if ( $entry =~ s/^>\s*(\S+)\s*// ) {
$tempname = $1;
However, for the last sequence, the sequence name is extracted in lines 131-132:
if ( $name =~ /(\S+)\/(\d+)-(\d+)$/ ) {
$seqname = $1;
I suppose it can be fixed fairly easily, for example by changing line 131 to:
if ( $name =~ /(\S+\/(\d+)-(\d+))$/ ) {
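To make the difference concrete, here is a minimal standalone sketch that runs the three patterns quoted above against the header of the last sequence. The sub names (name_from_header, name_current, name_proposed) are hypothetical helpers written for this illustration and are not part of BioPerl; the first one uses a plain match instead of the s/// in fasta.pm so the input is left unmodified.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; none of these exist in BioPerl.

# Pattern quoted from lines 92-93: capture everything up to the first whitespace.
sub name_from_header {
    my ($entry) = @_;
    return $entry =~ /^>\s*(\S+)\s*/ ? $1 : undef;
}

# Pattern quoted from line 131: group 1 stops before "/start-end".
sub name_current {
    my ($name) = @_;
    return $name =~ /(\S+)\/(\d+)-(\d+)$/ ? ( $1, $2, $3 ) : ();
}

# Proposed pattern: group 1 wraps the whole name, while groups 2 and 3
# still hold the start and end coordinates.
sub name_proposed {
    my ($name) = @_;
    return $name =~ /(\S+\/(\d+)-(\d+))$/ ? ( $1, $2, $3 ) : ();
}

my $header   = name_from_header(">seq4/1-100");
my @current  = name_current("seq4/1-100");
my @proposed = name_proposed("seq4/1-100");

print "$header\n";      # seq4/1-100
print "@current\n";     # seq4 1 100
print "@proposed\n";    # seq4/1-100 1 100
```

Because the extra parentheses are added around the whole match, the start and end are still captured as $2 and $3, so code after line 131 that reads those two groups should not need to change (assuming it refers to them by those numbers).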
-Xiaoyu Zhuo