Subject: Bio::Phylo::Parsers::Newick regexp bug
Date: Thu, 16 Mar 2006 11:58:10 -0500
To: bug-bio-phylo [...] rt.cpan.org
From: Allen Smith <easmith [...] beatrice.rutgers.edu>
Thanks for Bio::Phylo!
$Id: Newick.pm,v 1.22 2005/09/29 20:31:18 rvosa Exp $
If I feed a tree into Bio::Phylo::IO for parsing in newick format, and the
tree has a (bracketed) comment in it (which is normally allowed anyplace a
newline is allowed -
http://evolution.genetics.washington.edu/phylip/newick_doc.html), such as
the log likelihood written by tree-puzzle-5.2, an error generally occurs,
such as:
Invalid [] range "=-4" in regex; marked by <-- HERE in m/^.*[,|\)|\(][lh=-4 <-- HERE 464.484953]([,|:|\)|;].*)$/
(The tree in question, whose single-quoted ('') labels are also not read
correctly - I understand parsing quotes and escapes is a headache, having
tried to do it myself! - is:
[ lh=-4464.484953 ](Methanococcus_voltae:0.32692,(('Pyrococcus furiosus (includes Pyrococcus woesei)':0.05887,Pyrococcus_abyssi:0.03869)100:0.36861,
(((Sulfolobus_solfataricus:0.08344,Sulfolobus_tokodaii:0.10668)100:0.15268,Aeropyrum_pernix:0.20003)
100:0.09351,Desulfuroccus_amylolyticus:0.18345)100:0.28706)100:0.41157,'Methanococcus jannaschii (aka Methanocaldococcus jannaschii)':0.00001);
If one lists all the names of the nodes retrieved from the above, one gets:
Methanococcus_voltae
'Pyrococcusfuriosus
includesPyrococcuswoesei
'
Pyrococcus_abyssi
100
Sulfolobus_solfataricus
Sulfolobus_tokodaii
100
Aeropyrum_pernix
100
Desulfuroccus_amylolyticus
100
100
'Methanococcusjannaschii
akaMethanocaldococcusjannaschii
'
n1
)
The problem appears to be that Newick.pm has a function, _parse_string, with
a bug in it:
my ( $st, $depth, $name ) = ( $string, 0, $node->get_name );
$st =~ s/^.*[,|\)|\(]$name([,|:|\)|;].*)$/$1/;
$name in the above should be quotemeta'd: a name such as [lh=-4464.484953]
is otherwise interpolated into the pattern as a character class, in which
"=-4" is an invalid range. Comments (in []) should also, if possible, be
eliminated earlier - probably replaced with newlines. (I am also curious as
to the reason for the | symbols in the character classes ([]); I can't see
what they're doing, unless they're simply to make it easier to read...)
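
A minimal sketch of what I have in mind, not a patch against Newick.pm: the
helper names strip_comments and tail_after_name are hypothetical, the sample
string is made up, and the comment stripper assumes comments are not nested
and that no quoted label contains a '['. It removes comments outright;
substituting "\n" instead would be the newline variant. It also shows that a
| inside a character class is just a literal |.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Phylo): remove bracketed Newick
# comments. Assumes no nested comments and no '[' inside quoted labels.
sub strip_comments {
    my ($newick) = @_;
    $newick =~ s/\[[^\]]*\]//g;
    return $newick;
}

# Hypothetical helper: the substitution from _parse_string, with the node
# name escaped via \Q...\E (same effect as quotemeta) and the '|'
# characters dropped from the character classes.
sub tail_after_name {
    my ( $string, $name ) = @_;
    ( my $st = $string ) =~ s/^.*[,()]\Q$name\E([,:);].*)$/$1/;
    return $st;
}

# Made-up input: a "node name" that is really a leftover comment.
my $name   = '[lh=-4464.484953]';
my $string = '(A:0.1,' . $name . ':0.2);';

# The original pattern dies at run time, because the unescaped name is
# compiled as a character class containing the range "=-4".
my $ok = eval {
    my $st = $string;
    $st =~ s/^.*[,|\)|\(]$name([,|:|\)|;].*)$/$1/;
    1;
};
my $error = $@;
print "original pattern failed: $error" unless $ok;

# With the name escaped, the substitution leaves the text after the name.
print "tail: ", tail_after_name( $string, $name ), "\n";

# Stripping comments before parsing avoids the bogus node altogether.
print strip_comments('[ lh=-4464.484953 ](A:0.1,B:0.2);'), "\n";

# Inside [], '|' is not alternation; the class simply matches a literal '|'.
my $pipe_matches = ( '|' =~ /^[,|\)|\(]$/ ) ? 1 : 0;
print "class matches a literal '|': $pipe_matches\n";
```

Escaping the name only stops the regex error; the comment would still be
treated as a node unless it is stripped first, so both changes seem needed.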
Thanks again for Bio::Phylo,
-Allen
--
Allen Smith http://cesario.rutgers.edu/easmith/
September 11, 2001 A Day That Shall Live In Infamy II
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety." - Benjamin Franklin