Subject: | >100% identity scores produced by Simrank module |
Date: | Tue, 11 Aug 2015 12:54:36 -0400 |
To: | "bug-String-Simrank [...] rt.cpan.org" <bug-String-Simrank [...] rt.cpan.org> |
From: | "Faller, Lina" <lfaller [...] forsyth.org> |
Hello all,
I tried running the Simrank Perl module but am getting strange results when the query file contains more than one sequence.
Specifically, I get percent identity scores above 100% and wrong results when I pass in a multi-sequence file. I do seem to get the right results when I pass in a query file with a single sequence.
I also tried passing in a file with three identical sequences. This returned the right hits, but the percent identity scores accumulate from one query to the next: the top hit scores 100% for the first query, 200% for the second and 300% for the third.
See, for example, the input and output files (produced by simrank_nuc.pl) for this last experiment:
$ cat $IN1
>UC20.PAL_2x
GATGAACGCTGGCTACAGGCTTAACACATGCAAGTCGAGGGGAAACGACGGGGAAGCTTGCTTCCCCGGGCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCT
>UC20.PAL_2y
GATGAACGCTGGCTACAGGCTTAACACATGCAAGTCGAGGGGAAACGACGGGGAAGCTTGCTTCCCCGGGCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCT
>UC20.PAL_2z
GATGAACGCTGGCTACAGGCTTAACACATGCAAGTCGAGGGGAAACGACGGGGAAGCTTGCTTCCCCGGGCGTCGACCGGCGCACGGGTGAGTAACGCGTATCCAACCTGCCTCTGACTGAGGGATAACCCGTCGAAAGTCGGCCT
$ less -S $OUT
UC20.PAL_2x 311_6474:100.00 311_F045:91.18 289AA016:72.06 289AA020:59.56 289_4315:58.82 293AO009:56.62 307_8826:54.41 291_3524:48.53 291AH005:48.5
UC20.PAL_2y 311_6474:200.00 311_F045:182.35 289AA016:144.12 289AA020:119.12 289_4315:117.65 293AO009:113.24 307_8826:108.82 291_3524:97.06 291AH005:97.0
UC20.PAL_2z 311_6474:300.00 311_F045:273.53 289AA016:216.18 289AA020:178.68 289_4315:176.47 293AO009:169.85 307_8826:163.24 291_3524:145.59 291AH005:145.
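A quick check of the numbers above suggests that each row is the first row's scores multiplied by the query's position in the file. Below is a minimal sketch of that check in Python. It is separate from the module, the function name is only for illustration, and the scores cut off at the right edge of the output are left out. The pattern would be consistent with a score counter that is not reset between queries, but that is only a guess on my part.

```python
# Scores copied from the output above, in hit order.
# Values cut off at the right edge of the listing are omitted.
rows = {
    "UC20.PAL_2x": [100.00, 91.18, 72.06, 59.56, 58.82, 56.62, 54.41, 48.53],
    "UC20.PAL_2y": [200.00, 182.35, 144.12, 119.12, 117.65, 113.24, 108.82, 97.06],
    "UC20.PAL_2z": [300.00, 273.53, 216.18, 178.68, 176.47, 169.85, 163.24, 145.59],
}


def is_multiple_of_first(rows, tolerance=0.01):
    """Illustrative helper: for the n-th query, test whether score / n
    matches the first query's score (within rounding) for every hit."""
    names = list(rows)
    first = rows[names[0]]
    results = {}
    for position, name in enumerate(names, start=1):
        results[name] = all(
            abs(score / position - base) <= tolerance
            for score, base in zip(rows[name], first)
        )
    return results


for name, matches in is_multiple_of_first(rows).items():
    print(name, "matches first row after dividing by position:", matches)
```

If the check holds for every row, dividing each score by the query's position in the input file recovers the expected values, which may serve as a temporary workaround.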
I am using Perl v5.22.0 built for x86_64-linux. The module version is 0.079.
Thanks for any advice you might have!
Lina Faller