Subject: | behaviour of qpos in deletion regions |
the current behaviour of qpos when entering a deletion in a read is to
keep its last value.
Using the SNP caller example code from the POD of Bio::DB::Sam, take
this example of a read with one deletion at base 10 (Perl debugger output):
x $b->padded_alignment
0
'GTTTTTGAAGACGACCTTCGATTTTTTTAATTTTATTTTCATTAACAGCAAAATCTCTGAAATCGACTATATATACTTTTAATTTATTCAATAAATGCAGATATGAATTGCATTCAAAAATATTTTCATAAAGGGATCTATATTTATT'
1 '|||||||||
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||'
2
'GTTTTTGAA-ACGACCTTCGATTTTTTTAATTTTATTTTCATTAACAGCAAAATCTCTGAAATCGACTATATATACTTTTAATTTATTCAATAAATGCAGATATGAATTGCATTCAAAAATATTTTCATAAAGGGATCTATATTTATT'
traversing the pileups from the position before the deletion (continuing
with the example SNP caller code), $pileup->qpos returns the following
values on each call of the callback sub for the above read:
8, 9, 9, 10, 11 ....
So it stays at 9 where the read has the deletion.
A problem arises when one uses the code in your example to get to the
base of the read
substr( $b->qseq, $pileup->qpos, 1 )
which would return an A inside the deletion where the read has no base.
I know that I could use is_del to check for this case and skip it but I
think it would be safer for qpos to return undef where the read has no
base, which would prevent incorrect bases to be returned and would
correctly reflect the fact that the query has no position that matches
the current reference position.
Would also be great if the example callback code could show how to
properly work with indels as this is not entirely intuitive I think.
Thanks for looking into this!