Subject: | Coverage and pileup functions corrupted by samtools core |
Hi Lincoln,
In the core of the pileup function a default max depth of 8000 reads is
set. Can you expose the function that can override this value
(bam_pileup.c bam_plp_set_maxcnt) so that accurate coverage stats can be
generated please.
The case that I found to expose this issue most dramatically is when a
file contains split reads:
1. You have a position spanned by 8000 split reads that start at ~1:2000.
2. This position has 500 unsplit reads starting at 1:2500.
3. You do a mpileup on the position 1:2600.
4. None of the unsplit reads are seen by the function as the reads with
the lowest start coord will be parsed first even though they do not
cover the position.
I've attached the following:
1. A BAM that exhibits this issue containing 8001 records, this also
shows that the cutoff is actually 8000-1.
2. A dump of commands showing how the result changes from a bam with the
last 7999 records up to the one I've attached.
3. The script used to produce the output in (2).
Regards,
Keiran
Subject: | read_depth_at_pos.pl |
#!/usr/bin/env perl
use strict;
use Data::Dumper;
use Carp;
use English qw( -no_match_vars );
use warnings FATAL => 'all';
use Bio::DB::Sam;
print 'Bio::DB::Sam version: ',Bio::DB::Sam->VERSION,"\n";
my $d_count = 0;
my $bam = shift @ARGV;
my $chr = shift @ARGV;
my $pos = shift @ARGV;
my $sam = Bio::DB::Sam->new(-bam => $bam,
-split_splices => 1);
my @alignments = $sam->get_features_by_location(-seq_id => $chr,
-start => $pos,
-end => $pos);
print "Raw feature_by_location: ";
print (scalar @alignments);
print "\n";
my $segment = $sam->segment(-seq_id=>$chr,-start=>$pos,-end=>$pos);
my ($coverage) = $segment->features('coverage');
print "coverage: ",$coverage->coverage,"\n";
my $pos_str = $chr.':'.$pos.'-'.$pos;
$sam->pileup($pos_str,\&depth_callback);
print "pileup: $d_count\n";
$d_count = 0;
$sam->fast_pileup($pos_str,\&fast_depth_callback);
print "fast_pileup: $d_count\n";
print $d_count,"\n";
sub depth_callback {
my ($seqid,$pos_l,$pileup) = @_;
if($pos_l == $pos) {
foreach my $p(@{$pileup}) {
next if($p->is_refskip);
$d_count++;
}
}
}
sub fast_depth_callback {
my ($seqid,$pos_l,$pileup, $t_sam) = @_;
if($pos_l == $pos) {
foreach my $p(@{$pileup}) {
next if($p->is_refskip);
$d_count++;
}
}
}
Subject: | l8001.bam |
Message body not shown because it is not plain text.
Subject: | dump_of_result.txt |
cgp-1-1-01[kr2]5: ~kr2/noddy/read_depth_at_pos.pl l7999.bam 2 85133126
Bio::DB::Sam version: 1.29
Raw feature_by_location: 7999
coverage: 282
pileup: 282
fast_pileup: 282
282
cgp-1-1-01[kr2]6: ~kr2/noddy/read_depth_at_pos.pl l8000.bam 2 85133126
Bio::DB::Sam version: 1.29
Raw feature_by_location: 8000
coverage: 281
pileup: 281
fast_pileup: 281
281
cgp-1-1-01[kr2]7: ~kr2/noddy/read_depth_at_pos.pl l8001.bam 2 85133126
Bio::DB::Sam version: 1.29
Raw feature_by_location: 8001
coverage: 280
pileup: 280
fast_pileup: 280
280