Subject: seq_ids returns only lower-case entry names
Date: Thu, 24 Sep 2009 15:41:48 -0400
To: bug-Bio-SamTools [...] rt.cpan.org
From: Nancy Hansen <nhansen [...] mail.nih.gov>
Hi Lincoln (if this e-mail does indeed get to you!)
In your Bio::DB::Sam module, a call to the seq_ids method returns
only lower-cased versions of the reference sequence names, which is not
mentioned in the documentation. I'd rather see it return the exact
strings found in the BAM file and/or FASTA reference, as I'd like to do
something like:
    my @targets = $sam_obj->seq_ids();
    foreach my $this_target (@targets) {
        my $target_length = $sam_obj->length($this_target);
        $sam_obj->fast_pileup("$this_target:1-$target_length", $my_routine, 0);
    }
etc. I inadvertently skipped chrX because I assumed the targets were
returned exactly as they appear in the BAM file. I can fix this in my
own code, but I'd love to see it changed in the next release of the modules.
Also, while I have your attention: memory use seems to keep growing
as I run a pileup along a chromosome, so for large chromosomes I
work around it by streaming through them in blocks. I just
wanted to be sure you were aware...
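A minimal sketch of the block-streaming workaround, assuming fixed-size, non-overlapping blocks. `block_regions` is a hypothetical helper, not part of Bio::DB::Sam; it only builds the 1-based, inclusive region strings, and the block size is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::Sam): split a reference of
# length $len into consecutive 1-based, inclusive regions of at most
# $block_size bases, formatted as "seqid:start-end".
sub block_regions {
    my ($seqid, $len, $block_size) = @_;
    my @regions;
    for (my $start = 1; $start <= $len; $start += $block_size) {
        my $end = $start + $block_size - 1;
        $end = $len if $end > $len;    # clip the last block to the sequence end
        push @regions, "$seqid:$start-$end";
    }
    return @regions;
}
```

For example, `block_regions('chrX', 25, 10)` returns `chrX:1-10`, `chrX:11-20` and `chrX:21-25`. In the loop above, each of these strings would take the place of the single whole-chromosome `"$this_target:1-$target_length"` region, so that each pileup call covers only one block.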
The modules have been a huge help to me. Thanks for releasing them!
--Nancy
--
*************************************
Nancy F. Hansen, PhD nhansen@nhgri.nih.gov
Comparative Genomics Unit, NHGRI
5625 Fishers Lane
Rockville, MD 20852
Phone: (301) 435-1560 Fax: (301) 435-6170