Subject: Bio::Graphics::Glyph::segments GFF3 CIGAR support
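Bio::Graphics::Glyph::segments doesn't support the GFF3-style CIGAR strings that can appear in
the "Gap" attribute of a GFF3 file. GFF3 writes the operation letter before the count
(e.g. "M188 I1"), whereas SAM writes the count first (e.g. "188M1I").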
The following patch should be sufficient to parse both SAM-style and
GFF3-style CIGAR strings (though _split_on_cigar() would still have
to be extended to handle the "F" and "R" frameshift operations
supported by GFF3-style CIGAR).
--- ./Bio/Graphics/Glyph/segments.pm.orig 2010-08-13 21:36:18.484660468 +0000
+++ ./Bio/Graphics/Glyph/segments.pm 2010-08-16 16:40:32.229815777 +0000
@@ -897,10 +897,18 @@
return unless $cigar;
my @arry;
- while ($cigar =~ /(\d*)([A-Z])/g) {
- my ($count,$op) = ($1,$2);
- $count ||= 1;
- push @arry,[$op,$count];
+ if ($cigar =~ /\d\s*$/) { # ends in a digit; assume GFF3 format
+ while ($cigar =~ /([A-Z])(\d*)/g) {
+ my ($op,$count) = ($1,$2);
+ $count ||= 1;
+ push @arry,[$op,$count];
+ }
+ } else { # assume SAM format
+ while ($cigar =~ /(\d*)([A-Z])/g) {
+ my ($count,$op) = ($1,$2);
+ $count ||= 1;
+ push @arry,[$op,$count];
+ }
}
return \@arry;
}
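For anyone who wants to try the parsing logic outside of segments.pm, here is a minimal standalone sketch. It uses the same "ends in a digit" heuristic and the same regexes as the patch. The name parse_cigar() is made up for illustration and is not part of Bio::Graphics.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched logic; not the real
# segments.pm code. Returns an array ref of [op, count] pairs.
sub parse_cigar {
    my ($cigar) = @_;
    return unless $cigar;
    my @arry;
    if ($cigar =~ /\d\s*$/) {    # ends in a digit; assume GFF3 format
        while ($cigar =~ /([A-Z])(\d*)/g) {
            my ($op, $count) = ($1, $2);
            $count ||= 1;
            push @arry, [$op, $count];
        }
    }
    else {                       # assume SAM format
        while ($cigar =~ /(\d*)([A-Z])/g) {
            my ($count, $op) = ($1, $2);
            $count ||= 1;
            push @arry, [$op, $count];
        }
    }
    return \@arry;
}

# One GFF3-style string (with a frameshift op) and one SAM-style string.
for my $cigar ('M188 I1 F1 M2', '188M1I2M') {
    print join(' ', map { join(':', @$_) } @{ parse_cigar($cigar) }), "\n";
}
# prints:
#   M:188 I:1 F:1 M:2
#   M:188 I:1 M:2
```

Note that "F" and "R" already match the [A-Z] class, so they come through as [op, count] pairs like any other operation. What is still missing is code that interprets them downstream; the patch above changes only the parsing step.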
The patch isn't quite perfect, however: with a "+"-strand Target and a
"-"-strand source, the matched segments seem to be drawn in
reverse order, e.g., for a feature like:
Gm13 Glycine_max EST_match 31059022 31065493 75 - . ID=Cf46d.path1;Name=Cf46d;Target=Cf46d 104 834 +;Gap=M188 I1 M2 I3 M21 N958 M4 I1 M89 N1714 M117 N812 M87 N242 M99 N1852 M55 N168 M64
There is a comment in the _split_on_cigar() subroutine in segments.pm
that mentions that not all source/target strand combinations are
supported, so that may be the issue.
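To help narrow this down, here is a rough sketch that walks the Gap string of the example feature and maps each "M" run to source coordinates. It rests on two assumptions that I have not confirmed against segments.pm: that "M" consumes both sequences, "I" only the target, and "N" (like "D") only the source; and that the Gap string is written in Target order, so that for a "-"-strand source the walk starts at the feature's end coordinate. The name match_blocks() is made up for illustration.

```perl
use strict;
use warnings;

# Gap string and coordinates copied from the example feature.
my $gap = 'M188 I1 M2 I3 M21 N958 M4 I1 M89 N1714 M117 N812 '
        . 'M87 N242 M99 N1852 M55 N168 M64';
my ($src_start, $src_end) = (31059022, 31065493);
my ($tgt_start, $tgt_end) = (104, 834);

my @ops = map { /^([A-Z])(\d+)$/ ? [$1, $2] : () } split ' ', $gap;

# Sanity check: bases consumed on each sequence should equal its span.
my ($src_len, $tgt_len) = (0, 0);
for my $op (@ops) {
    my ($type, $count) = @$op;
    $src_len += $count if $type =~ /^[MND]$/;   # assumed to consume source
    $tgt_len += $count if $type =~ /^[MI]$/;    # assumed to consume target
}
printf "source: %d of %d bases\n", $src_len, $src_end - $src_start + 1;
printf "target: %d of %d bases\n", $tgt_len, $tgt_end - $tgt_start + 1;
# prints:
#   source: 6472 of 6472 bases
#   target: 731 of 731 bases

# Hypothetical helper: source-coordinate [start, end] for each M run.
# On a "-" strand the walk starts at $end and moves toward $start.
sub match_blocks {
    my ($ops, $start, $end, $strand) = @_;
    my $minus = $strand eq '-';
    my $pos   = $minus ? $end : $start;
    my @blocks;
    for my $op (@$ops) {
        my ($type, $count) = @$op;
        next unless $type =~ /^[MND]$/;         # I does not move the source
        if ($type eq 'M') {
            push @blocks, $minus ? [$pos - $count + 1, $pos]
                                 : [$pos, $pos + $count - 1];
        }
        $pos += $minus ? -$count : $count;
    }
    return \@blocks;
}

my $blocks = match_blocks(\@ops, $src_start, $src_end, '-');
printf "%d-%d\n", @$_ for @$blocks;
# first line printed: 31065306-31065493
# last line printed:  31059022-31059085
```

The lengths add up exactly, so the Gap string itself looks consistent with the feature coordinates. If the glyph instead walks the operations upward from the start coordinate regardless of strand, the blocks would come out mirrored, which would match the reversed order described above. That is only a guess at the cause.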