Subject: | Bio::Tools::GFF Invalid GFF3 output |
I was trying to write GFF3 files starting from Bio::SeqI and
Bio::SeqFeatureI type of objects, using Bio::Tools::GFF.
The results are far from encouraging.
1) The various elements are not identified by the mandatory ID
attribute, nor are they linked through a Parent attribute.
2) Bio::Tools::GFF is unable to parse nested Bio::SeqFeature::Generic
objects.
3) Complete lack of controlled vocabulary for attributes leads to the
insertion of invalid GFF3 attributes.
4) Missing phase information when parsing Bio::Seq objects loaded from
GenBank format files.
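To make points 1 and 4 concrete, here is a minimal sketch, independent
of BioPerl, of the kind of lines a valid file would need to contain: an
ID on each feature, a Parent linking child to parent, and a phase on
the CDS. The helper names (escape_attr, gff3_line), the hash keys and
the sample coordinates are hypothetical and are not part of any BioPerl
API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode characters that GFF3 reserves inside column 9
# (tab, newline, carriage return, %, ;, =, &, comma, control characters).
sub escape_attr {
    my ($value) = @_;
    $value =~ s/([\t\n\r%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one tab-delimited, nine-column GFF3 line from a plain hash.
# The hash keys are illustrative names chosen for this sketch.
sub gff3_line {
    my (%f) = @_;
    my @attrs = ('ID=' . escape_attr($f{id}));
    push @attrs, 'Parent=' . escape_attr($f{parent}) if defined $f{parent};
    return join("\t",
        $f{seqid}, $f{source} // '.', $f{type}, $f{start}, $f{end},
        $f{score} // '.', $f{strand} // '.', $f{phase} // '.',
        join(';', @attrs)) . "\n";
}

# Made-up gene -> mRNA -> CDS hierarchy on a made-up sequence.
print "##gff-version 3\n";
print gff3_line(seqid => 'chr1', type => 'gene', start => 100, end => 900,
                strand => '+', id => 'gene1');
print gff3_line(seqid => 'chr1', type => 'mRNA', start => 100, end => 900,
                strand => '+', id => 'mrna1', parent => 'gene1');
print gff3_line(seqid => 'chr1', type => 'CDS', start => 150, end => 800,
                strand => '+', phase => 0, id => 'cds1', parent => 'mrna1');
```

The sketch only covers the ID, Parent and phase columns; it does not
attempt to validate feature types against the Sequence Ontology.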
I have attached a simple script that takes a file in GenBank format and
translates it into GFF3 format.
I use this on-line validator for GFF3:
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online
Now: I understand that Bio::Tools::GFF should be replaced by the
corresponding functionality in Bio::SeqIO. Is there an estimate of when
this will happen?
Is there any plan to identify and standardize attribute names when
populating objects through Bio::SeqIO, so that these attributes can be
properly translated into the equivalent ones when exporting to a specific
format?
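As an illustration of the kind of translation meant here, a minimal
sketch of a qualifier-to-attribute mapping could look like the
following. The list of reserved names comes from the GFF3
specification; the mapping table and the function name
gff3_attribute_name are assumptions made for this example and do not
exist in BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Attribute names reserved by the GFF3 specification.
my %RESERVED = map { $_ => 1 } qw(
    ID Name Alias Parent Target Gap Derives_from
    Note Dbxref Ontology_term Is_circular
);

# Hypothetical mapping from GenBank qualifiers to reserved GFF3 attributes.
my %QUALIFIER_TO_GFF3 = (
    db_xref => 'Dbxref',
    note    => 'Note',
    gene    => 'Name',
);

# Return a GFF3-safe attribute name for a given tag.
sub gff3_attribute_name {
    my ($tag) = @_;
    return $QUALIFIER_TO_GFF3{$tag} if exists $QUALIFIER_TO_GFF3{$tag};
    return $tag if $RESERVED{$tag};
    # Names starting with an uppercase letter are reserved by the
    # specification, so any other tag is forced to start in lowercase.
    return lcfirst $tag;
}

print join(' ', map { gff3_attribute_name($_) } qw(db_xref Product ID)), "\n";
```

Which GenBank qualifier should map to which reserved attribute is a
design decision; the three entries above are only examples.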
Currently I do not have too much time available for development and I
would hate to spend it reinventing wheels. I would therefore appreciate it
if you could point me to existing resources that could help me in
creating valid GFF3 files from BioPerl objects (of course containing all
the necessary elements).
I cannot guarantee any commitment to contributing to the development
of BioPerl. However, I would also appreciate instructions on how I could
be helpful in contributing to the codebase.
Thanks.
Paolo Amedeo
Subject: | Test_Tools_GFF.pl |
#!/usr/local/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;
use File::Basename;
my $usage = basename($0) . ' gbk_file gff_output_file';
die "$usage\n\n" unless @ARGV == 2;
my $seq_in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
my $out = Bio::Tools::GFF->new(-gff_version => 3, -file => ">$ARGV[1]");
# Write the top-level features of every sequence record to the GFF3 file.
while (my $seq = $seq_in->next_seq()) {
    my @features = $seq->get_SeqFeatures();
    $out->write_feature(@features);
}