
This queue is for tickets about the bioperl CPAN distribution.

Report information
The Basics
Id: 64656
Status: new
Priority: 0/
Queue: bioperl

People
Owner: Nobody in particular
Requestors: paolo [...] medeo.net
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.6.1
Fixed in: (no value)



Subject: Bio::Tools::GFF Invalid GFF3 output
I was trying to write GFF3 files from Bio::SeqI and Bio::SeqFeatureI objects using Bio::Tools::GFF. The results are far from encouraging:

1) The features are not identified by the mandatory ID attribute, nor are they linked through a Parent attribute.
2) Bio::Tools::GFF is unable to parse nested Bio::SeqFeature::Generic objects.
3) The complete lack of a controlled vocabulary for attributes leads to the insertion of invalid GFF3 attributes.
4) Phase information is missing when parsing Bio::Seq objects loaded from GenBank format files.

I have attached a simple script that takes a file in GenBank format and translates it into GFF3 format. I use this online validator for GFF3:
http://modencode.oicr.on.ca/cgi-bin/validate_gff3_online

Now, I understand that Bio::Tools::GFF is supposed to be replaced by the corresponding functionality in Bio::SeqIO. Is there any estimate of when? Is there any plan to identify and standardize attribute names when populating objects through Bio::SeqIO, so that these attributes can be properly translated into the equivalent ones when exporting to a specific format?

Currently I do not have much time available for development, and I would hate to spend it reinventing wheels. I would therefore appreciate it if you could point me to existing resources that could help me create valid GFF3 files from BioPerl objects (containing all the necessary elements, of course).

I cannot guarantee any commitment to contributing to the development of BioPerl. However, I would also appreciate instructions on how I could help with the codebase.

Thanks.
Paolo Amedeo
Subject: Test_Tools_GFF.pl
#!/usr/local/bin/perl

use strict;
use warnings;

use Bio::SeqIO;
use Bio::Tools::GFF;
use File::Basename;

my $usage = basename($0) . ' gbk_file gff_output_file';

die "$usage\n\n" unless @ARGV == 2;

# Read sequences (with their features) from the GenBank file
my $seq_in = Bio::SeqIO->new(-file   => $ARGV[0],
                             -format => 'genbank');

# Write features as GFF version 3 to the output file
my $out = Bio::Tools::GFF->new(-gff_version => 3,
                               -file        => ">$ARGV[1]");

while (my $seq = $seq_in->next_seq()) {
    # Top-level features only; nested sub-features are not traversed here
    my @features = $seq->get_SeqFeatures();
    $out->write_feature(@features);
}
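
For reference, points 1) and 4) above concern structure like the following. This is only a hand-written sketch of the kind of GFF3 a validator would be expected to accept; it is not output from the attached script, and the sequence name, source, coordinates, and identifiers are all made up for illustration. Columns are tab-separated. A feature that has children carries an ID, each child points to it through Parent, and CDS lines carry a phase in column 8 (here the first CDS segment is 251 bases long, so the second segment starts with phase 1).

```
##gff-version 3
chr1	example	gene	100	900	.	+	.	ID=gene0001;Name=exampleGene
chr1	example	mRNA	100	900	.	+	.	ID=mrna0001;Parent=gene0001
chr1	example	exon	100	400	.	+	.	ID=exon0001;Parent=mrna0001
chr1	example	exon	600	900	.	+	.	ID=exon0002;Parent=mrna0001
chr1	example	CDS	150	400	.	+	0	ID=cds0001;Parent=mrna0001
chr1	example	CDS	600	849	.	+	1	ID=cds0001;Parent=mrna0001
```

The two CDS lines share one ID because, in GFF3, a feature split across several segments is written as multiple lines with the same ID.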