
This queue is for tickets about the Bio-Pipeline-Comparison CPAN distribution.

Report information
The Basics
Id: 113922
Status: open
Priority: 0/
Queue: Bio-Pipeline-Comparison

People
Owner: Nobody in particular
Requestors: francescomusacchia [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Issue reading VCF files with PERL library Vcf.pm after complex filtering
Date: Fri, 22 Apr 2016 13:09:54 +0200
To: bug-Bio-Pipeline-Comparison [...] rt.cpan.org
From: Francesco Musacchia <francescomusacchia [...] gmail.com>
Hi there,

I want to report an issue reading the VCF output of the GATK VariantFiltration task. I was trying to use Vcf.pm to read the VCF file produced by that program, but an error came up while parsing the header.

I was just trying the following example code from http://search.cpan.org/~ajpage/Bio-Pipeline-Comparison-1.123050/lib/Vcf.pm with my VCF file:

    my $vcf = Vcf->new(file=>'example.vcf.gz',region=>'1:1000-2000');
    $vcf->parse_header();

    # Do some simple parsing. Most thorough but slowest way how to get the data.
    while (my $x=$vcf->next_data_hash())
    {
        for my $gt (keys %{$$x{gtypes}})
        {
            my ($al1,$sep,$al2) = $vcf->parse_alleles($x,$gt);
            print "\t$gt: $al1$sep$al2\n";
        }
        print "\n";
    }

Specifically, I found that the problem is related to the strings describing the filters used. These are the filters as reported in my VCF:

    ##FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
    ##FILTER=<ID=LowQual,Description=""QUAL > 30.0 && QUAL < 100.0"">

The double quotes are repeated, as you can see, in all three filters, and this causes the Vcf.pm Perl library to fail with:

    Could not parse header line: FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
    Stopped at [QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0""].

When I removed the repeated double quotes, the problem went away. In fact, with

    ##FILTER=<ID=LowQual,Description="QUAL > 30.0 && QUAL < 100.0">

the error does not occur.

Moreover, VariantFiltration causes one more error. A few lines later, the command-line header lists all the parameters given to VariantFiltration, including these filters again:

    ##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625,Date="Sat Apr 16 12:01:58 CEST 2016",Epoch=1460800918456,CommandLineOptions=.. .. .. ..... filterExpression=["QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0", "QUAL < 30.0", "QUAL > 30.0 && QUAL < 100.0"] filterName=[HARD_TO_VALIDATE, VeryLowQual, LowQual] genotypeFilterExpression=[] genotypeFilterName=[] . .. .... >

And Vcf.pm stops with:

    Could not parse header line: GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.4-46-gbc02625 .....

I solved this in a similar way, by removing the quotes from the filter elements:

    filterExpression=[QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, QUAL < 30.0, QUAL > 30.0 && QUAL < 100.0]

I do not know whether this is a problem in the Vcf.pm library or whether VariantFiltration does not respect the VCF format.

I hope this is useful.

Regards,
Francesco Musacchia

--
Francesco Musacchia - Ph.D.
Computational Biology - Bioinformatics Core
Telethon Institute of Genetics and Medicine
Via Campi Flegrei 34, 80078 Pozzuoli (NA), Italy.
Tel. +39 081 19230692
Mobile: +39 349 6396351
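The manual edits described in this report could be automated with a small pre-processing step. The following is only a sketch, assuming the doubled quotes and the quoted filterExpression elements are the only obstacles, as the report suggests. The helper names collapse_doubled_quotes and unquote_filter_expressions are hypothetical and are not part of Vcf.pm or vcftools.

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of Vcf.pm or vcftools.

# Collapse a doubled double quote that opens or closes a quoted value,
# e.g. Description=""QD < 2.0"" becomes Description="QD < 2.0".
# A genuinely empty value (Description="") is left untouched.
sub collapse_doubled_quotes {
    my ($line) = @_;
    return $line unless $line =~ /^##/;       # leave data lines alone
    $line =~ s/=""(?=[^",>])/="/g;            # opening pair
    $line =~ s/(?<=[^"=])""(?=[,>])/"/g;      # closing pair
    return $line;
}

# Remove the quotes inside filterExpression=[...] on GATKCommandLine lines,
# mirroring the manual edit described in the report.
sub unquote_filter_expressions {
    my ($line) = @_;
    return $line unless $line =~ /^##GATKCommandLine/;
    $line =~ s{(filterExpression=\[)([^\]]*)\]}{
        my ($head, $body) = ($1, $2);
        $body =~ tr/"//d;                     # drop every double quote
        $head . $body . ']';
    }ge;
    return $line;
}

my $filter = '##FILTER=<ID=LowQual,Description=""QUAL > 30.0 && QUAL < 100.0"">';
print collapse_doubled_quotes($filter), "\n";
# prints: ##FILTER=<ID=LowQual,Description="QUAL > 30.0 && QUAL < 100.0">
```

Both helpers could be applied to every line of the VCF while writing a cleaned copy, which is then opened with Vcf->new(file=>...) as in the example above. This changes the header text, so it is a workaround for reading the file rather than a fix for either tool.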
Subject: Re: [rt.cpan.org #113922] Issue reading VCF files with PERL library Vcf.pm after complex filtering
Date: Fri, 22 Apr 2016 12:15:23 +0100
To: bug-Bio-Pipeline-Comparison [...] rt.cpan.org
From: Andrew Page <andrewjpage [...] gmail.com>
Hi,

The latest version of this module is in vcftools, so this might solve your issues:
https://vcftools.github.io

Regards,
Andrew
Subject: Re: [rt.cpan.org #113922] Issue reading VCF files with PERL library Vcf.pm after complex filtering
Date: Wed, 27 Apr 2016 09:40:17 +0200
To: bug-Bio-Pipeline-Comparison [...] rt.cpan.org
From: Francesco Musacchia <francescomusacchia [...] gmail.com>
Dear Andrew,

I just downloaded and used the latest version of vcftools (0.1.14), but the issue is still there:

    Use of uninitialized value $attr_value in pattern match (m//) at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 2989, <$__ANONIO__> line 3.
    Use of uninitialized value $attr_value in pattern match (m//) at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 2989, <$__ANONIO__> line 3.
    Could not parse header line: FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"">
    Stopped at [QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0""].

     at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 172
        Vcf::throw('Vcf4_1=HASH(0x1ab4f40)', 'Could not parse header line: FILTER=<ID=HARD_TO_VALIDATE,Desc...') called at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 2981
        Vcf4_0::parse_header_line('Vcf4_1=HASH(0x1ab4f40)', '##FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > ...') called at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 625
        VcfReader::_next_header_line('Vcf4_1=HASH(0x1ab4f40)') called at /cineca/prod/applications/vcftools/0.1.14/gnu--4.8.3/share/perl5/Vcf.pm line 598
        VcfReader::parse_header('Vcf4_1=HASH(0x1ab4f40)') called at parse_vcf.pl line 10

I also asked on the GATK forum, and they say their output is VCF compliant:
http://gatkforums.broadinstitute.org/gatk/discussion/7518/issue-reading-vcf-files-with-perl-library-vcf-pm-after-complex-filtering

Regards,
Francesco
-- Francesco Musacchia - Ph.D. Computational Biology - Bioinformatics Core Telethon Institute of Genetics and Medicine Via Campi Flegrei 34, 80078 Pozzuoli (NA), Italy. Tel. +39 081 19230692 Mobile: +39 349 6396351
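Since the parser stops at the first header line it rejects, it can help to list every affected line before calling parse_header. The following is a minimal sketch of such a pre-check. The function name suspicious_header_lines is hypothetical, the check only looks for the doubled-quote pattern reported in this ticket, and the sample header is shortened for illustration (the VCFv4.1 file format line is an assumption).

```perl
use strict;
use warnings;

# Hypothetical pre-check, not part of Vcf.pm or vcftools: return the 1-based
# numbers of structured header lines (##KEY=<...>) in which a doubled double
# quote opens or closes a value.
sub suspicious_header_lines {
    my (@lines) = @_;
    my @hits;
    my $n = 0;
    for my $line (@lines) {
        $n++;
        last unless $line =~ /^#/;            # header ends at the first data line
        next unless $line =~ /^##[^=]+=</;    # only structured ##KEY=<...> lines
        push @hits, $n
            if $line =~ /=""[^",>]/           # doubled quote opening a value
            || $line =~ /[^"=]""[,>]/;        # doubled quote closing a value
    }
    return @hits;
}

# Shortened sample header (assumed content, for illustration only).
my @header = (
    '##fileformat=VCFv4.1',
    '##FILTER=<ID=LowQual,Description="QUAL > 30.0 && QUAL < 100.0">',
    '##FILTER=<ID=HARD_TO_VALIDATE,Description=""QD < 2.0 || FS > 60.0"">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
);
print "suspicious header line: $_\n" for suspicious_header_lines(@header);
# prints: suspicious header line: 3
```

An empty value such as Description="" is deliberately not flagged, because it is an ordinary empty string and not a doubled quote.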
Subject: Re: [rt.cpan.org #113922] Issue reading VCF files with PERL library Vcf.pm after complex filtering
Date: Wed, 27 Apr 2016 08:48:03 +0100
To: bug-Bio-Pipeline-Comparison [...] rt.cpan.org
From: Andrew Page <andrewjpage [...] gmail.com>
Hi Francesco,

The developers of this module have a mailing list for help:
vcftools-help@lists.sourceforge.net
and also use GitHub issues for bug tracking:
https://github.com/vcftools/vcftools/issues
so they might be able to get to the bottom of your issue.

Regards,
Andrew
Subject: Re: [rt.cpan.org #113922] Issue reading VCF files with PERL library Vcf.pm after complex filtering
Date: Wed, 27 Apr 2016 10:42:21 +0200
To: bug-Bio-Pipeline-Comparison [...] rt.cpan.org
From: Francesco Musacchia <francescomusacchia [...] gmail.com>
OK, thanks. I have submitted the question there.

Regards,
Francesco
-- Francesco Musacchia - Ph.D. Computational Biology - Bioinformatics Core Telethon Institute of Genetics and Medicine Via Campi Flegrei 34, 80078 Pozzuoli (NA), Italy. Tel. +39 081 19230692 Mobile: +39 349 6396351