Bug #123320 for Text-CSV_XS: Text::CSV_XS bug w/Mac format files

Wed Oct 18 11:30:53 2017 CLemmen [...] excelsiorintegrated.com - Ticket created

Subject:	Text::CSV_XS bug w/Mac format files
Date:	Wed, 18 Oct 2017 13:56:31 +0000
To:	"bug-Text-CSV_XS [...] rt.cpan.org" <bug-Text-CSV_XS [...] rt.cpan.org>
From:	Charles Stuart Lemmen <CLemmen [...] excelsiorintegrated.com>

Hi,

I believe I've found a bug in the Text::CSV_XS package with regards to Mac type files (eol as single carriage return) and handling successive files with one csv object ref. Below are the relevant details.

Dist name/ver: Text::CSV_XS-1.32
Perl version: This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
O/S: Windows 10 Home (Major version: 10 Minor Version: 0.15063)

Bug details:

If you create one Text::CSV_XS handle and use it with two different files (one with a bad header and one with a good header) that use carriage returns only as end-of-line markers and the first has an invalid header, attempting to get the second (good) file's header will also fail. Skip getting the first file's header and the second one succeeds. If you attempt to get the first (bad) file's header but then clear the '_AHEAD' instance var before getting the second (good) file's header the second will succeed whereas before it did not.

There are some things about this first file with the bad header that help to cause the second header call to fail:

1. If it has only one non-header data record and that record does not end with the end-of-line carriage return.
2. If there are multiple non-header records, they all have proper end-of-line carriage returns but the first non-header data record (record #2) has an empty column (,,) - this empty column can be double quoted or not, doesn't matter.

So it sounds like "leftover" '_AHEAD' data is somehow negatively influencing the handling of other files.

Here's a short code example:

    # First make two csv files, one with an empty (dangling) header column, one that's ok.
    # These are both "Mac" format meaning only carriage returns for EOL.
    my $bad_csv_file = 'test_bad_csv.csv';
    my $good_csv_file = 'test_good_csv.csv';
    my $bfh;
    my $gfh;

    if(open($bfh, '>', $bad_csv_file))
    {
        print($bfh "col1,col2,col3,\r\"One\",\"\",\"Three\"\r\"Four\",\"Five and a half\",\"Six\"\r");
        close $bfh;
    }

    if(open($gfh, '>', $good_csv_file))
    {
        print($gfh "col1,col2,col3\r\"One\",\"Two\",\"Three\"\r");
        close $gfh;
    }

    -e $bad_csv_file or croak "No bad file!\n";
    -e $good_csv_file or croak "No good file!\n";

    # Init csv ref to handle files.
    my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"});

    # Open and use the new files.
    open($bfh, '<', $bad_csv_file) or croak "$!\n";
    open($gfh, '<', $good_csv_file) or croak "$!\n";

    # Get the header of the bad file (this will fail).
    my @bad_header;
    eval
    {
        local $@;
        @bad_header = $csv->header($bfh);
        print "Got bad header ok:\n\n" . Dumper(\@bad_header) . "\n\n";
        1;
    }
    or do
    {
        print "Failed to get header from bad csv file!\n";
    };

    # Get the header of the good file (this will fail too but should not).
    my @good_header;
    eval
    {
        local $@;
        @good_header = $csv->header($gfh);
        print "Got good header ok:\n\n" . Dumper(\@good_header) . "\n\n";
        1;
    }
    or do
    {
        print "Failed to get header from good csv file!\n";
    };

    close $bfh;
    close $gfh;

My current workaround is going to be, before I call header on the next file, to check if '_AHEAD' exists and is not empty and if so clear it (this should be safe if the name ever changes since we check first). If '_AHEAD' is not present then attempt the header call using eval and if it fails, create a new Text::CSV_XS instance (with the same options as the original) and attempt the header call a second time. If the second call fails then we can be sure the second file is broken too.

Thanks!
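The workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the report: header_with_fallback and the $new_csv factory are hypothetical names, the rewind with seek before the retry is an added assumption (the report does not say how the handle is repositioned), and '_AHEAD' is an internal key of the parser object that may change between releases.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV_XS).
#   $csv     - the shared parser object (a blessed hash)
#   $fh      - an open, seekable file handle
#   $new_csv - code ref that builds a fresh parser with the same options
# Returns the parser that succeeded, followed by the header fields.
sub header_with_fallback {
    my ($csv, $fh, $new_csv) = @_;

    # Clear leftover look-ahead data, but only if the internal key exists,
    # so nothing is touched if the key is ever renamed.
    if (exists $csv->{_AHEAD} and defined $csv->{_AHEAD} and length $csv->{_AHEAD}) {
        $csv->{_AHEAD} = '';
    }

    # First attempt with the shared parser.
    my @header;
    if (eval { @header = $csv->header($fh); 1 }) {
        return ($csv, @header);
    }

    # Assumption: rewind so the retry starts at the first record again.
    seek $fh, 0, 0 or die "Cannot rewind file handle: $!\n";

    # Second attempt with a fresh parser; if this dies too, the file
    # itself is the likely culprit, so the error is left to propagate.
    my $fresh = $new_csv->();
    @header = $fresh->header($fh);
    return ($fresh, @header);
}
```

A caller would pass something like sub { Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"}) } as the factory and keep using whichever parser object is returned.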
Stuart Lemmen
IT Development & Support
Excelsior Integrated LLC
413-394-4340
clemmen@excelsiorintegrated.com
www.excelsiorintegrated.com

Wed Oct 18 12:09:25 2017 h.m.brand [...] xs4all.nl - Correspondence added

Subject:	Re: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date:	Wed, 18 Oct 2017 18:09:05 +0200
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	"H.Merijn Brand" <h.m.brand [...] xs4all.nl>

On Wed, 18 Oct 2017 11:30:54 -0400, "Charles Stuart Lemmen via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> I believe I've found a bug in the Text::CSV_XS package with regards
> to Mac type files (eol as single carriage return) and handling
> successive files with one csv object ref. Below are the relevant
> details.

Thanks for the report.

I think this is very much related to ticket #122764
https://rt.cpan.org/Public/Bug/Display.html?id=122764
which has been resolved in 1.32 but got a fix for an additional problem that surfaced when using a BOM in combination with the \r EOL. The fix is applied in the upcoming 1.33.

If you too think it is related, could you try

 $ wget --output-document=Text-CSV_XS-git.tgz \
     https://github.com/Tux/Text-CSV_XS/archive/master.tar.gz

and see if that fixes this issue too?

If not, I will start digging. That will not be quick, as I do not have a Windows development box right here. (It might be related to Windows)

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

Wed Oct 18 12:09:26 2017 The RT System itself - Status changed from 'new' to 'open'

Wed Oct 18 13:52:17 2017 CLemmen [...] excelsiorintegrated.com - Correspondence added

Subject:	RE: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date:	Wed, 18 Oct 2017 17:52:03 +0000
To:	"bug-Text-CSV_XS [...] rt.cpan.org" <bug-Text-CSV_XS [...] rt.cpan.org>
From:	Charles Stuart Lemmen <CLemmen [...] excelsiorintegrated.com>

Just a quick update that may be relevant and/or shed some light on this issue.

I'm experiencing some odd behavior with my test and your ver. 1.33 of this package (compiled and installed locally) on Linux. Keep in mind the perl binary being used is old:

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi

So that may play into this too.

If I use the following code (which includes my kludgy fix):

### START ###

    #!/usr/bin/perl

    use strict;
    use warnings;
    use Carp;

    BEGIN:
    {
        use lib '/home/clemmen/bin/pm/lib/perl5/x86_64-linux-thread-multi';
    }

    use Text::CSV_XS;

    # First make two csv files, one with an empty (dangling) header column, one that's ok.
    # These are both "Mac" format meaning only carriage returns for EOL.
    my $bad_csv_file = 'test_bad_csv.csv';
    my $good_csv_file = 'test_good_csv.csv';
    my $bfh;
    my $gfh;

    if(open($bfh, '>', $bad_csv_file))
    {
        print($bfh "col1,col2,col3,\r\"One\",\"\",\"Three\"\r\"Four\",\"Five and a half\",\"Six\"\r");
        close $bfh;
    }

    if(open($gfh, '>', $good_csv_file))
    {
        print($gfh "col1,col2,col3\r\"One\",\"Two\",\"Three\"\r");
        close $gfh;
    }

    -e $bad_csv_file or croak "No bad file!\n";
    -e $good_csv_file or croak "No good file!\n";

    # Init csv ref to handle files.
    my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"});

    # Open and use the new files.
    open($bfh, '<', $bad_csv_file) or croak "$!\n";
    open($gfh, '<', $good_csv_file) or croak "$!\n";

    # Get the header of the bad file (this will fail).
    my @bad_header;
    eval
    {
        local $@;
        @bad_header = $csv->header($bfh);
        print "Got bad header ok:\n\n" . Dumper(\@bad_header) . "\n\n";
        1;
    }
    or do
    {
        print "Failed to get header from bad csv file!\n";
    };

    # Get the header of the good file (this will fail too).
    $csv->{_AHEAD} = '';
    my @good_header;
    eval
    {
        local $@;
        @good_header = $csv->header($gfh);
        print "Got good header ok:\n\n" . Dumper(\@good_header) . "\n\n";
        1;
    }
    or do
    {
        print "Failed to get header from good csv file!\n";
    };

    close $bfh;
    close $gfh;

    exit 0;

### END ###

this way from the command line I get two errors:

    $ perl test_csv_bug.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from bad csv file!
    Failed to get header from good csv file!

However, if I simply include a debugging package I often use, the second error goes away!:

    $ perl -MData::Dumper test_csv_bug_clean.pl
    # CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
    Failed to get header from bad csv file!
    Got good header ok:

    $VAR1 = [
              'col1',
              'col2',
              'col3'
            ];

I'm not at this point sure why including that package fixes the issue.

-Stu

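One possible explanation for the difference, which the thread itself does not confirm: the script above calls Dumper () without loading Data::Dumper. When header () succeeds, the following call to the undefined Dumper sub would die at run time inside the eval, so the "Failed" branch runs even though the header was read. Passing -MData::Dumper on the command line defines the sub and the message disappears. A minimal sketch of that effect, using only core Perl (try_dump is a hypothetical name for illustration):

```perl
use strict;
use warnings;

# Data::Dumper is deliberately NOT loaded, so Dumper() is undefined here.
# The call still compiles under strict because it is written with
# parentheses; it only fails when executed.
sub try_dump {
    my ($data) = @_;
    my $ok = eval {
        my $text = Dumper($data);   # dies at run time: undefined subroutine
        1;
    };
    # Report success flag and, on failure, the error captured in $@.
    return ($ok ? 1 : 0, $ok ? '' : $@);
}

my ($ok, $err) = try_dump(['col1', 'col2', 'col3']);
print $ok ? "dumped ok\n" : "eval failed: $err";
```

If this is what happened, printing $@ in the "or do" branch of the test script would show which of the two causes (parser error or missing sub) triggered the failure.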
Wed Oct 18 14:49:45 2017 CLemmen [...] excelsiorintegrated.com - Correspondence added

Subject:	RE: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date:	Wed, 18 Oct 2017 17:18:28 +0000
To:	"bug-Text-CSV_XS [...] rt.cpan.org" <bug-Text-CSV_XS [...] rt.cpan.org>
From:	Charles Stuart Lemmen <CLemmen [...] excelsiorintegrated.com>

H,

Just installed the .pm in the .tar.gz you linked in my Windows environment and same problem persists.

I tried to test that 1.33 package on linux but I am experiencing some odd behavior that seems unrelated to Text::CSV_XS. Nonetheless, there does appear to be a difference with how this potential bug works on Windows vs. Linux vs. ?

Sorry I don't have more info for you at this time.

-Stu

Thu Oct 19 06:59:14 2017 HMBRAND [...] cpan.org - Broken in 1.22 added

Thu Oct 19 06:59:14 2017 HMBRAND [...] cpan.org - Status changed from 'open' to 'patched'

Thu Oct 19 06:59:46 2017 h.m.brand [...] xs4all.nl - Correspondence added

Subject:	Re: [rt.cpan.org #123320] Text::CSV_XS bug w/Mac format files
Date:	Thu, 19 Oct 2017 12:58:09 +0200
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	"H.Merijn Brand" <h.m.brand [...] xs4all.nl>

On Wed, 18 Oct 2017 11:30:54 -0400, "Charles Stuart Lemmen via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> If you create one Text::CSV_XS handle and use it with two different
> files (one with a bad header and one with a good header) that use
> carriage returns only as end-of-line markers and the first has an
> invalid header, attempting to get the second (good) file's header
> will also fail. Skip getting the first file's header and the second
> one succeeds. If you attempt to get the first (bad) file's header but
> then clear the '_AHEAD' instance var before getting the second (good)
> file's header the second will succeed whereas before it did not.

So, I rewrote your test to what you see below and immediately see now what is the cause of the failure ...

Before the fix:

$ perl -Mblib sandbox/rt123320.pl
# CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
Failed to get header from rt123320_bad.csv!
# CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 2 pos 0
Failed to get header from rt123320_good.csv!

After the fix:

$ perl -Mblib sandbox/rt123320.pl
# CSV_XS ERROR: 1012 - INI - the header contains an empty field @ rec 1 pos 0
Failed to get header from rt123320_bad.csv!
Use of uninitialized value in subroutine entry at /data/pro/3gl/CPAN/Text-CSV_XS/blib/lib/Text/CSV_XS.pm line 885, <$gfh> chunk 1.
{   header_from_good => [
        'col1',
        'col2',
        'col3'
        ]
    }

Fetch again to test this.

Though I think it is a bad idea to simply re-use $csv after a FAIL in use for one file for another file, it should not fail like this.

Thanks again for spotting

--8<--- rt123320.pl
#!/pro/bin/perl

use 5.18.2;
use warnings;

use Data::Peek;
use Text::CSV_XS;

# First make two csv files, one with an empty (dangling) header column, one that's ok.
# These are both "Mac" format meaning only carriage returns for EOL.
my $fn_bad  = "rt123320_bad.csv";
my $fn_good = "rt123320_good.csv";

if (open my $bfh, ">", $fn_bad) {
    print $bfh join "\r" =>
        q{col1,col2,col3,},
        q{"One","","Three"},
        q{"Four","Five and a half","Six"},
        q{};
    close $bfh;
    }

if (open my $gfh, ">", $fn_good) {
    print $gfh join "\r" =>
        q{col1,col2,col3},
        q{"One","Two","Three"},
        "";
    close $gfh;
    }

-e $fn_bad  or die "$fn_bad missing!\n";
-e $fn_good or die "$fn_good missing!\n";

my $csv = Text::CSV_XS->new ({
    binary    => 1,
    auto_diag => 1,
    eol       => "\r",
    });

open my $bfh, "<", $fn_bad  or die "$!\n";
open my $gfh, "<", $fn_good or die "$!\n";

# Get the header of the bad file (this will fail).
my @bad_header;
eval {
    local $@;
    @bad_header = $csv->header ($bfh);
    DDumper { header_from_bad => \@bad_header };
    1;
    } or warn "Failed to get header from $fn_bad!\n";

# Get the header of the good file (this will fail too but should not).
my @good_header;
eval {
    local $@;
    @good_header = $csv->header ($gfh);
    DDumper { header_from_good => \@good_header };
    1;
    } or print "Failed to get header from $fn_good!\n";

close $bfh;
close $gfh;
-->8---

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

Thu Oct 19 16:25:17 2017 HMBRAND [...] cpan.org - Fixed in 1.33 added

Thu Oct 19 16:25:17 2017 HMBRAND [...] cpan.org - Status changed from 'patched' to 'resolved'