Subject: Text::CSV_XS bug w/Mac format files
Date: Wed, 18 Oct 2017 13:56:31 +0000
To: "bug-Text-CSV_XS [...] rt.cpan.org" <bug-Text-CSV_XS [...] rt.cpan.org>
From: Charles Stuart Lemmen <CLemmen [...] excelsiorintegrated.com>
Hi,
I believe I've found a bug in the Text::CSV_XS package with regard to Mac-format files (a single carriage return as EOL) when successive files are handled with one CSV object. The relevant details are below.
Dist name/ver: Text::CSV_XS-1.32
Perl version: This is perl 5, version 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
O/S: Windows 10 Home (Major version: 10 Minor Version: 0.15063)
Bug details:
If you create one Text::CSV_XS object and use it on two files in succession, both using a bare carriage return as the end-of-line marker, and the first file has an invalid header, then getting the second (good) file's header also fails. If you skip getting the first file's header, the second call succeeds. If you attempt to get the first (bad) file's header but then clear the '_AHEAD' instance variable before getting the second (good) file's header, the second call also succeeds.
Two properties of the first file (the one with the bad header) contribute to the second header call failing:
1. The file has only one non-header data record, and that record does not end with the end-of-line carriage return.
2. The file has multiple non-header records, all with proper end-of-line carriage returns, but the first of them (record #2) has an empty column (,,). It doesn't matter whether this empty column is double-quoted or not.
So it sounds like "leftover" '_AHEAD' data is somehow negatively influencing the handling of other files. Here's a short code example:
use strict;
use warnings;
use Carp qw(croak);
use Data::Dumper;
use Text::CSV_XS;
# First make two CSV files: one with an empty (dangling) header column, one that's OK.
# These are both "Mac" format meaning only carriage returns for EOL.
my $bad_csv_file = 'test_bad_csv.csv';
my $good_csv_file = 'test_good_csv.csv';
my $bfh;
my $gfh;
if (open($bfh, '>', $bad_csv_file)) {
    print($bfh "col1,col2,col3,\r\"One\",\"\",\"Three\"\r\"Four\",\"Five and a half\",\"Six\"\r");
    close $bfh;
}
if (open($gfh, '>', $good_csv_file)) {
    print($gfh "col1,col2,col3\r\"One\",\"Two\",\"Three\"\r");
    close $gfh;
}
-e $bad_csv_file or croak "No bad file!\n";
-e $good_csv_file or croak "No good file!\n";
# Init csv ref to handle files.
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1, eol => "\r"});
# Open and use the new files.
open($bfh, '<', $bad_csv_file) or croak "$!\n";
open($gfh, '<', $good_csv_file) or croak "$!\n";
# Get the header of the bad file (this will fail).
my @bad_header;
eval {
    local $@;
    @bad_header = $csv->header($bfh);
    print "Got bad header ok:\n\n" . Dumper(\@bad_header) . "\n\n";
    1;
}
or do {
    print "Failed to get header from bad csv file!\n";
};
# Get the header of the good file (this will fail too but should not).
my @good_header;
eval {
    local $@;
    @good_header = $csv->header($gfh);
    print "Got good header ok:\n\n" . Dumper(\@good_header) . "\n\n";
    1;
}
or do {
    print "Failed to get header from good csv file!\n";
};
close $bfh;
close $gfh;
My current workaround, before calling header on the next file, is to check whether '_AHEAD' exists and is non-empty, and if so to clear it (checking first should keep this safe if the name ever changes). If '_AHEAD' is not present, I attempt the header call inside an eval; if that fails, I create a new Text::CSV_XS instance (with the same options as the original) and attempt the header call a second time. If the second call also fails, we can be sure the second file is broken too.
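For reference, here is a rough sketch of that workaround. It simplifies the steps slightly: it clears '_AHEAD' when present, tries the header call, and falls back to a fresh instance on any failure. The helper name header_with_reset, the $make_csv factory argument, and the rewind via seek before the retry are my own assumptions for illustration and not part of Text::CSV_XS. '_AHEAD' is an internal key, so this may stop working in other versions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV_XS).
#   $csv      - the existing parser object
#   $fh       - read handle positioned at the start of the next file
#   $make_csv - code ref returning a new parser built with the original options
# Returns the parser that succeeded, followed by the header fields.
sub header_with_reset {
    my ($csv, $fh, $make_csv) = @_;

    # '_AHEAD' is internal and undocumented: only clear it if it is there
    # and non-empty, so nothing happens if the name ever changes.
    if (exists $csv->{_AHEAD}
        && defined $csv->{_AHEAD}
        && length $csv->{_AHEAD}) {
        delete $csv->{_AHEAD};
    }

    # First attempt, with the existing object.
    my @header;
    if (eval { @header = $csv->header($fh); 1 }) {
        return ($csv, @header);
    }

    # Assumption: the failed attempt consumed input, so rewind before retrying.
    seek $fh, 0, 0 or die "Cannot rewind: $!\n";

    # Second attempt, with a fresh object. If this dies as well, the file
    # itself is most likely broken, so the error is left to propagate.
    my $fresh = $make_csv->();
    @header = $fresh->header($fh);
    return ($fresh, @header);
}

# Intended use with the example above (needs "use Text::CSV_XS;"):
#   my %opts = (binary => 1, auto_diag => 1, eol => "\r");
#   my ($csv2, @cols) = header_with_reset(
#       $csv, $gfh, sub { Text::CSV_XS->new({%opts}) });
```

Passing a factory rather than the options hash keeps the helper from having to know how the original object was built; the caller should keep using the returned object for the rest of that file.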
Thanks!
Stuart Lemmen
IT Development & Support
Excelsior Integrated LLC
413-394-4340
clemmen@excelsiorintegrated.com
www.excelsiorintegrated.com