Subject: Bug report for Text::CSV_XS
Date: Thu, 15 Jun 2017 19:58:29 +0000
To: "bug-Text-CSV_XS [...] rt.cpan.org" <bug-Text-CSV_XS [...] rt.cpan.org>
From: Charles Stuart Lemmen <CLemmen [...] excelsiorintegrated.com>
Hi, we've had some trouble handling files with the Text::CSV_XS package and I believe I've narrowed down the issue.
First some info:
Package: Text::CSV_XS
Version: 1.30
Perl: Strawberry Perl ver. 24, subversion 0 (v5.24.0) built for MSWin32-x64-multi-thread
O/S: Windows 10 Home
We typically init our Text::CSV_XS object reference thusly:
my $csv = Text::CSV_XS->new({binary => 1, auto_diag => 1});
We also don't specify $csv->eol but let the package sort that out.
When using this package in production, we typically process one or more .csv files from a folder, getting each row by calling the 'getline' method and then checking the row results.
What we've discovered is that the $csv object normally leaves its internal $csv->eol at the default of '' (empty string). However, when it encounters a file whose line endings are a single carriage return ("\r", \x0d), the call to 'getline' sets $csv->eol to a single carriage return. This works fine for that file, and others like it, but if the next file to be processed has "\r\n" line endings, $csv->eol is NOT reset to '' and the entire file is then read in by a single call to 'getline', which naturally breaks things.
It is clear from debugging that some magic is going on internally with such EOL=="\r" files: calling tell($csv_fh) for each "record" after the first call to 'getline' returns the size of the file, meaning the splitting of lines is being done behind the scenes. (tell() gives the expected, different result for each line of a file with "\r\n" line endings.)
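A minimal sketch of how the sequence above might be reproduced. The helper names write_raw and count_rows, the sample data, and opening the files with the ':raw' layer are assumptions for illustration, not our production code. It only prints what 'getline' and $csv->eol report for each file, since the result may differ between versions of the package.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV_XS;

# Hypothetical helper: write $content byte-for-byte to a temp file, return its path.
sub write_raw {
    my ($content) = @_;
    my ($fh, $path) = tempfile(SUFFIX => '.csv', UNLINK => 1);
    binmode $fh;                       # no CRLF translation on Windows
    print {$fh} $content;
    close $fh or die "close $path: $!";
    return $path;
}

# Hypothetical helper: count the rows getline returns, reusing the caller's parser.
sub count_rows {
    my ($csv, $path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $rows = 0;
    $rows++ while $csv->getline($fh);
    close $fh;
    return $rows;
}

# One parser shared across files, constructed as in this report (eol not specified).
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

my $cr_file   = write_raw("a,b\r1,2\r");       # bare carriage returns
my $crlf_file = write_raw("a,b\r\n1,2\r\n");   # CR+LF, processed second

for my $path ($cr_file, $crlf_file) {
    my $rows = count_rows($csv, $path);
    my $eol  = defined $csv->eol ? $csv->eol : '';
    # Show eol as hex so that "\r" (0d) is visible in the output.
    printf "%d row(s), eol is now 0x%s\n", $rows, unpack('H*', $eol);
}
```

If the behavior described above is present, the second file would be expected to report fewer rows than it contains, with eol still showing 0d.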
The workaround for us is to go against the documentation's recommendation ("...so it is probably safer to not specify eol at all."): we do careful EOL discovery for each file and pass the result to $csv->eol(). In certain cases we simply do away with $csv->getline() and use "while(<$csv_fh>)" along with calls to $csv->parse() instead.
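A rough sketch of the first workaround, for illustration only. The detect_eol helper is hypothetical and much simpler than careful discovery would be: it takes the first terminator seen in the first 64 KB of the file, so a line break inside a quoted field near the top would mislead it. File paths are assumed to come from the command line.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical helper: return the first line terminator found near the start
# of the file ("\r\n", "\n", or "\r"), or undef if none is seen.
sub detect_eol {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    my $chunk = '';
    read $fh, $chunk, 65536;           # only the first 64 KB is inspected
    close $fh;
    # "\r\n" is listed first so that CR+LF is not mistaken for a bare "\r".
    return $chunk =~ /(\r\n|\n|\r)/ ? $1 : undef;
}

my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

for my $path (@ARGV) {
    # Set eol explicitly for every file so nothing carries over from the last one.
    my $eol = detect_eol($path);
    $csv->eol(defined $eol ? $eol : '');

    open my $fh, '<:raw', $path or die "open $path: $!";
    my $rows = 0;
    $rows++ while $csv->getline($fh);
    close $fh;
    print "$path: $rows row(s)\n";
}
```

Setting eol on every file, rather than only when a "\r" file is seen, is the "set it every time" idea applied on the caller's side.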
I'm inclined to feel that if this package sometimes sets $csv->eol then it should also either unset it when necessary or set it every time to avoid this problem. Hope this makes sense and is helpful to you.
Stuart Lemmen
IT Development & Support
Excelsior Integrated LLC
413-394-4340
clemmen@excelsiorintegrated.com
www.excelsiorintegrated.com