
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 48514
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: pinnsvin [...] mail.ru
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.16
Fixed in: 1.31



Subject: BOM-signed CSVs: strange behavior
Date: Fri, 07 Aug 2009 16:04:39 +0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Anton Soldatov <pinnsvin [...] mail.ru>
Hello,

I actually don't know whether it's a bug or not, but anyway...

uname -a:
Linux sparta 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686 athlon i386 GNU/Linux

perl -v:
This is perl, v5.10.0 built for i386-linux-thread-multi

module:
Text::CSV_XS 0.65 (but it also applies to Text::CSV_PP, as far as I can test)

Problem: when I try to open a Unicode CSV file signed with the BOM in binary mode, $csv->getline returns nothing.

Current fix: currently I'm using open_bom from File::BOM instead of the built-in open.

Test script to reproduce the bug:

#!/usr/bin/perl -w
use strict;
use warnings;
use Text::CSV_XS;
use File::BOM qw(open_bom);

my $enco    = 'UTF-16LE';
my $file    = 'bom.csv';
my $use_bom = 0; # set to 1 to use open_bom

my $csv = Text::CSV_XS->new({ binary => 1, sep_char => "\t", eol => "\r\n" });

my $io;
if ($use_bom) {
    open_bom $io, $file, ":encoding($enco)";
}
else {
    open $io, "<:encoding($enco)", $file;
}
while (my $row = $csv->getline($io)) {
    print "$row\n";
}
close $io;
exit 0;

Output on my machine:
$use_bom = 0 - nothing
$use_bom = 1 - two rows read, just as planned :)
ARRAY(0x976a114)
ARRAY(0x9774df4)

Test csv: please see the attachment (CSV in UTF-16LE).

Best regards,
Anton Soldatov
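A minimal core-Perl sketch (no Text::CSV_XS needed) of what the reader gets in the $use_bom = 0 case, assuming the file starts with the UTF-16LE BOM (bytes FF FE). An explicit UTF-16LE layer treats the BOM as an ordinary character, so it reaches the caller as U+FEFF in front of the first field; if that field is quoted, the quote is no longer at the start of the field. The in-memory data and the sub name first_decoded_line are illustrative only, not taken from the ticket.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative stand-in for bom.csv: a BOM followed by one tab-separated,
# quoted header line, encoded as UTF-16LE bytes (U+FEFF becomes FF FE).
my $bytes = encode("UTF-16LE", "\x{FEFF}\"status\"\t\"source\"\r\n");

# Hypothetical helper: open the bytes the way the report does (explicit
# UTF-16LE layer on an in-memory handle) and return the first decoded line.
sub first_decoded_line {
    my ($data) = @_;
    open my $fh, "<:encoding(UTF-16LE)", \$data or die "open: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}

my $line = first_decoded_line($bytes);

# The explicit-endian layer keeps the BOM: the line starts with U+FEFF,
# not with the opening quote of the first field.
printf "first character: U+%04X\n", ord $line;
```

Running it prints `first character: U+FEFF`. File::BOM's open_bom avoids this by detecting and skipping the BOM before decoding starts.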
Attachment: bom.csv (application/octet-stream, 480 bytes)

Subject: Re: [rt.cpan.org #48514] BOM-signed CSVs: strange behavior
Date: Fri, 7 Aug 2009 14:17:19 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Fri, 07 Aug 2009 08:05:49 -0400, "Anton Soldatov via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> I actually don't know whether it's a bug or not, but anyway...
I don't think it is a Text::CSV/Text::CSV_XS problem, but it would be nice if it would work.
> Problem: when I try to open a Unicode CSV file signed with the BOM
> in binary mode, $csv->getline returns nothing.
> Current fix: currently I'm using open_bom from File::BOM instead of the
> built-in open.
Were you the one talking about this to me in Lisbon? If not, someone else does have exactly the same problem.

As Text::CSV_XS's getline () uses perl's getline () under the hood, I think that CORE's getline () is what should/could be fixed instead.

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, OpenSuSE 10.3, 11.0, and 11.1, AIX 5.2 and 5.3.
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
Subject: Re[2]: [rt.cpan.org #48514] BOM-signed CSVs: strange behavior
Date: Mon, 10 Aug 2009 10:24:55 +0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Anton Soldatov <pinnsvin [...] mail.ru>
> Were you the one talking about this to me in Lisbon? If not, someone > else does have exactly the same problem.
No, it wasn't me. Unfortunately ;)
> As Text::CSV_XS's getline () uses perl's getline () under the hood, I > think that CORE's getline () is what should/could be fixed instead.
Thanks for this info!
Best regards,
Anton Soldatov
Rejected as a bug in Text::CSV_XS, as this module cannot deal with this problem. If someone wishes to pursue this, please post to the perl5-porters developers list with some proposals.
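Until the module handles it, a caller can deal with the BOM before the handle ever reaches getline. The sketch below is a minimal core-Perl workaround under stated assumptions: only UTF-8, UTF-16LE and UTF-16BE BOMs are recognised, and the helper name sniff_bom_and_open is hypothetical (it is not part of Text::CSV_XS or File::BOM). It reads the first bytes in :raw mode, picks the encoding from the BOM, seeks past it, and only then pushes the :encoding layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical helper: open $file, detect a UTF-8/UTF-16 BOM, skip it, and
# return the handle with a matching :encoding layer ($fallback if no BOM).
# Returns ($fh, $encoding_name).
sub sniff_bom_and_open {
    my ($file, $fallback) = @_;
    open my $fh, "<:raw", $file or die "Cannot open $file: $!";
    my $got = read $fh, my $head, 3;
    defined $got or die "Cannot read $file: $!";

    # Only UTF-8 and UTF-16 BOMs are checked; a UTF-32LE BOM
    # (FF FE 00 00) would be misread as UTF-16LE here.
    my ($enc, $skip) = ($fallback, 0);
    if    ($head =~ /^\xEF\xBB\xBF/) { ($enc, $skip) = ("UTF-8",    3) }
    elsif ($head =~ /^\xFF\xFE/)     { ($enc, $skip) = ("UTF-16LE", 2) }
    elsif ($head =~ /^\xFE\xFF/)     { ($enc, $skip) = ("UTF-16BE", 2) }

    # Reposition just after the BOM (or back to the start), then decode.
    seek $fh, $skip, 0 or die "Cannot seek $file: $!";
    binmode $fh, ":encoding($enc)" or die "binmode: $!";
    return ($fh, $enc);
}

# Demo: write a small UTF-16LE file with a BOM, then read it back.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode("UTF-16LE", "\x{FEFF}status\tsource\r\nLegacy\tbrowser\r\n");
close $tmp;

my ($fh, $enc) = sniff_bom_and_open($path, "UTF-8");
my $first = <$fh>;
close $fh;
printf "%s, first line starts with %s\n", $enc, substr($first, 0, 6);
```

This prints `UTF-16LE, first line starts with status`. The returned handle can then be passed to `$csv->getline`, since the parser no longer sees U+FEFF in front of the first field.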
Fixed using the header method, so I changed the status from rejected to resolved.

Before:

--8<---
use strict;
use warnings;
use feature "say";
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
    binary    => 1,
    sep_char  => "\t",
    eol       => "\r\n",
    auto_diag => 2,
    });
binmode STDOUT, ":encoding(utf-8)";
open my $fh, "<:encoding(utf-16le)", "rt48514.tsv" or die "rt48514.tsv: $!";
while (my $row = $csv->getline ($fh)) {
    say "@$row";
    }
close $fh;
-->8---

will cause a failure like

--8<---
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 0 pos 4 field 1
-->8---

With

--8<---
use strict;
use warnings;
use feature "say";
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({
    binary    => 1,
    sep_char  => "\t",
    eol       => "\r\n",
    auto_diag => 2,
    });
binmode STDOUT, ":encoding(utf-8)";
open my $fh, "<:encoding(utf-16le)", "rt48514.tsv" or die "rt48514.tsv: $!";
my @hdr = $csv->header ($fh);
say "@hdr";
while (my $row = $csv->getline ($fh)) {
    say "@$row";
    }
close $fh;
-->8---

you will get

--8<---
status source target description
Legacy browser браузер A software program used to locate and display Web pages. Some browsers also allow users to send and receive e-mail, read newsgroups, and play sound or video files.
-->8---