
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 42261
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: patrick.bourdon [...] bigfoot.com
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: (no value)
Fixed in: (no value)



Subject: Working with data with embedded newlines
Date: Fri, 9 Jan 2009 23:42:30 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Patrick BOURDON <patrick.bourdon [...] bigfoot.com>
Hello,

I am trying to use Text::CSV_XS with data that contains embedded newlines.
The French sep_char is ';'.

Below I give the context of my problem, using a one-line csv file.
The bug.csv in the example has just one "element" that includes an
embedded newline, which in the end makes it a 2-line file:

Line 1: a;b;c;d;"e
Line 2: f"

I got an "EIQ - Quoted field not terminated" error.
So I changed to Text::CSV_PP, and it works (at least as I expected).
So I suspect there is a bug in CSV_XS::getline.
But I may also not be using the module as you require. Just tell me.

Thanks for the module.

My installation:
- Perl: Active Perl 5.10.0 Build 1004
- OS: Windows XP

Patrick Bourdon - Paris - France

------------------------------------------------------------
use IO::File();

{
    use Text::CSV_XS();

    my $file = 'bug.csv';
    my $fh = IO::File->new("<$file") or die("open(<$file) impossible: ($!)!");

    my $csv = Text::CSV_XS->new ({
        sep_char         => ";",
        always_quote     => 1,
        binary           => 1,
        allow_whitespace => 1,
        verbatim         => 1,
    });

    my $fields = $csv->getline ($fh);

    # Dump:
    # $csv->error_diag () = [ '2027' 'EIQ - Quoted field not terminated' '12'];
    # $csv->error_input() = 'a;b;c;d;"e
    #';
    # $csv->fields() = undef
}

{
    use Text::CSV_PP();

    my $file = 'bug.csv';
    my $fh = IO::File->new("<$file") or die("open(<$file) impossible: ($!)!");

    my $csv = Text::CSV_PP->new ({
        sep_char         => ";",
        always_quote     => 1,
        binary           => 1,
        allow_whitespace => 1,
        verbatim         => 1,
    });

    my $fields = $csv->getline ($fh);

    # Dump:
    # $csv->fields() = [ 'a', 'b', 'c', 'd', 'e
    # f' ];
}
Subject: Re: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sat, 10 Jan 2009 10:36:47 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
> My installation:
> - Perl: Active Perl 5.10.0 Build 1004
> - OS: Windows XP

You didn't mention the version of Text::CSV_XS.

Your problem is the use of the 'verbatim' attribute. You should almost
never use that.

--8<---
verbatim
    This is a quite controversial attribute to set, but it makes hard
    things possible.

    The basic thought behind this is to tell the parser that the
    normally special characters newline (NL) and Carriage Return (CR)
    will not be special when this flag is set, and be dealt with as
    being ordinary binary characters. This will ease working with data
    with embedded newlines.

    When "verbatim" is used with "getline ()", "getline ()" auto-chomp's
    every line.
    :
    :
    For parse () this means that the parser has no idea about line
    ending anymore, and getline () chomps line endings on reading.
-->8---

Additional notes: always_quote is basically a writing option, and should
not be used when parsing. The counterpart for parsing is blank_is_undef.

    my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });

is more than enough for your described format.

-- 
H.Merijn Brand          http://tux.nl      Perl Monger  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.3, and 11.0, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/           http://www.test-smoke.org/
http://qa.perl.org      http://www.goldmark.org/jeff/stupid-disclaimers/
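To illustrate the split between parsing and writing options described above, here is a minimal sketch, not taken from the ticket. It assumes an in-memory string standing in for bug.csv, and the names $data, $reader and $writer are made up for the example. The reader uses only binary and sep_char; always_quote is set on a separate object that is only used to build output.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Assumption: an in-memory stand-in for bug.csv, one record whose last
# field contains an embedded newline, terminated by CR LF.
my $data = qq{a;b;c;d;"e\nf"\r\n};
open my $fh, "<", \$data or die "Cannot open in-memory handle: $!";

# Reader: only the options the described format needs.
my $reader = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });
my $row    = $reader->getline ($fh)
    or die "Parse failed: " . $reader->error_diag ();
close $fh;

# Writer: always_quote only influences how combine () builds a line.
my $writer = Text::CSV_XS->new ({
    binary       => 1,
    sep_char     => ";",
    always_quote => 1,
});
$writer->combine (@$row) or die "Combine failed";
my $out = $writer->string ();

print scalar (@$row), " fields read\n";
print "$out\n";
```

Keeping two objects is just one way to make the intent visible; a single object would also work, since always_quote is only consulted when writing.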
Subject: Re[2]: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sat, 10 Jan 2009 16:03:02 +0100
To: "h.m.brand [...] xs4all.nl via RT" <bug-Text-CSV_XS [...] rt.cpan.org>
From: Patrick BOURDON <patrick.bourdon [...] bigfoot.com>
<URL: https://rt.cpan.org/Ticket/Display.html?id=42261 >

Hello,

Thank you for your prompt answer.
$VERSION = "0.58"; (the latest available on CPAN; sorry for forgetting to mention it)

I now strictly use your suggested creation line:

    my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });

But I still get the problem.
Attached is a simple standalone failing test:
- (bug.csv) csv input file: 2 lines
- (bug.pl) code: 40 lines
- (bug.out) dump output text file: 40 lines

It shows it still works with CSV_PP and not with CSV_XS.
I do not even get any diagnostics with CSV_XS now.
I want to use CSV_XS for quite obvious performance reasons.

Patrick.

Attachment: bug.log (application/octet-stream, 680b)
Attachment: bug.out (application/octet-stream, 1.2k)

Subject: Re: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sat, 10 Jan 2009 16:37:12 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sat, 10 Jan 2009 10:03:12 -0500, "patrick.bourdon@bigfoot.com via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:

> Thank you for your prompt answer.
> $VERSION = "0.58"; (the last available on CPAN - Sorry forgetting to mention it)
>
> I now strictly use your suggested creation line:
> my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });
>
> But I still get the problem.
> Attached a simple standalone failing test:
> - (bug.csv) csv input file: 2 lines

That file was not attached :(

Can I get that file, preferably inside a tgz or zip, so I'm sure that
the line-endings are not mangled?

I cannot reproduce your situation here, and I fear it might be either a
case of `weird' line endings in the csv file, or a mismatch in ActivePerl
vs Strawberry perl. I can test the latter, not the first.

> - (bug.pl) code: 40 lines
> - (bug.out) dump output text file: 40 lines
>
> It shows it still works with CSV_PP and not with CSV_XS.
> I even have no diagnostics with CSV_XS now.
> I want to use CSV_XS for quite obvious performance reason.
Subject: Re[2]: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sat, 10 Jan 2009 16:51:53 +0100
To: "h.m.brand [...] xs4all.nl via RT" <bug-Text-CSV_XS [...] rt.cpan.org>
From: Patrick BOURDON <patrick.bourdon [...] bigfoot.com>
As requested.

>> - (bug.csv) csv input file: 2 lines
> That file was not attached
> Can I get that file, preferably inside a tgz or zip, so I'm sure that
> the line-endings are not mangled?

Attachment: bug.zip (application/x-zip-compressed, 127b)

Subject: Re: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sun, 11 Jan 2009 12:41:15 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sat, 10 Jan 2009 10:52:03 -0500, "patrick.bourdon@bigfoot.com via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:

Weird, on my system, that works perfectly fine

/Text-CSV_XS/tmp 107 > ../examples/csv-check -s\; bug.csv
Checked with ../examples/csv-check 1.2 using Text::CSV_XS 0.58
OK: rows: 1, columns: 5
    sep = <;>, quo = <">, bin = <1>

I'll try to find an ASperl system and see what happens there.
Subject: Re: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sun, 11 Jan 2009 16:40:03 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
Linux
==========================================================================
Text-CSV_XS/tmp > cat bug.pl
#!/pro/bin/perl

use strict;
use warnings;

use Text::CSV_XS;
use Test::More tests => 1;

my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });
open my $bug, "<", "bug.csv";
my $row = $csv->getline ($bug) or $csv->error_diag ();
$^O eq "MSWin32" or eval q{
    use Data::Peek;
    print STDERR "# ", DPeek ($_), "\n" for @$row;
    };

is_deeply ($row, [ "a", "b", "c", "d", "e\nf" ], "content");
Text-CSV_XS/tmp > dump bug.csv
[DUMP 0.6.01]

00000000  61 3B 62 3B 63 3B 64 3B  22 65 0A 66 22 0D 0A     a;b;c;d;"e.f"..
Text-CSV_XS/tmp > perl ../examples/csv-check -s";" bug.csv
Checked with ../examples/csv-check 1.2 using Text::CSV_XS 0.58
OK: rows: 1, columns: 5
    sep = <;>, quo = <">, bin = <1>
Text-CSV_XS/tmp > perl bug.pl
1..1
# PV("a"\0)
# PV("b"\0)
# PV("c"\0)
# PV("d"\0)
# PV("e\nf"\0)
ok 1 - content
Text-CSV_XS/tmp > perl -v

This is perl, v5.10.0 built for i686-linux-64int

Copyright 1987-2007, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Text-CSV_XS/tmp >

MSWin2k + ASperl 5.8.8
==========================================================================
L:\Text-CSV_XS\tmp> perl -v

This is perl, v5.8.8 built for MSWin32-x86-multi-thread
(with 18 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 822 [280952] provided by ActiveState http://www.ActiveState.com
Built Jul 31 2007 19:34:48

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

L:\Text-CSV_XS\tmp> perl ..\examples\csv-check -s";" bug.csv
Checked with ..\examples\csv-check 1.2 using Text::CSV_XS 0.58
OK: rows: 1, columns: 5
    sep = <;>, quo = <">, bin = <1>
L:\Text-CSV_XS\tmp> perl bug.pl
1..1
ok 1 - content
L:\Text-CSV_XS\tmp>

MSWinXP + ASperl 5.10.0
==========================================================================
L:\Text-CSV_XS\tmp> perl -v

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 5 registered patches, see perl -V for more detail)

Copyright 1987-2007, Larry Wall

Binary build 1004 [287188] provided by ActiveState http://www.ActiveState.com
Built Sep 3 2008 13:16:37

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

L:\Text-CSV_XS\tmp> perl ..\examples\csv-check -s";" bug.csv
Checked with ..\examples\csv-check 1.2 using Text::CSV_XS 0.52
OK: rows: 1, columns: 5
    sep = <;>, quo = <">, bin = <1>
L:\Text-CSV_XS\tmp> perl bug.pl
1..1
ok 1 - content
L:\Text-CSV_XS\tmp>
Subject: Re: [rt.cpan.org #42261] Working with data with embedded newlines
Date: Sun, 11 Jan 2009 22:47:25 +0100
To: "h.m.brand [...] xs4all.nl via RT" <bug-Text-CSV_XS [...] rt.cpan.org>
From: Patrick BOURDON <patrick.bourdon [...] bigfoot.com>
<URL: http://rt.cpan.org/Ticket/Display.html?id=42261 >

I have exactly the same behaviour using the following installation:
- Active Perl 5.10.0 Build 1004 (latest)
- OS: Vista
- Text::CSV_XS 0.52 (as delivered with Active Perl 5.10.0 and not upgraded to 0.58)
- Text::CSV 1.10 (added because it is not part of standard ActivePerl)

Tell me if and how I can help more.
The point has no urgency.

Patrick.
Summarizing the final conclusion for readers of the bug ...

---8<---
Text::CSV_XS is leading. Text::CSV_PP should theoretically behave
exactly the same, but what you see is the implementation details
peeping out.

As the ->getline () method is written to interact with IO on the lowest
level possible for speed reasons, it doesn't need to fill the
data-structures that are used/needed by the alternate method of using
the combination of ->parse () and ->fields (). ->getline () does these
two in one single blow, returning a reference to the fields. It does
NOT fill the data structures that are needed for ->fields () to work,
as these are only filled by the ->parse () method. This is done purely
for speed.

As ->getline () effectively returns all the data you need, there is
absolutely no need for the ->fields () method to do anything at all
here.

What you are seeing is that the implementation of XS is not filling
what is needed to have ->fields () return what you erroneously (but
understandably) expect, where the PP implementation reveals that it
uses the same structure internally. That it *does* give you the fields
is undocumented behavior.
-->8---
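The two calling styles contrasted in the summary can be sketched as follows. This is an illustrative example rather than code from the ticket: the in-memory data and the variable names ($from_getline, @from_parse) are assumptions made for the sketch. The point is only that the result of ->getline () is taken from its return value, while ->fields () is paired with ->parse ().

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ";" });

# Style 1: getline () reads from a handle and returns the fields itself.
# Assumption: an in-memory string stands in for bug.csv.
my $data = qq{a;b;c;d;"e\nf"\r\n};
open my $fh, "<", \$data or die "Cannot open in-memory handle: $!";
my $from_getline = $csv->getline ($fh)
    or die "getline failed: " . $csv->error_diag ();
close $fh;

# Style 2: parse () works on a string already in memory; fields () then
# returns what that parse () call stored.
my $line = qq{a;b;c;d;"e\nf"};
$csv->parse ($line) or die "parse failed: " . $csv->error_diag ();
my @from_parse = $csv->fields ();

print scalar (@$from_getline), " fields via getline\n";
print scalar (@from_parse),    " fields via parse + fields\n";
```

The sketch deliberately never calls ->fields () after ->getline (), since the summary describes that combination as undocumented behavior.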