
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 44402
Status: resolved
Worked: 20 min
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: eric.roode.cpan [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 0.60
Fixed in: 0.63



Subject: Unexpected results parsing tab-separated spaces
Perhaps this issue is out-of-scope for Text::CSV_XS, because I am using it to parse tab-separated data, and I do not know whether that is supported.

When parsing tab-separated data, and the input record contains fields that consist entirely of spaces (ascii 32), and allow_whitespace is enabled, then Text::CSV_XS appears to remove not only the spaces but also the adjacent tab characters, which has the effect of removing actual fields from the data record.

Further, if the record ends with one or more spaces, then the parse fails completely with the error "EIF - Binary character in unquoted field, binary off".

I have attached a sample program that provides eight simple test cases, four with allow_whitespace on, four with it off. I think it is self-explanatory.

Thank you,
Eric J. Roode
Subject: csv_test.pl
#!perl
# Test case that demonstrates surprising Text::CSV_XS behavior.

use Text::CSV_XS;

my $csv = Text::CSV_XS->new({
    sep_char         => "\t",
    allow_whitespace => 0,
});

my $rec1 = qq{One\t\tThree\t\t\tSix};
my $rec2 = qq{One\t \tThree\t \t \tSix};
my $rec3 = qq{One\t \tThree\t \t \t };
my $rec4 = qq{ \t \tThree\t \t \tSix};

print "We expect 6 fields each time.\n";

# ---------------- Adjacent tabs, no allow_whitespace ----------------
if ($csv->parse($rec1)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec1\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec1: $diag\n";
}

# ---------------- Space-infested tabs, no allow_whitespace ----------------
if ($csv->parse($rec2)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec2\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec2: $diag\n";
}

# ---------------- Spaces at the end, no allow whitespace ----------------
if ($csv->parse($rec3)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec3\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec3: $diag\n";
}

# ---------------- Spaces at the front, no allow whitespace ----------------
if ($csv->parse($rec4)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec4\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec4: $diag\n";
}

# Now allow whitespace
$csv = Text::CSV_XS->new({
    sep_char         => "\t",
    allow_whitespace => 1,
});

# ---------------- Adjacent tabs, allow_whitespace ----------------
if ($csv->parse($rec1)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec1 (allow_whitespace)\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec1: $diag\n";
}

# ---------------- Space-infested tabs, allow_whitespace ----------------
if ($csv->parse($rec2)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec2 (allow_whitespace)\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec2: $diag\n";
}

# ---------------- Spaces at the end, allow whitespace ----------------
if ($csv->parse($rec3)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec3 (allow_whitespace)\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec3: $diag\n";
}

# ---------------- Spaces at the front, allow whitespace ----------------
if ($csv->parse($rec4)) {
    my @fields = $csv->fields;
    print 'There are ', scalar(@fields), " fields in rec4 (allow_whitespace)\n";
}
else {
    my $diag = $csv->error_diag;
    print "Couldn't parse rec4: $diag\n";
}
Subject: Re: [rt.cpan.org #44402] Unexpected results parsing tab-separated spaces
Date: Thu, 19 Mar 2009 08:49:13 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Wed, 18 Mar 2009 14:07:14 -0400, "Eric J. Roode via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> Perhaps this issue is out-of-scope for Text::CSV_XS, because I am using
> it to parse tab-separated data, and I do not know whether that is supported.

It should be, and I fixed it for 0.63.
Can you grab 'http://repo.or.cz/w/Text-CSV_XS.git?a=snapshot;sf=tgz'
and see if that fixes your problem?

> When parsing tab-separated data, and the input record contains fields
> that consist entirely of spaces (ascii 32), and allow_whitespace is
> enabled, then Text::CSV_XS appears to remove not only the spaces but
> also the adjacent tab characters, which has the effect of removing
> actual fields from the data record.
>
> Further, if the record ends with one or more spaces, then the parse
> fails completely with the error "EIF - Binary character in unquoted
> field, binary off".
>
> I have attached a sample program that provides eight simple test cases,
> four with allow_whitespace on, four with it off. I think it is
> self-explanatory.

Your bug report was self-explaining enough already. Better even, I
immediately knew where to fix this.

BTW Are you the same Eric that worked for PROCURA years ago?

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.3, and 11.0, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
On Thu Mar 19 03:49:38 2009, h.m.brand@xs4all.nl wrote:

>
> It should be, and I fixed it for 0.63.
> Can you grab 'http://repo.or.cz/w/Text-CSV_XS.git?a=snapshot;sf=tgz'
> and see if that fixes your problem?

It partially solves the problem. It now correctly parses fields that contain only space characters at the start of a line, or in the middle of a line (e.g. surrounded by tabs), but still gives a parse error if there is a spaces-only field at the end of a record.

For the test program attached earlier, all tests pass now except the next-to-last one (record 3 with allow-whitespace), which still gives "EIF - Binary character in unquoted field, binary off".

Could it be that after it removes spaces at the end of the line, the string pointer has moved to one-character past the end of the string, and is now at a \x00 byte (or some garbage character)? Just a guess (not having looked at the source).

>
> BTW Are you the same Eric that worked for PROCURA years ago?
>

I'm afraid not, sorry :-) I've never heard of PROCURA.

-- Eric Roode
Subject: Re: [rt.cpan.org #44402] Unexpected results parsing tab-separated spaces
Date: Thu, 19 Mar 2009 19:53:19 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 19 Mar 2009 14:06:39 -0400, "Eric J. Roode via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> Queue: Text-CSV_XS
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=44402 >
>
> On Thu Mar 19 03:49:38 2009, h.m.brand@xs4all.nl wrote:
> >
> > It should be, and I fixed it for 0.63.
> > Can you grab 'http://repo.or.cz/w/Text-CSV_XS.git?a=snapshot;sf=tgz'
> > and see if that fixes your problem?
>
> It partially solves the problem. It now correctly parses fields that
> contain only space characters at the start of a line, or in the middle
> of a line (e.g. surrounded by tabs), but still gives a parse error if
> there is a spaces-only field at the end of a record.

I'll have to dig for this. I tested it with $csv->getline (), and not
with $csv->parse (). The latter still fails on rec3.

> For the test program attached earlier, all tests pass now except the
> next-to-last one (record 3 with allow-whitespace), which still gives
> "EIF - Binary character in unquoted field, binary off".

Yes, but only in parse (). I'll come back on this.

> Could it be that after it removes spaces at the end of the line, the
> string pointer has moved to one-character past the end of the string,
> and is now at a \x00 byte (or some garbage character)? Just a guess
> (not having looked at the source).
A much more effective test program doing both parse () and getline ():

--8<---
#!perl
# Test case that demonstrates surprising Text::CSV_XS behavior.

use strict;
use warnings;

use Text::CSV_XS;

my @rec = ("",
    qq{One\t\tThree\t\t\tSix},      # Adjacent tabs
    qq{One\t \tThree\t \t \tSix},   # Space-infested tabs
    qq{One\t \tThree\t \t \t },     # Spaces at the end
    qq{ \t \tThree\t \t \tSix},     # Spaces at the front
    );

print "We expect 6 fields each time.\n";

foreach my $aw (0, 1) {
    my $csv = Text::CSV_XS->new ({
        sep_char         => "\t",
        allow_whitespace => $aw,
        });
    foreach my $r (1..4) {
        if ($csv->parse ($rec[$r])) {
            my @fields = $csv->fields;
            print STDERR "There are ", scalar @fields, " fields in rec$r ($aw)\n";
            }
        else {
            print STDERR "# error in rec$r ($aw)\n";
            $csv->error_diag;
            }
        }
    }

# Now do the same with file/getline
open my $fh, ">", "tmp/44402.csv" or die "Cannot write test file: $!\n";
print $fh "$rec[1]\r\n$rec[2]\r\n$rec[3]\r\n$rec[4]\r\n";
close $fh;

foreach my $aw (0, 1) {
    open my $fh, "<", "tmp/44402.csv";
    my $csv = Text::CSV_XS->new ({
        sep_char         => "\t",
        allow_whitespace => $aw,
        });
    foreach my $r (1..4) {
        if (my $row = $csv->getline ($fh)) {
            my @fields = @$row;
            print STDERR "There are ", scalar @fields, " fields in rec$r ($aw)\n";
            }
        else {
            print STDERR "# error in rec$r ($aw)\n";
            $csv->error_diag;
            }
        }
    }
-->8---

-- 
H.Merijn Brand
Subject: Re: [rt.cpan.org #44402] Unexpected results parsing tab-separated spaces
Date: Thu, 19 Mar 2009 22:21:21 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
Fixed. Can you retry?
Grab 'http://repo.or.cz/w/Text-CSV_XS.git?a=snapshot;sf=tgz'

-- 
H.Merijn Brand
On Thu Mar 19 17:21:36 2009, h.m.brand@xs4all.nl wrote:

>
> Fixed. Can you retry?
> Grab 'http://repo.or.cz/w/Text-CSV_XS.git?a=snapshot;sf=tgz'
>

That works! Great, thank you. :-)

--Eric
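
For anyone wanting to re-check the behavior discussed in this ticket, the following is a minimal sketch, assuming Text::CSV_XS 0.63 or later is installed. It runs the four records from the thread through parse () with allow_whitespace off and on, and reports whether each yields the expected six fields. The helper name count_fields is hypothetical and not part of the module.

```perl
#!perl
# Sketch of a regression check for rt.cpan.org #44402.
# Assumes Text::CSV_XS 0.63 or later; count_fields is a hypothetical helper.
use strict;
use warnings;

use Text::CSV_XS;

# Return the number of fields parse () yields for one tab-separated
# record, or undef if the parse fails.
sub count_fields {
    my ($record, $allow_whitespace) = @_;
    my $csv = Text::CSV_XS->new ({
        sep_char         => "\t",
        allow_whitespace => $allow_whitespace,
        });
    $csv->parse ($record) or return undef;
    my @fields = $csv->fields;
    return scalar @fields;
    }

# The four records used in the thread.
my @rec = (
    qq{One\t\tThree\t\t\tSix},      # Adjacent tabs
    qq{One\t \tThree\t \t \tSix},   # Space-infested tabs
    qq{One\t \tThree\t \t \t },     # Spaces at the end
    qq{ \t \tThree\t \t \tSix},     # Spaces at the front
    );

foreach my $aw (0, 1) {
    foreach my $i (0 .. $#rec) {
        my $n      = count_fields ($rec[$i], $aw);
        my $status = defined $n && $n == 6 ? "ok" : "not ok";
        printf "%s - rec%d (allow_whitespace => %d)\n", $status, $i + 1, $aw;
        }
    }
```

A "not ok" for rec3 with allow_whitespace => 1 would correspond to the parse () failure reported above.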