
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 130579
Status: rejected
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: m.mokotov [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Using getline() after bind_columns() returns undef even though data exists
Date: Wed, 25 Sep 2019 15:59:36 -0500
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "Mickey M." <m.mokotov [...] gmail.com>
Hi,

We have been using Text::CSV_XS for a while and it's awesome.

Recently we encountered a weird behavior: calling getline() after bind_columns() returns undef for specific parsed data. We isolated the case to something reproducible; here it is:

*Env:*

    $ sudo cpanm Text::CSV_XS
    Text::CSV_XS is up to date. (1.40)
    $ perl -v
    This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi

*Code:*

    use Text::CSV_XS;
    use Data::Dumper;

    my $row = {};
    my $reader = new Text::CSV_XS( { binary => 1, sep_char => "|" } );
    $reader->bind_columns( \@{ $row }{ qw( a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26 a27 a28 a29 a30 a31 a32 a33 a34 a35 a36 a37 a38 a39 ) } );

    print Dumper( $reader->getline( *STDIN ) );
    print Dumper( $row );

*Data (note the " on val10):*

    val1|val2|val3|val4|val5|val6|val7|val8|val9|val10"|val11|val12|val13|val14|val15|val16|val17|val18|val19|val20|val21|val22|val23|val24|val25|val26|val27|val28|val29|val30|val31|val32|val33|val34|val35|val36|val37|val38|val39

Thanks,
Mickey.
I think I would have to mark this as not a bug. You might assume that choosing the '|' as separator changes more than just the separator, but the '"' in the data is a parse error, which you would have seen if you also enabled the diagnostics:

--8<---
use Text::CSV_XS;

my @hdr = map { "a$_" } 1..39;
my $row = {};
my $reader = new Text::CSV_XS ({ binary => 1, sep_char => "|", auto_diag => 1 });
$reader->bind_columns (\@{$row}{@hdr});
$reader->getline (*DATA);
__END__
val1|val2|val3|val4|val5|val6|val7|val8|val9|val10"|val11|val12|val13|val14|val15|val16|val17|val18|val19|val20|val21|val22|val23|val24|val25|val26|val27|val28|val29|val30|val31|val32|val33|val34|val35|val36|val37|val38|val39
-->8---

will show you

--8<---
# CSV_XS ERROR: 2034 - EIF - Loose unescaped quote @ rec 0 pos 51 field 10
-->8---

If you want the '"' to be a valid part of the data, you need to add the "allow_loose_quotes" attribute:

--8<---
my $reader = new Text::CSV_XS ({
    binary             => 1,
    sep_char           => "|",
    allow_loose_quotes => 1,
    auto_diag          => 1,
    });
-->8---
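For readers who prefer to inspect the failure by hand instead of enabling auto_diag, here is a minimal sketch. It assumes a shortened four-field version of the reported line and an in-memory filehandle in place of STDIN or DATA; the variable names are illustrative only. It calls error_diag in list context after getline has returned undef, and the code is expected to be the same 2034 shown in the diagnostics above.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Shortened stand-in for the reported data: the stray '"' sits in field 3.
my $data = qq{val1|val2|val3"|val4\n};
open my $fh, "<", \$data or die "Cannot open in-memory handle: $!";

# auto_diag is deliberately left off so the error can be inspected by hand.
my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => "|" });

my $fields = $csv->getline ($fh);

# In list context error_diag returns code, message, position, record, field.
my ($code, $msg, $pos, $rec, $fld) = $csv->error_diag;

if (!defined $fields) {
    print "getline failed: $code - $msg\n";
}
```

Checking the return value of getline and then asking error_diag for the reason separates a real parse error from a normal end of file, which auto_diag otherwise reports for you.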
Subject: Re: [rt.cpan.org #130579] Using getline() after bind_columns() returns undef even though data exists
Date: Thu, 26 Sep 2019 09:21:35 -0500
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "Mickey M." <m.mokotov [...] gmail.com>
Hi and thanks for the quick reply!

It's of course your call to decide if this is a bug or not. From a user perspective, it's usually expected that a configuration change will only affect the specific changed configuration and not other behaviors.

That said, your library is amazing 👍

Take care and again thanks for the help,
Mickey.
--
Thanks,
Mickey.
--
This email is short to save you time.
Subject: Re: [rt.cpan.org #130579] Using getline() after bind_columns() returns undef even though data exists
Date: Thu, 26 Sep 2019 17:01:11 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 26 Sep 2019 10:22:08 -0400, "Mickey M. via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> Hi and thanks for the quick reply!
>
> It's of course your call to decide if this is a bug or not. From a
> user perspective, it's usually expected that a configuration change
> will only affect the specific changed configuration and not other
> behaviors.
>
> That said, your library is amazing 👍
It is definitely not a bug, and it is explicitly documented what the current (expected) behavior is:

  bind_columns
    Takes a list of scalar references to be used for output with "print" or
    to store in the fields fetched by "getline". When you do not pass enough
    references to store the fetched fields in, "getline" will fail with
    error 3006. If you pass more than there are fields to return, the
    content of the remaining references is left untouched.

So, here you passed 39 entries to the bind_columns call, so any getline will fill the bound fields on the fly.

In the posted case, you tell the parser to store its fields in @{$row}{map { "a$_" } 1..39}. No data has been parsed yet, so there is no error or ambiguity. Now you call getline on an erroneous CSV data line: the " is not allowed by the specs. As CSV is parsed on a byte-by-byte or character-by-character basis to enable streaming, the first 9 fields are perfectly fine. Then a new unquoted field starts and, as you bound the columns, it will be stored *directly* in your hash, character by character, until the parser hits an error and stops. Using auto_diag => 1 would have shown you that right away.

If the " characters in your dataset are completely meaningless, which implies that embedded newlines or separator characters are not possible, there are two ways to get around the problem:

 1. Allow these exceptions

      allow_loose_quotes  => 1,
      allow_loose_escapes => 1,

 2. Work without these special characters

      quote_char  => undef,
      escape_char => undef,

It is up to you to decide which option is best.

-- 
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.31  porting perl5 on HP-UX, AIX, and Linux
https://useplaintext.email  https://tux.nl  http://www.test-smoke.org
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
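The two workarounds listed above can be compared side by side with a small sketch. It assumes a shortened four-field line with the same kind of stray quote, and parse_with is a hypothetical helper written only for this illustration, not part of Text::CSV_XS. Each call builds a fresh parser with the extra attributes and reads the line from an in-memory filehandle.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Shortened stand-in for the reported data: the stray '"' sits in field 3.
my $line = qq{val1|val2|val3"|val4\n};

# Hypothetical helper: parse the line with the given extra attributes and
# return the array ref from getline, or undef when the parser rejects it.
sub parse_with {
    my (%extra) = @_;
    my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => "|", %extra });
    open my $fh, "<", \$line or die "Cannot open in-memory handle: $!";
    return $csv->getline ($fh);
}

my $strict = parse_with ();                                  # default rules
my $loose  = parse_with (allow_loose_quotes  => 1,           # option 1
                         allow_loose_escapes => 1);
my $plain  = parse_with (quote_char  => undef,               # option 2
                         escape_char => undef);

for ([ strict => $strict ], [ loose => $loose ], [ plain => $plain ]) {
    my ($name, $row) = @$_;
    my $shown = $row ? join (" ", map { "[$_]" } @$row) : "parse error";
    print "$name: $shown\n";
}
```

Option 1 keeps quoting available for fields that really are quoted, while option 2 treats every " as ordinary data; which one fits depends on whether the data can ever contain quoted fields.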

Subject: Re: [rt.cpan.org #130579] Using getline() after bind_columns() returns undef even though data exists
Date: Thu, 3 Oct 2019 14:28:01 -0500
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Mickey <m.mokotov [...] gmail.com>
Thank you for the complete answer!

Thanks,
Mickey.
--
This email is short to save you time.