Skip Menu |

This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 121350
Status: rejected
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: felix.ostmann [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: bind_columns with old values
Missing columns get values from previous rows if the separator is missing SCRIPT: use 5.18.1; use warnings; use Text::CSV_XS; my $data = <<__CSV__; owner,admin 1,1 2 3, 4,4 __CSV__ open my $fh, "<", \$data; my $csv = Text::CSV_XS->new ({ auto_diag => 1 }); my %row; $csv->bind_columns (\@row{@{$csv->getline ($fh)}}); while ($csv->getline ($fh)) { printf "owner: %s ; admin: %s\n", $row{owner}, $row{admin}, ; } OUTPUT: owner: 1 ; admin: 1 owner: 2 ; admin: 1 owner: 3 ; admin: owner: 4 ; admin: 4 I expected for owner 2 a empty admin and not the value from the previous owner.
This is documented behavior. See the last paragraph of the docs for bind_columns. That is why I will close this ticket as invalid. Nothing personal: I understand your expectation. --8<--- bind_columns Takes a list of scalar references to be used for output with "print" or to store in the fields fetched by "getline". When you do not pass enough references to store the fetched fields in, "getline" will fail with error 3006. If you pass more than there are fields to return, the content of the remaining references is left untouched. $csv->bind_columns (\$code, \$name, \$price, \$description); while ($csv->getline ($io)) { print "The price of a $name is \x{20ac} $price\n"; } To reset or clear all column binding, call "bind_columns" with the single argument "undef". This will also clear column names. $csv->bind_columns (undef); If no arguments are passed at all, "bind_columns" will return the list of current bindings or "undef" if no binds are active. Note that in parsing with "bind_columns", the fields are set on the fly. That implies that if the third field of a row causes an error, the first two fields already have been assigned the values of the current row, while the rest will still hold the values of the previous row. -->8--- This behavior is because of speed: 98% of all CSV datasets (estimated guess, it might even be more than that) have rows with the same number of fields for each row. Clearing all remaining fields after error states or incomplete rows will seriously slow down the general case. If you want the behavior you expect, you should create an after_parse hook.
From: felix.ostmann [...] gmail.com
Ok, i understand that approach, but i would suggest the saver way: make the modul work with default options in 100% of input data and for someone who needs the speed add something like a option to disable the save way. In my case it was horrible wrong to have values from old rows and i guess that can hit others easily too.
Subject: Re: [rt.cpan.org #121350] bind_columns with old values
Date: Tue, 25 Apr 2017 12:03:02 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
Show quoted text
> Ok, i understand that approach, but i would suggest the saver way: > make the modul work with default options in 100% of input data and > for someone who needs the speed add something like a option to > disable the save way.
Absolutely not. The way it currently works *is* safe. I *know* of processes that *depend* on this behavior. Show quoted text
> In my case it was horrible wrong to have values from old rows and i > guess that can hit others easily too.
In that case, you might be (very) interested in the new strict attribute that just got pushed to github. You can checkout and test if that fits your needs. This option was on my TODO list for a long time, but your ticket triggered me to actually implement and test it $ cat rt121350.pl use 5.18.1; use warnings; use Text::CSV_XS; my $data = <<"__CSV__"; owner,admin 1,1 2 3, 4,4 __CSV__ open my $fh, "<", \$data; my $csv = Text::CSV_XS->new ({ auto_diag => 1, strict => 1 }); my %row; $csv->bind_columns (\@row{@{$csv->getline ($fh)}}); while ($csv->getline ($fh)) { printf "owner: %s ; admin: %s\n", $row{owner}, $row{admin}; } $ perl rt121350.pl owner: 1 ; admin: 1 # CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 3 pos 3 field 1 -- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.25 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Download (untitled)
application/pgp-signature 473b

Message body not shown because it is not plain text.

From: felix.ostmann [...] gmail.com
Am Di 25. Apr 2017, 06:04:44, h.m.brand@xs4all.nl schrieb: Show quoted text
> > Ok, i understand that approach, but i would suggest the saver way: > > make the modul work with default options in 100% of input data and > > for someone who needs the speed add something like a option to > > disable the save way.
> > Absolutely not. The way it currently works *is* safe. I *know* of > processes that *depend* on this behavior.
OK, this is a point where religion wars can start ... I get input from customers and end up with wrong data. In my opinion a good modul should save me with the default options. I see that at this point it is not possible to switch that anymore (because somone might depend on the current behavior). Show quoted text
>
> > In my case it was horrible wrong to have values from old rows and i > > guess that can hit others easily too.
> > In that case, you might be (very) interested in the new strict > attribute that just got pushed to github. You can checkout and test if > that fits your needs. This option was on my TODO list for a long time, > but your ticket triggered me to actually implement and test it
yeah, works as expected and is my favorite beloved possibility to deal with ambignious data :) don't accept it! If only this could be the default behavior ... ;) Show quoted text
> > $ cat rt121350.pl > use 5.18.1; > use warnings; > use Text::CSV_XS; > > my $data = <<"__CSV__"; > owner,admin > 1,1 > 2 > 3, > 4,4 > __CSV__ > > open my $fh, "<", \$data; > > my $csv = Text::CSV_XS->new ({ auto_diag => 1, strict => 1 }); > my %row; > $csv->bind_columns (\@row{@{$csv->getline ($fh)}}); > while ($csv->getline ($fh)) { > printf "owner: %s ; admin: %s\n", $row{owner}, $row{admin}; > } > $ perl rt121350.pl > owner: 1 ; admin: 1 > # CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 3 pos > 3 field 1
Subject: Re: [rt.cpan.org #121350] bind_columns with old values
Date: Tue, 25 Apr 2017 14:30:02 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Tue, 25 Apr 2017 07:27:29 -0400, "Felix Antonius Wilhelm Ostmann via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote: Show quoted text
> Queue: Text-CSV_XS > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=121350 > > > Am Di 25. Apr 2017, 06:04:44, h.m.brand@xs4all.nl schrieb:
> > > Ok, i understand that approach, but i would suggest the saver way: > > > make the modul work with default options in 100% of input data and > > > for someone who needs the speed add something like a option to > > > disable the save way.
> > > > Absolutely not. The way it currently works *is* safe. I *know* of > > processes that *depend* on this behavior.
> > OK, this is a point where religion wars can start ... > > I get input from customers and end up with wrong data. In my opinion > a good modul should save me with the default options.
A good module offers not only basic functionality, but also a test suite and documentation that covers all options. Most options are implemented as attributes and stem from direct wishes from users. They were not all a result of my devious mind. (Some are though) Keep in mind that this module is written with SPEED (yes, in capitals) is a primary goal (after correctness of course). If *you* choose to use bind_column because of speed (which you should, as it can cause a lot of speed-up), you inherently choose to take the side-effects that come with it (and are documented). If you would not use bind_columns, you could check the size of the row inside your parsing loop. If you choose to use bind_columns, with the current implementation and documented side-effects, you can not state that it has erroneous behavior. There are (many) alternatives to work around that issue. Show quoted text
> I see that at this point it is not possible to switch that anymore > (because somone might depend on the current behavior).
not might. I'm very sure about it. Even if it were possible, I would not. It is not a rare case that data is set up so that trailing columns are expected to be copies of the row above. Show quoted text
> > > In my case it was horrible wrong to have values from old rows and > > > i guess that can hit others easily too.
> > > > In that case, you might be (very) interested in the new strict > > attribute that just got pushed to github. You can checkout and test > > if that fits your needs. This option was on my TODO list for a long > > time, but your ticket triggered me to actually implement and test > > it
> > yeah, works as expected and is my favorite beloved possibility to > deal with ambignious data :) don't accept it! If only this could be > the default behavior ... ;)
The only default behavior that I seriously want to change - but cannot due to backward incompatibilities - is the default for the binary attribute. That should have been default from the start. But alas, I did not write the first version. As you will probably need a default list of attributes like binary and auto_diag, having strict in there as well shouldn't be too much of a strain my $csv = Text::CSV_XS->new ({ binary => 1, strict => 1, auto_diag => 1 }); FWIW I also now implemented the same option in Text::CSV for perl6, where it *also* defaults to False. The invocation is simpler though, as in the perl6 module I was able to make binary the default my $csv = Text::CSV.new (:strict, :auto-diag); I'll start the test phase and will try to get out this new version asap for you. Show quoted text
> > $ cat rt121350.pl > > use 5.18.1; > > use warnings; > > use Text::CSV_XS; > > > > my $data = <<"__CSV__"; > > owner,admin > > 1,1 > > 2 > > 3, > > 4,4 > > __CSV__ > > > > open my $fh, "<", \$data; > > > > my $csv = Text::CSV_XS->new ({ auto_diag => 1, strict => 1 }); > > my %row; > > $csv->bind_columns (\@row{@{$csv->getline ($fh)}}); > > while ($csv->getline ($fh)) { > > printf "owner: %s ; admin: %s\n", $row{owner}, $row{admin}; > > } > > $ perl rt121350.pl > > owner: 1 ; admin: 1 > > # CSV_XS ERROR: 2014 - ENF - Inconsistent number of fields @ rec 3 > > pos 3 field 1
-- H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/ using perl5.00307 .. 5.25 porting perl5 on HP-UX, AIX, and openSUSE http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/
Download (untitled)
application/pgp-signature 473b

Message body not shown because it is not plain text.