Skip Menu |

This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 40047
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: me [...] evancarroll.com
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: 0.54



Subject: doc idea
I think that this is slightly deceptive my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ }); open my $io, "<", $file or die "$file: $!"; while (my $row = $csv->getline ($io)) { my @fields = @$row; Because sometimes minor errors, like 2023, 2034 which only render one row void will cause the whole app to cease. And, that isn't especially clear in the docs. It seems like ->eof was created to give us something useful to test against. It should be made clear in the docs, that if you loop over testing for eof you can deal with errors, if you loop over testing for errors you'll have to exclude EOF which is a type of error. And, if you loop through as you do in the docs, testing for a row, you'll finish on the *first* error or EOF which probably is never wanted. I think something like this is more useful: my $fh = IO::File->new('file.csv', 'r'); while ( not $csv->eof ) { my $row = $csv->getline_hr($fh); if ( $csv->error_diag ) { last if $csv->eof; ## test again since last read. printf ( "ErrorCode: [%s], Meaning: [%s]\n", $csv->error_diag ); $csv->SetDiag(0); next; } Thanks again for Text::CSV_XS (this just can give you some grief). I think an alt route would be use a tied hash that evaluates to true in scalar context and stores the meta data about the errors inside.
Subject: Re: [rt.cpan.org #40047] doc idea
Date: Wed, 15 Oct 2008 11:54:14 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Tue, 14 Oct 2008 17:43:51 -0400, "Evan Carroll via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote: Show quoted text
> I think that this is slightly deceptive > my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
Hey, that line is from the previous example. Don't mix parsing and generation. Parsing doesn't need eol (though it is allowed) Show quoted text
> open my $io, "<", $file or die "$file: $!"; > while (my $row = $csv->getline ($io)) { > my @fields = @$row;
Maybe more 'generic' than 'deceptive'. Show quoted text
> Because sometimes minor errors, like 2023, 2034 which only render one > row void will cause the whole app to cease.
True. But remember that eof only works on streams. In this example, it might be the better way to do, but not using your example. Show quoted text
> And, that isn't especially clear in the docs. It seems like ->eof was > created to give us something useful to test against. It should be made > clear in the docs, that if you loop over testing for eof you can deal > with errors, if you loop over testing for errors you'll have to exclude > EOF which is a type of error.
Yes, agree, more or less. If one expects getline () to be true for parse success, eof *is* a parse error. It also indicates the end-of-stream, which is a signal to the parsing process that it is safe to stop. Show quoted text
> And, if you loop through as you do in the docs, testing for a row, > you'll finish on the *first* error or EOF which probably is never wanted. > > I think something like this is more useful: > > my $fh = IO::File->new('file.csv', 'r');
I'm opposed to promoting other modules, like IO::File or IO::Handle from within the documentation. Not that I have anything against them, but IMHO the docs should restrict to Text::CSV_XS unless absolutely needed to explain, like with autoloading IO::Handle while (not $csv->eof) { if (my $row = $csv->getline_hr ($fh)) { # do something useful } else { $csv->eof or $csv->error_diag; } } You can't always continue after a parsing error. If a parse was broken, the next parse is very likely to be broken too. SetDiag (0) is almost always a bad idea, unless you specifically test for specific errors that you expect. my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ }); open my $io, "<", $file or die "$file: $!"; while (my $row = $csv->getline ($io)) { my @fields = @$row; # Do something with fields } $csv->eof or $csv->error_diag; Would seem to me the ultimate clear extension to the documentation to be a catch-all including your griefs. (Now added, see below) So, if your CSV is allowed to have empty lines, and/or you expect certain errors, your suggested approach is fine, but in the generic case, I think that the false return of getline () is enough reason to stop parsing. parse-xs.pl has special rules for errors 2010, 2027, 2031, and 2032. If you have a good suggestion for 2023 and 2034, I will consider adding those to parse-xs.pl example. Show quoted text
> Thanks again for Text::CSV_XS (this just can give you some grief). I > think an alt route would be use a tied hash that evaluates to true in > scalar context and stores the meta data about the errors inside.
The alternate route is to fortify examples/parse-xs.pl, which uses this: print STDERR "Reading $file with Text::CSV_XS $Text::CSV_XS::VERSION\n"; while (1) { my $row = $csv->getline ($fh); unless ($row) { # Parsing failed # Could be end of file $csv->eof and last; # Diagnose and show what was wrong my @diag = $csv->error_diag; print STDERR "$file line $./$diag[2] - $diag[0] - $diag[1]\n"; So, I'll have another look at a possible extension to the docs you refer to, and point to parse-xs.pl for safe parsing with continuation possibilities. parse-xs.pl parses with *two* parsers in the same stream. --8<--- doc change index a1081e7..e7f52d8 100644 --- a/CSV_XS.pm +++ b/CSV_XS.pm @@ -1244,11 +1244,12 @@ Reading a CSV file line by line: while (my $row = $csv->getline ($fh)) { # do something with @$row } - close $fh or die "file.csv: $!";; + $csv->eof or $csv->error_diag; + close $fh or die "file.csv: $!"; For more extended examples, see the C<examples/> subdirectory in the original distribution. Included is C<examples/parser-xs.pl>, that could -be used to `fix' bad CSV +be used to `fix' bad CSV and parse beyond errors. perl examples/parser-xs.pl bad.csv >good.csv -->8--- -- H.Merijn Brand Amsterdam Perl Mongers http://amsterdam.pm.org/ using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00, 11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin. http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/ http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/