On Tue, 14 Oct 2008 17:43:51 -0400, "Evan Carroll via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:
Show quoted text> I think that this is slightly deceptive
> my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
Hey, that line is from the previous example. Don't mix parsing and
generation. Parsing doesn't need eol (though it is allowed)
Show quoted text> open my $io, "<", $file or die "$file: $!";
> while (my $row = $csv->getline ($io)) {
> my @fields = @$row;
Maybe more 'generic' than 'deceptive'.
Show quoted text> Because sometimes minor errors, like 2023, 2034 which only render one
> row void will cause the whole app to cease.
True. But remember that eof only works on streams. In this example, it
might be the better way to do, but not using your example.
Show quoted text> And, that isn't especially clear in the docs. It seems like ->eof was
> created to give us something useful to test against. It should be made
> clear in the docs, that if you loop over testing for eof you can deal
> with errors, if you loop over testing for errors you'll have to exclude
> EOF which is a type of error.
Yes, agree, more or less.
If one expects getline () to be true for parse success, eof *is* a
parse error. It also indicates the end-of-stream, which is a signal to
the parsing process that it is safe to stop.
Show quoted text> And, if you loop through as you do in the docs, testing for a row,
> you'll finish on the *first* error or EOF which probably is never wanted.
>
> I think something like this is more useful:
>
> my $fh = IO::File->new('file.csv', 'r');
I'm opposed to promoting other modules, like IO::File or IO::Handle
from within the documentation. Not that I have anything against them,
but IMHO the docs should restrict to Text::CSV_XS unless absolutely
needed to explain, like with autoloading IO::Handle
while (not $csv->eof) {
if (my $row = $csv->getline_hr ($fh)) {
# do something useful
}
else {
$csv->eof or $csv->error_diag;
}
}
You can't always continue after a parsing error. If a parse was broken,
the next parse is very likely to be broken too. SetDiag (0) is almost
always a bad idea, unless you specifically test for specific errors
that you expect.
my $csv = Text::CSV_XS->new ({ binary => 1, eol => $/ });
open my $io, "<", $file or die "$file: $!";
while (my $row = $csv->getline ($io)) {
my @fields = @$row;
# Do something with fields
}
$csv->eof or $csv->error_diag;
Would seem to me the ultimate clear extension to the documentation to
be a catch-all including your griefs. (Now added, see below)
So, if your CSV is allowed to have empty lines, and/or you expect
certain errors, your suggested approach is fine, but in the generic
case, I think that the false return of getline () is enough reason to
stop parsing. parse-xs.pl has special rules for errors 2010, 2027,
2031, and 2032. If you have a good suggestion for 2023 and 2034, I will
consider adding those to parse-xs.pl example.
Show quoted text> Thanks again for Text::CSV_XS (this just can give you some grief). I
> think an alt route would be use a tied hash that evaluates to true in
> scalar context and stores the meta data about the errors inside.
The alternate route is to fortify examples/parse-xs.pl, which uses this:
print STDERR "Reading $file with Text::CSV_XS $Text::CSV_XS::VERSION\n";
while (1) {
my $row = $csv->getline ($fh);
unless ($row) { # Parsing failed
# Could be end of file
$csv->eof and last;
# Diagnose and show what was wrong
my @diag = $csv->error_diag;
print STDERR "$file line $./$diag[2] - $diag[0] - $diag[1]\n";
So, I'll have another look at a possible extension to the docs you
refer to, and point to parse-xs.pl for safe parsing with continuation
possibilities.
parse-xs.pl parses with *two* parsers in the same stream.
--8<--- doc change
index a1081e7..e7f52d8 100644
--- a/CSV_XS.pm
+++ b/CSV_XS.pm
@@ -1244,11 +1244,12 @@ Reading a CSV file line by line:
while (my $row = $csv->getline ($fh)) {
# do something with @$row
}
- close $fh or die "file.csv: $!";;
+ $csv->eof or $csv->error_diag;
+ close $fh or die "file.csv: $!";
For more extended examples, see the C<examples/> subdirectory in the
original distribution. Included is C<examples/parser-xs.pl>, that could
-be used to `fix' bad CSV
+be used to `fix' bad CSV and parse beyond errors.
perl examples/parser-xs.pl bad.csv >good.csv
-->8---
--
H.Merijn Brand Amsterdam Perl Mongers
http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.2, and 10.3, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/