On Sun, 4 Nov 2012 23:32:39 -0500, "Adam Kennedy via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:
> The ->fields method currently returns a list of fields.
With better/faster access methods available, I almost never use
->fields () at all, and I use this module on a daily basis.
> When I benchmark an application of mine that consumes CSV files with
> 1000-2000 fields, the cost of moving the fields across the method
> boundary and turning them back into a reference again is as high as the
> cost of the call to ->parse.
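The cost described above can be reproduced in isolation with a toy sketch. It assumes a hypothetical class ToyParser (not Text::CSV_XS) that stores its fields as an array reference. It compares two ways of getting the fields out, using the core Benchmark module:

- copying them out as a list and re-wrapping them in [ ];
- handing back the stored reference (a hypothetical fields_ar).

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical toy class standing in for a parser object. It is NOT
# Text::CSV_XS; it only mimics "fields kept internally as an array ref".
package ToyParser;

sub new {
    my ($class, $n) = @_;
    return bless { _FIELDS => [ map { "f$_" } 1 .. $n ] }, $class;
}

# Returns a flat list: every field is pushed onto the stack, and the
# caller has to build a new array (or [ ... ]) to keep them.
sub fields { return @{ $_[0]{_FIELDS} } }

# Hypothetical accessor returning the stored reference: one scalar
# crosses the method boundary, whatever the number of fields.
sub fields_ar { return $_[0]{_FIELDS} }

package main;

my $p = ToyParser->new (2000);

cmpthese (-1, {
    list_to_ref => sub { my $r = [ $p->fields ]; },
    direct_ref  => sub { my $r = $p->fields_ar; },
});
```

Absolute numbers depend on the perl build and the machine. The point is only that the list variant grows with the number of fields, while the reference variant does not.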
Ever looked into bind_columns ()?
Especially with wide records, the savings add up fast:
http://tux.nl/Talks/CSV/csv3d.html
http://tux.nl/Talks/CSV/csv3e.html
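Roughly, binding works like this. \@rec{@hdr} produces one scalar reference per column, and each field is stored through those references, so nothing has to be returned to the caller as a list. The pure-Perl sketch below only illustrates that idea. fill_bound is a hypothetical helper; Text::CSV_XS does this work in XS, not like this.

```perl
use strict;
use warnings;

# Column names as they would come from a header row (example data).
my @hdr = qw(id name score);

# \@rec{@hdr} returns one scalar reference per hash slot, in header order.
my %rec;
my @bound = \@rec{@hdr};

# Hypothetical helper standing in for the parser: it writes each field
# through the bound references instead of returning a list to the caller.
sub fill_bound {
    my ($refs, $fields) = @_;
    ${ $refs->[$_] } = $fields->[$_] for 0 .. $#$refs;
    return 1;
}

for my $row ([1, "alice", 0.9], [2, "bob", 0.7]) {
    fill_bound (\@bound, $row);
    print "$rec{id}: $rec{name} ($rec{score})\n";
}
```

Each pass overwrites the same three hash slots. No new hash or array is built per record.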
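Parsing wide records is probably fastest with something like:

    use strict;
    use warnings;
    use Text::CSV_XS;

    my $filename = shift or die "usage: $0 file.csv\n";
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", $filename or die "$filename: $!\n";
    my @hdr = @{$csv->getline ($fh)};   # header row gives the column names
    my %rec;
    $csv->bind_columns (\@rec{@hdr});   # one bound scalar per column
    while ($csv->getline ($fh)) {
        # %rec now holds all fields (e.g. 2000) of the current record
        # ... process %rec here ...
    }
    close $fh;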
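A rough way to check the difference on your own data is to time both approaches on the same input. This sketch makes some assumptions:

- the data is synthetic: a header plus 100 rows of 2000 numeric columns, held in memory;
- each record's only "work" is summing the last column.

Substitute a real file and real per-record work to get meaningful numbers.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Text::CSV_XS;

# Synthetic test data (an assumption, not real input): a header line
# followed by 100 records of 2000 numeric fields each.
my $ncol = 2000;
my $data = join ("," => map { "c$_" } 1 .. $ncol) . "\n";
$data   .= join ("," => 1 .. $ncol) . "\n" for 1 .. 100;

# Pre-split the records once, so the parse variant is not charged for it.
my @lines = split m/\n/ => $data;
shift @lines;   # drop the header line

# parse/fields: the whole field list is copied out for every record.
sub via_parse {
    my $csv = Text::CSV_XS->new ({ binary => 1 });
    my $sum = 0;
    for my $line (@lines) {
        $csv->parse ($line) or die "parse failed\n";
        my @f = $csv->fields;
        $sum += $f[-1];
    }
    return $sum;
}

# getline/bind_columns: fields are stored straight into %rec.
sub via_bind {
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", \$data or die "in-memory open: $!\n";
    my @hdr = @{$csv->getline ($fh)};
    my %rec;
    $csv->bind_columns (\@rec{@hdr});
    my $sum = 0;
    while ($csv->getline ($fh)) {
        $sum += $rec{"c$ncol"};
    }
    close $fh;
    return $sum;
}

cmpthese (-2, {
    parse_fields => \&via_parse,
    getline_bind => \&via_bind,
});
```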
> Since the ->parse method replaces the _FIELDS anyway, I think it would
> be a good idea to add a ->fields_ar method to return _FIELDS directly.
If you still have a (very) good reason to use ->parse (), I could
consider a patch.
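A common reason to use ->parse () is that the data already sits in a string. In that case perl can open a filehandle on the string, so getline () still applies, and it already returns an array reference. A minimal sketch; the record in $line is made-up example data:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up input: a CSV record that arrives as a string (for example from
# a socket or a database column) rather than from a file.
my $line = qq{1,"hello, world",3.14\n};

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Open a read handle on the scalar itself (an in-memory file), so that
# getline can be used; it returns an array reference, not a flat list.
open my $fh, "<", \$line or die "in-memory open: $!\n";
my $row = $csv->getline ($fh);
close $fh;

print scalar @$row, " fields, second is '$row->[1]'\n";
```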
> This would literally double the speed of parsing for my field-heavy
> application (which is common in machine learning type applications).
I see no added value (yet), as parse/fields is way too slow anyway
(compared to the safer getline approach).
--
H.Merijn Brand
http://tux.nl Perl Monger
http://amsterdam.pm.org/
using perl5.00307 .. 5.17 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/