
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 80629
Status: rejected
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: adamk [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Access to fields by reference
The ->fields method currently returns a list of fields. When I benchmark an application of mine that consumes CSV files with 1000-2000 fields, the cost of moving the fields across the method boundary and turning them back into a reference again is as high as the cost of the call to ->parse. Since the ->parse method replaces the _FIELDS anyway, I think it would be a good idea to add a ->fields_ar method to return _FIELDS directly. This would literally double the speed of parsing for my field-heavy application (which is common in machine learning type applications).
Subject: Re: [rt.cpan.org #80629] Access to fields by reference
Date: Mon, 5 Nov 2012 10:45:05 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sun, 4 Nov 2012 23:32:39 -0500, "Adam Kennedy via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:
> The ->fields method currently returns a list of fields.
With better/faster access methods available, I almost never use ->fields () at all, and I use this module on a daily basis.
> When I benchmark an application of mine that consumes CSV files with
> 1000-2000 fields, the cost of moving the fields across the method
> boundary and turning them back into a reference again is as high as the
> cost of the call to ->parse.
Ever looked into bind_columns ()? Especially when using wide records, this sums up fast:

  http://tux.nl/Talks/CSV/csv3d.html
  http://tux.nl/Talks/CSV/csv3e.html

Parsing wide records is probably done fastest with something like

  my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
  open my $fh, "<", $filename or die "$filename: $!\n";
  my @hdr = @{$csv->getline ($fh)};
  my %rec;
  $csv->bind_columns (\@rec{@hdr});
  while ($csv->getline ($fh)) {
      # %rec now holds all 2000 fields of the current record
      ...
      }
> Since the ->parse method replaces the _FIELDS anyway, I think it would
> be a good idea to add a ->fields_ar method to return _FIELDS directly.
If you have a (very) valid reason still to use ->parse (), I could consider a patch.
> This would literally double the speed of parsing for my field-heavy
> application (which is common in machine learning type applications).
I see no added value (yet), as parse/fields is way too slow anyway (compared to the safer getline approach).

--
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17  porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
Better and faster access methods already exist. Feel free to re-open if you have good arguments against using the faster methods.
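For readers comparing the two access patterns discussed in this ticket, here is a minimal sketch. It puts the parse ()/fields () approach from the report next to getline (), which hands back an array reference without an extra copy. The in-memory sample data and the variable names are assumptions made for illustration; no timings are shown or implied.

```perl
use strict;
use warnings;
use Text::CSV_XS;

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# Hypothetical in-memory data standing in for a (much wider) CSV file.
my $data = "a,b,c\n1,2,3\n4,5,6\n";

# Pattern 1: parse () followed by fields (). The fields come back as
# a list and are copied into a fresh array on every record.
my @via_fields;
for my $line (split /\n/, $data) {
    $csv->parse ($line) or die "parse failed\n";
    my @row = $csv->fields;
    push @via_fields, join "|", @row;
    }

# Pattern 2: getline () reads from a file handle and returns an
# array reference, so no list crosses the method boundary.
open my $fh, "<", \$data or die "in-memory open: $!\n";
my @via_getline;
while (my $row = $csv->getline ($fh)) {
    push @via_getline, join "|", @$row;
    }
close $fh;
```

Both loops see the same records; the difference is only in how the fields reach the caller. For very wide records, bind_columns () as shown in the reply avoids building a new array per record at all.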