Bug #80629 for Text-CSV_XS: Access to fields by reference

Sun Nov 04 23:32:38 2012 adamk [...] cpan.org - Ticket created

Subject:

Access to fields by reference

The ->fields method currently returns a list of fields. When I benchmark an application of mine that consumes CSV files with 1000-2000 fields, the cost of moving the fields across the method boundary and turning them back into a reference again is as high as the cost of the call to ->parse. Since the ->parse method replaces the _FIELDS anyway, I think it would be a good idea to add a ->fields_ar method to return _FIELDS directly. This would literally double the speed of parsing for my field-heavy application (which is common in machine learning type applications).
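The overhead described above can be illustrated with a minimal core-Perl sketch. Demo::Parser is a hypothetical stand-in class, not Text::CSV_XS itself; its naive comma split and the fields_ar accessor (named after the method proposed in this ticket) are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser object; this is NOT Text::CSV_XS.
package Demo::Parser;

sub new {
    my ($class) = @_;
    return bless { _FIELDS => [] }, $class;
}

# "Parse" by splitting on commas (no quoting support; illustration only).
sub parse {
    my ($self, $line) = @_;
    $self->{_FIELDS} = [ split /,/, $line, -1 ];
    return 1;
}

# Returns a list: every field is passed back across the method boundary.
sub fields { return @{ $_[0]{_FIELDS} } }

# Hypothetical accessor as proposed in the ticket: returns the stored reference.
sub fields_ar { return $_[0]{_FIELDS} }

package main;

my $p = Demo::Parser->new;
$p->parse(join ",", 1 .. 2000);

my $copy   = [ $p->fields ];   # all 2000 values are copied into a new array
my $direct = $p->fields_ar;    # no per-field copy, just one reference

printf "fields: %d, copy is a separate array: %s\n",
    scalar @$copy, ($copy != $direct ? "yes" : "no");
```

With the list-returning accessor, the caller pays for one copy per field on every record; with the reference-returning accessor, the caller shares the parser's internal array and should treat it as read-only.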

Mon Nov 05 04:45:27 2012 h.m.brand [...] xs4all.nl - Correspondence added

Subject:	Re: [rt.cpan.org #80629] Access to fields by reference
Date:	Mon, 5 Nov 2012 10:45:05 +0100
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	"H.Merijn Brand" <h.m.brand [...] xs4all.nl>

On Sun, 4 Nov 2012 23:32:39 -0500, "Adam Kennedy via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> The ->fields method currently returns a list of fields.

With better/faster access methods available, I almost never use ->fields () at all, and I use this module on a daily basis.

> When I benchmark an application of mine that consumes CSV files with 1000-2000 fields, the cost of moving the fields across the method boundary and turning them back into a reference again is as high as the cost of the call to ->parse.

Ever looked into bind_columns ()? Especially with wide records, this adds up fast:

http://tux.nl/Talks/CSV/csv3d.html
http://tux.nl/Talks/CSV/csv3e.html

Parsing wide records is probably done fastest with something like

    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", $filename or die "$filename: $!\n";

    # The first line holds the column names
    my @hdr = @{$csv->getline ($fh)};

    # Bind one hash slot per column; getline then fills %rec in place
    my %rec;
    $csv->bind_columns (\@rec{@hdr});

    while ($csv->getline ($fh)) {
        # %rec now holds the 2000 fields of the current record
        ...
        }

> Since the ->parse method replaces the _FIELDS anyway, I think it would be a good idea to add a ->fields_ar method to return _FIELDS directly.

If you still have a (very) valid reason to use ->parse (), I could consider a patch.

> This would literally double the speed of parsing for my field-heavy application (which is common in machine learning type applications).

I see no added value (yet), as parse/fields is way too slow anyway (compared to the safer getline approach).

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.17   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/

Mon Nov 05 04:45:28 2012 The RT System itself - Status changed from 'new' to 'open'

Tue Nov 13 04:45:09 2012 HMBRAND [...] cpan.org - Severity Wishlist added

Wed Nov 28 02:51:21 2012 HMBRAND [...] cpan.org - Correspondence added

Better and faster access methods already exist. Feel free to re-open if you have good arguments against using the faster methods.

Wed Nov 28 02:51:22 2012 HMBRAND [...] cpan.org - Status changed from 'open' to 'rejected'