Skip Menu |

This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 34474
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: HMBRAND [...] cpan.org
Requestors: MARKSTOS [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 0.37
Fixed in: 0.42



Subject: wish: integrate row-as-hashref feature from Parse::CSV
Merijn, Thanks for your careful maintainership of Text::CSV_XS. It is an important, widely used module and I appreciate your help with it. There is perhaps only one important feature I think it lacks, and it is already mentioned on the "TODO" list: That's the ability to get a row back has a hashref. Parse::CSV has now been published and has code for this which can be borrowed, so that something like "getline_href" can be added. Currently Parse::CSV has a couple of weaknesses which prevent it from being useful as a wrapper: - It hardcoded all of the Text::CVS options to new, so new options added by Text::CSV_XS like blank_is_undef are not automatically supported. - It does not use "getline()", so binary mode is not supported. While Parse::CSV could also be patched to fix these things, it strikes me as a better solution for Text::CSV_XS to borrow the one great feature that it adds. I have some potential interest helping with this, since I have a current desire for being able to parse CSV files with embedded newlines and get a hashref back.
I first thought that that could be done rather simple in-line, but it was rather easy to implement. Would you be happy with this: use Text::CSV_XS; my $csv = Text::CSV_XS->new (); $csv->set_keys (qw( code name price description )); while (my $hr = $csv->getline_hr (*IO)) { print "The price of a $hr->{name} is $hr->{price} \x{20ac}\n"; } or, if the first line has the field names $csv->set_keys (@{$csv->getline (*IO)}); give or take a few error checks
Subject: Re: [rt.cpan.org #34474] wish: integrate row-as-hashref feature from Parse::CSV
Date: Thu, 27 Mar 2008 13:01:52 -0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
HMBRAND via RT wrote: Show quoted text
> > I first thought that that could be done rather simple in-line, but it > was rather easy to implement. Would you be happy with this: > > use Text::CSV_XS; > > my $csv = Text::CSV_XS->new (); > $csv->set_keys (qw( code name price description )); > while (my $hr = $csv->getline_hr (*IO)) { > print "The price of a $hr->{name} is $hr->{price} \x{20ac}\n"; > } > > or, if the first line has the field names > > $csv->set_keys (@{$csv->getline (*IO)}); > > give or take a few error checks
Thanks for the fast response! What you suggest would be sufficient, but I would suggest the follow syntax improvements. I prefer the Parse::CSV options for declaring what the fields name are: ...->new( fields => \@fields ); or: ...->new( fields => 'auto', ); I also have one more suggestion, which is unrelated: It would be nice if functions were tagged with "=head2" or "=head3" in the POD. This would make the docs easier to browse on search.cpan.org and perhaps through other tools. Mark -- . . . . . . . . . . . . . . . . . . . . . . . . . . . Mark Stosberg Principal Developer mark@summersault.com Summersault, LLC 765-939-9301 ext 202 database driven websites . . . . . http://www.summersault.com/ . . . . . . . .
I understand your wish, but I will not honour that. I'll explain 1. 'fields' as an attribute would be confusing, as it would be completely unrelated to the existing fields () method. 2. passing field names to the constructor will be the odd-one-out When you read CSV files where the first line defines the column names, it has to use $csv to read the names, which makes this a chicken-and- egg problem. 3. 'auto' is not an option, as Text::CSV_XS does not have a memory about previous lines, unless you want the first call to getline_hr () return the column names *AND* put them in the header if that is still undefined. Too many open questions IMHO. I renamed set_keys () to hr_keys (), so I can also query them. I'll look at the pod stuff too.
Subject: Re: [rt.cpan.org #34474] wish: integrate row-as-hashref feature from Parse::CSV
Date: Thu, 27 Mar 2008 15:59:01 -0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
HMBRAND via RT wrote: Show quoted text
> > I understand your wish, but I will not honour that. I'll explain > > 1. 'fields' as an attribute would be confusing, as it would be > completely unrelated to the existing fields () method. > 2. passing field names to the constructor will be the odd-one-out > When you read CSV files where the first line defines the column names, > it has to use $csv to read the names, which makes this a chicken-and- > egg problem. > 3. 'auto' is not an option, as Text::CSV_XS does not have a memory > about previous lines, unless you want the first call to getline_hr () > return the column names *AND* put them in the header if that is still > undefined. Too many open questions IMHO.
Merijn, your line of thinking sounds very reasonable. Thanks for considering this. Show quoted text
> I renamed set_keys () to hr_keys (), so I can also query them.
Querying them will perhaps be useful, but I think the "hr_keys" name could be improved. I suggest "column_names". I wouldn't immediately think that "hr" meant "hashref". Just as I wrote that I realize that perhaps it means "header row"! Show quoted text
> I'll look at the pod stuff too.
Thank you! Mark
Subject: PATCH: getline_hr() and column_names()
From: MARKSTOS [...] cpan.org
Merijn, I needed this functionality and went ahead and wrote my own patch for it, although I later found your GIT respository, where you had also written some patch for this. ( http://repo.or.cz/w/Text-CSV_XS.git?a=shortlog ) I've compared our work, and would like to point out some features of mine: - It uses "column_names", which I think is a clearer name. - It allows the names to be set from an array or an arrayref. The array would be clearer if setting the names by hand, an the arrayref is is nice because it is what getlines() returns. - I include error checking for a number of cases including: - invalid input to column_names() - calling getlines_hr() before column_names() - mismatching the number of colum names and values My patch also includes tests and documentation updates. The one thing in your patch I was curious about was the stub mention in one of the XS files. My implementation doesn't use such a stub, although it appears to function fine with one. Since there are a lot of Text::CSV_XS users, I may get some feedback from some others on the alternatives. Mark
diff -rN -u old-Text-CSV_XS-0.37/ChangeLog new-Text-CSV_XS-0.37/ChangeLog --- old-Text-CSV_XS-0.37/ChangeLog 2008-04-03 13:13:30.000000000 -0400 +++ new-Text-CSV_XS-0.37/ChangeLog 2008-04-03 13:13:30.000000000 -0400 @@ -1,3 +1,7 @@ + + * Added getline_hr() and column_names() to support returning + rows as hashrefs (Mark Stosberg, H.Merijn Brand) + 2008-03-11 0.37 - H.Merijn Brand <h.m.brand@xs4all.nl> * Copied GIT repo to public mirror diff -rN -u old-Text-CSV_XS-0.37/CSV_XS.pm new-Text-CSV_XS-0.37/CSV_XS.pm --- old-Text-CSV_XS-0.37/CSV_XS.pm 2008-04-03 13:13:30.000000000 -0400 +++ new-Text-CSV_XS-0.37/CSV_XS.pm 2008-04-03 13:13:30.000000000 -0400 @@ -380,6 +380,62 @@ $self->{_STATUS}; } # parse +sub column_names { + my $self = shift; + my ($first,@rest) = @_; + my @cols; + # First use? Initialize. + $self->{_COLUMN_NAMES} ||= []; + + # Error checking and unpacking @_ + if (ref $first eq 'ARRAY') { + if (scalar @rest) { + die "column_names() expects only one arg when given an arrayref" + } + else { + $self->{_COLUMN_NAMES} = $first; + } + } + elsif (ref $first) { + die "column_names() was passed a reference which was not of type ARRAY"; + } + # Array of column names expected + elsif ((defined $first) or (scalar @rest)) { + @cols = ($first,@rest); + $self->{_COLUMN_NAMES} = \@cols; + } + else { + # No input processing! + } + + return @{ $self->{_COLUMN_NAMES} }; +} + +sub getline_hr { + my $self = shift; + my $io = shift; + + unless (defined $self->{_COLUMN_NAMES}) { + die "You must call column_names before calling getline_hr"; + } + + my $vals = $self->getline($io); + my @keys = $self->column_names; + + my $n_vals = scalar @$vals; + my $n_keys = scalar @keys; + unless ($n_keys == $n_vals) { + die "Number of column_names ($n_keys) doesn't match the number of entries in this row ($n_vals)" + } + + my %row; + @row{@keys} = @$vals; + return \%row ; +} + + + + bootstrap Text::CSV_XS $VERSION; sub types @@ -431,6 +487,10 @@ $colref = $csv->getline ($io); # Read a line from file $io, # parse it and return an array # ref of fields + + $csv->column_names('Name','Address') # Set array of column names + $href = $csv->getline_hr($io) # Return line as hashref + $eof = $csv->eof (); # Indicate if last parse or # getline () hit End Of File @@ -862,6 +922,29 @@ The I<$csv-E<gt>string ()>, I<$csv-E<gt>fields ()> and I<$csv-E<gt>status ()> methods are meaningless, again. +=item getline_hr + +=item column_names + + $csv->column_names('Name','Address'); + # Or, with an arrayref + $csv->column_names(['Name','Address']); + + # Now, later you can do: + my $href = $csv->getline_hr($io); + +The C<getline_hr> and C<column_names> methods work together to allow you to +have rows returned as hashrefs. You must call C<column_names> first to declare +your column names. + + # Set the column names from a row in the CSV file + $csv->column_names( $csv->getline($io) ); + +Then you can call C<getline_hr()> as you would C<getline()>, and a hashref of +the row will be returned: + + my $href = $csv->getline_hr($io); + =item eof $eof = $csv->eof (); diff -rN -u old-Text-CSV_XS-0.37/MANIFEST new-Text-CSV_XS-0.37/MANIFEST --- old-Text-CSV_XS-0.37/MANIFEST 2008-04-03 13:13:30.000000000 -0400 +++ new-Text-CSV_XS-0.37/MANIFEST 2008-04-03 13:13:30.000000000 -0400 @@ -24,6 +24,7 @@ t/70_rt.t Tests based on RT reports t/80_diag.t Error diagnostics t/util.pl Extra test utilities +t/getline_hr.t Test column_names() and getline_hr() examples/csv2xls Script to onvert CSV files to M$Excel examples/speed.pl Small benchmark script META.yml Module meta-data (added by MakeMaker) diff -rN -u old-Text-CSV_XS-0.37/t/getline_hr.t new-Text-CSV_XS-0.37/t/getline_hr.t --- old-Text-CSV_XS-0.37/t/getline_hr.t 1969-12-31 19:00:00.000000000 -0500 +++ new-Text-CSV_XS-0.37/t/getline_hr.t 2008-04-03 13:13:30.000000000 -0400 @@ -0,0 +1,77 @@ +#!/usr/bin/perl + +use strict; +$^W = 1; + + +use Config; +use Test::More; + +BEGIN { + if (exists $Config{useperlio} && + defined $Config{useperlio} && + $Config{useperlio} eq "define") { + plan tests => 7 ; + } + else { + plan skip_all => "No perlIO available"; + } +} + +BEGIN { + require_ok "Text::CSV_XS"; + plan skip_all => "Cannot load Text::CSV_XS" if $@; +} + +# Setup + my $io; # filehandle + my $io_str; # storage + my $csv = Text::CSV_XS->new; + # Store a row in the $io handle, which uses $io_str for storage. + open $io, "+>", \$io_str or die "_test.csv: $!"; + $csv->print ($io, ['av','bv']); + # reset file handle to beginning. + seek($io,0,0); + +{ + my $test = "calling getline_hr before column_names dies with hint you've done something wrong"; + eval { my $hr = $csv->getline_hr($io); }; + like($@,qr/column_names/, $test); +} + + +{ + my $test = 'reality check column_names'; + $csv->column_names('c','d'); + my @cols = $csv->column_names; + is_deeply(['c','d'],\@cols,$test); +} + +{ + my $test = 'column_names() called a second time with arrayref should work, and clobber current values'; + $csv->column_names(['a','b']); + my @cols = $csv->column_names; + is_deeply(['a','b'],\@cols,$test); +} + +{ + my $hr = $csv->getline_hr($io); + is($hr->{a},'av', "getline_hr test for first key"); + is($hr->{b},'bv', "getline_hr test for last key"); +} + +{ + my $test = "mismatched number of column names and values returns an error"; + # reset file handle to beginning. + seek($io,0,0); + $csv->column_names('a'); + eval { my $hr = $csv->getline_hr($io); }; + like($@, qr/doesn't match/, $test); + +} + + + + + +
Subject: row-as-hashref feedback from peers
I posted a request for other feedback here, and Aristotle has already replied with some helpful feedback: http://use.perl.org/~markjugg/journal/36046 Mark
Subject: Re: [rt.cpan.org #34474] PATCH: getline_hr() and column_names()
Date: Thu, 3 Apr 2008 23:35:05 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 03 Apr 2008 13:21:02 -0400, "MARKSTOS via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote: Show quoted text
> > Queue: Text-CSV_XS > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=34474 > > > Merijn,
Looks promising. I'm now in Norway, I'll dig deeper when I'm back. Thanks for the work! Show quoted text
> I needed this functionality and went ahead and wrote my own patch for > it, although I later found your GIT respository, where you had also > written some patch for this. ( > http://repo.or.cz/w/Text-CSV_XS.git?a=shortlog ) > > I've compared our work, and would like to point out some features of mine: > > - It uses "column_names", which I think is a clearer name. > > - It allows the names to be set from an array or an arrayref. The array > would be clearer if setting the names by hand, an the arrayref is is > nice because it is what getlines() returns.
Good. I'll include that Show quoted text
> - I include error checking for a number of cases including: > - invalid input to column_names() > - calling getlines_hr() before column_names() > - mismatching the number of colum names and values
Very good :) Show quoted text
> My patch also includes tests and documentation updates.
Very very good! :) Show quoted text
> The one thing in your patch I was curious about was the stub mention > in one of the XS files. My implementation doesn't use such a stub, > although it appears to function fine with one.
Can you expand on that? I have only one xs file. Show quoted text
> Since there are a lot of Text::CSV_XS users, I may get some feedback > from some others on the alternatives.
Yes, which is why I didn't release it yet. -- H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/) using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11, & 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org http://mirrors.develooper.com/hpux/ http://www.test-smoke.org http://www.goldmark.org/jeff/stupid-disclaimers/
Subject: Re: [rt.cpan.org #34474] PATCH: getline_hr() and column_names()
Date: Fri, 4 Apr 2008 11:20:52 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 03 Apr 2008 13:21:02 -0400, "MARKSTOS via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote: Show quoted text
> > Queue: Text-CSV_XS > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=34474 > > > Merijn, > > I needed this functionality and went ahead and wrote my own patch for > it, although I later found your GIT respository, where you had also > written some patch for this. ( > http://repo.or.cz/w/Text-CSV_XS.git?a=shortlog )
Can you check that one again, and see it it meets your needs? Show quoted text
> I've compared our work, and would like to point out some features of mine: > > - It uses "column_names", which I think is a clearer name.
I read the use.perl discussion. I concede. Done. Show quoted text
> - It allows the names to be set from an array or an arrayref. The array > would be clearer if setting the names by hand, an the arrayref is is > nice because it is what getlines () returns.
Good point. Done. Show quoted text
> - I include error checking for a number of cases including:
I was still in a design phase. Error checking was (and always is) planned to be extended. Show quoted text
> - invalid input to column_names ()
Yes. Show quoted text
> - calling getlines_hr () before column_names ()
Yes. Show quoted text
> - mismatching the number of colum names and values
No. I really don't care. If a line half-way the CSV file has less (or more) fields than the header line, it is perfectly sane CSV. Probably not what was intended, but certainly not illegal. The way I assign to the hash makes the missing fields undef. Show quoted text
> My patch also includes tests and documentation updates. The one thing in
I'd like all test files to start with a number, so they are called in a somewhat logical order. Show quoted text
> your patch I was curious about was the stub mention in one of the XS
What do you mean? Show quoted text
> files. My implementation doesn't use such a stub, although it appears to > function fine with one. > > Since there are a lot of Text::CSV_XS users, I may get some feedback > from some others on the alternatives.
I value feedback a lot, and as you can see in the Changelog, RT, and on PerlMonks, I also act upon the feedback. -- H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/) using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11, & 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org http://mirrors.develooper.com/hpux/ http://www.test-smoke.org http://www.goldmark.org/jeff/stupid-disclaimers/
Subject: Re: [rt.cpan.org #34474] PATCH: getline_hr() and column_names()
Date: Fri, 04 Apr 2008 10:34:39 -0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: Mark Stosberg <mark [...] summersault.com>
Show quoted text
> Can you check that one again, and see it it meets your needs?
I've looked through it some more. It looks good! Some minor refinements: - A typo: proagated - Instead of: "getline_hr () called without column_names" I suggest "getline_hr () called before column_names" The former makes it seem like "column_names" might be an attribute which needs to be passed to getline_hr(). I think the latter makes it clearer that it is the ordering that matters. Show quoted text
>> your patch I was curious about was the stub mention in one of the XS
> > What do you mean?
I was mistaken. I must have seen this: + _hr_keys => undef, But that is nothing unusual. It it just a more formal declaration of an internal attribute than I used. Show quoted text
> I value feedback a lot, and as you can see in the Changelog, RT, and on > PerlMonks, I also act upon the feedback.
I think you have been a great maintainer, Merijn! I appreciate your work. A final suggestion: I suggest bumping the version to be at least .4 or .5 for the final text release, as I think hashref support will be considered a major feature for a number of people. Frankly, I find the module so solid, it could easily be considered at least "1.0" quality. Mark
Subject: Re: [rt.cpan.org #34474] PATCH: getline_hr() and column_names()
Date: Sat, 5 Apr 2008 10:53:09 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Fri, 04 Apr 2008 10:35:12 -0400, "mark@summersault.com via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote: Show quoted text
> > Queue: Text-CSV_XS > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=34474 > > >
> > Can you check that one again, and see it it meets your needs?
> > I've looked through it some more. It looks good! > > Some minor refinements: > > - A typo: proagated
Fixed Show quoted text
> - Instead of: > "getline_hr () called without column_names" > I suggest > "getline_hr () called before column_names"
Done Show quoted text
> The former makes it seem like "column_names" might be an attribute which > needs to be passed to getline_hr(). I think the latter makes it clearer > that it is the ordering that matters. >
> >> your patch I was curious about was the stub mention in one of the XS
> > > > What do you mean?
> > I was mistaken. I must have seen this: > > + _hr_keys => undef,
That was what _COLUMN_NAMES is now. What's in a name. Show quoted text
> But that is nothing unusual. It it just a more formal declaration of an > internal attribute than I used. >
> > I value feedback a lot, and as you can see in the Changelog, RT, and on > > PerlMonks, I also act upon the feedback.
> > I think you have been a great maintainer, Merijn! I appreciate your work. > > A final suggestion: I suggest bumping the version to be at least .4 or > .5 for the final text release, as I think hashref support will be > considered a major feature for a number of people. Frankly, I find the > module so solid, it could easily be considered at least "1.0" quality.
I was thinking about 1.00 when I implemented the bind_columns () thing. That should warrant a big release. Another thing I now need is a XS binding to make to fit non-XS errors, like the two new croak's into the error_diag system -- H.Merijn Brand Amsterdam Perl Mongers (http://amsterdam.pm.org/) using & porting perl 5.6.2, 5.8.x, 5.10.x on HP-UX 10.20, 11.00, 11.11, & 11.23, SuSE 10.1 & 10.2, AIX 5.2, and Cygwin. http://qa.perl.org http://mirrors.develooper.com/hpux/ http://www.test-smoke.org http://www.goldmark.org/jeff/stupid-disclaimers/