On Wed, 7 Aug 2013 17:26:20 -0400, "Lady Aleena via RT"
<bug-DBD-CSV@rt.cpan.org> wrote:
> Wed Aug 07 17:26:20 2013: Request 87686 was acted upon.
> Transaction: Ticket created by ALEENA
> Queue: DBD-CSV
> Subject: No HOW TO section in documentation
> Broken in: (no value)
> Severity: Normal
> Owner: Nobody
> Requestors: ALEENA@cpan.org
> Status: new
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=87686 >
>
>
> The documentation, as written now, makes the assumption the reader is savvy
> leaving the unsavvy readers behind. A HOW TO section would help bridge
> the gap.
Not really. Allow me to disagree on a new section. I admit that some of
the CSV handling implies a rather steep learning curve. DBD::CSV is
there to take away that curve and offer a DBI interface that adds a SQL
interface to CSV files.
One of the issues with CSV files is that the format is simple in its
definition, which leads many producers to create CSV that does not
comply with the basic rules. E.g. just adding "'s around all fields and
joining them with ',' is what many producers take to be valid CSV:

    say join "," => map { qq{"$_"} } @fields;

That is bound to break every possible rule about CSV, as fields might
contain "'s, ,'s or newlines. Using a correct CSV producer will ease
the life of a CSV parser a lot. As many still don't, the two major CSV
parsers on CPAN (Text::CSV_XS and Text::CSV, which follows
Text::CSV_XS) have to offer options (attributes) that work around bad
producers, so bad records like

    1,"ok","not"ok",2,"not,ok"

can be parsed as the end user expects.
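For comparison, a minimal sketch of what a correct producer could look
like, assuming Text::CSV_XS is installed. The sample fields are made up
for illustration. It builds the same record with the naive join and
with combine (), then parses the combine () result back to check that
the fields survive the round trip:

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Made-up sample: fields holding a quote, a comma and a newline
my @fields = (1, 'say "hi"', "a,b", "two\nlines");

# Naive producer: wraps every field in quotes but escapes nothing
my $naive = join "," => map { qq{"$_"} } @fields;

# Text::CSV_XS quotes and escapes where needed;
# binary => 1 allows the embedded newline
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
$csv->combine (@fields) or die "combine failed";
my $line = $csv->string;

# Round trip: parsing the generated line returns the original fields
$csv->parse ($line) or die "parse failed";
my @back = $csv->fields;
```

The naive line and the combine () line differ as soon as a field holds
a quote, since combine () doubles embedded quotes where the naive join
passes them through unescaped.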
> I have a .csv file with the headings on the first line of the file.
Having (correct) headers is an advantage. When the header is a single
line, DBD::CSV will automatically pick that up as column names.
> Also several fields within the file are spread across multiple lines.
If you mean here that fields in the data may contain newlines (please
do not allow that in the header), DBD::CSV will know how to deal with
that, as Text::CSV_XS deals with that by default.
> Under HOW TO, it would be nice to see:
>
> use strict;
> use warnings;
>
> use DBD::CSV;
> use Data::Dumper;
>
> ... # what goes here?
>
> print Dumper($array_of_hashes_ref) # or $hash_of_hashes_ref
You don't want that from DBD::CSV: it *is* possible, but a lot slower
than using Text::CSV_XS directly. Assuming you want to parse/read
di.csv in the current folder:

    # using DBD::CSV
    use DBI;
    use Data::Peek;
    my $dbh = DBI->connect ("dbi:CSV:", undef, undef, {
        f_ext            => ".csv/r",
        csv_null         => 1,
        RaiseError       => 1,
        PrintError       => 1,
        FetchHashKeyName => "NAME_lc",
        }) or die $DBI::errstr;
    my $sth = $dbh->prepare ("select * from di");
    $sth->execute;
    my $aoh;
    while (my $ref = $sth->fetchrow_hashref) {
        push @$aoh, $ref;
        }
    DDumper ($aoh);

    # using Text::CSV_XS
    use Text::CSV_XS;
    use Data::Peek;
    my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
    open my $fh, "<", "di.csv" or die "di.csv: $!";
    $csv->column_names ($csv->getline ($fh));
    my $aoh = $csv->getline_hr_all ($fh);
    DDumper ($aoh);

These should result in the same $aoh (content-wise), but the latter is
faster by a large factor:

            Rate   dbi csvxs
    dbi    141/s    --  -98%
    csvxs 6250/s 4347%    --

--
H.Merijn Brand
http://tux.nl Perl Monger
http://amsterdam.pm.org/
using perl5.00307 .. 5.19 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/