On Wed, 21 Jan 2009 17:56:03 -0500, "MSISK via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:
> See the attached example file. For a quote character of 0xfe and a
> separator character of 0x14 (with utf8 flag on for these strings),
> parsing fails with:
>
> ECR - Characters after end of quoted field
>
> Without the utf8 flag set on the quote/sep strings, parsing fails to
> separate the input into fields but produces no error.
>
> This works with Text::CSV_PP.
Text::CSV_XS does NOT support Unicode quote and separator characters, as
documented (see the excerpt from the docs below). With plain byte strings,
your example works here:
$ cat 42642.pl
use strict;
use warnings;
use Data::Peek;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new ({
binary => 1,
quote_char => "\xfe",
sep_char => "\x14",
});
open my $fh, "<", "42642.csv" or die "42642.csv: $!\n";
while (my $row = $csv->getline ($fh)) {
print "Row $.\n";
print " ", DPeek ($_), "\n" for @$row;
}
$csv->eof or $csv->error_diag;
$ perl 42642.pl
Row 1
PV("DOG"\0)
PV("CAT"\0)
PV("WOMBAT"\0)
PV("BANDERSNATCH"\0)
Row 2
PV("0"\0)
PV("1"\0)
PV("2"\0)
PV("3"\0)
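The only difference between your report (UTF8 flag on) and the run above (plain "\xfe") is Perl's internal representation of the same character. The small standalone sketch below uses only core utf8:: functions and the bytes pragma. It shows that both strings compare equal, but the flagged one occupies two bytes internally. That the XS code looks at those internal bytes rather than at the character is an assumption, although it is consistent with the PV("\303\276") DPeek output of the second script below.

```perl
use strict;
use warnings;

# The same character, U+00FE, in two internal representations
my $plain   = "\xfe";                          # one byte, no UTF8 flag
my $flagged = substr ("\xfe\x{20ac}", 0, 1);   # UTF8 flag on, as in the report

printf "equal as strings: %s\n", $plain eq $flagged ? "yes" : "no";
printf "plain   UTF8 flag: %d\n", utf8::is_utf8 ($plain)   ? 1 : 0;
printf "flagged UTF8 flag: %d\n", utf8::is_utf8 ($flagged) ? 1 : 0;

# Length of the internal buffer: 1 byte for $plain, 2 (\303\276) for $flagged
my $plain_bytes   = do { use bytes; length $plain };
my $flagged_bytes = do { use bytes; length $flagged };
print "internal bytes: $plain_bytes vs $flagged_bytes\n";
```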
From the docs:
--8<---
Though this is the most clear and restrictive definition, Text::CSV_XS
is way more liberal than this, and allows extension:
· Line termination by a single carriage return is accepted by default
· The separation-, quote-, and escape- characters can be any ASCII
character in the range from 0x20 (space) to 0x7E (tilde). Characters
outside this range may or may not work as expected. Multibyte characters,
like U+060c (ARABIC COMMA), U+FF0C (FULLWIDTH COMMA), U+241B
(SYMBOL FOR ESCAPE), U+2424 (SYMBOL FOR NEWLINE), U+FF02 (FULLWIDTH
QUOTATION MARK), and U+201C (LEFT DOUBLE QUOTATION MARK) (to give
some examples of what might look promising) are therefore not allowed.
-->8---
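The rule quoted above can be checked before constructing the parser. The helper below is hypothetical: classify_csv_char () is not part of Text::CSV_XS. It only compares the character's ordinal against the documented range and ignores the UTF8 flag, so it will not catch the flagged-"\xfe" case from the report by itself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::CSV_XS): classify a proposed
# sep_char / quote_char / escape_char against the documented rules
sub classify_csv_char {
    my ($char) = @_;
    return "not a single character"
        unless defined $char && length ($char) == 1;
    my $ord = ord ($char);
    return "documented-safe"        if $ord >= 0x20 && $ord <= 0x7E;
    return "multibyte, not allowed" if $ord > 0xFF;
    return "outside range, may or may not work";
}

for my $try (",", ";", "\x14", "\xfe", "\x{060c}", "\x{201c}") {
    printf "U+%04X: %s\n", ord ($try), classify_csv_char ($try);
}
```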
The solution to your problem is to decode your sep_char and quote_char,
so that they end up as plain byte strings without the UTF8 flag.
To demonstrate that this works (though maybe not with the most elegant
solution):
$ cat 42642.pl
use strict;
use warnings;
use Data::Peek;
use Text::CSV_XS;
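# substr () of a string that contains U+20AC yields UTF8-flagged
# strings, as in the bug report
my $quo = substr ("\xfe\x{20ac}", 0, 1);
my $sep = substr ("\x14\x{20ac}", 0, 1);
print "quote: ", DPeek ($quo), "\n";
print "sep: ", DPeek ($sep), "\n";
# Back to plain byte strings: the UTF8 flag is dropped
utf8::decode ($quo);
utf8::decode ($sep);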
my $csv = Text::CSV_XS->new ({
binary => 1,
quote_char => $quo,
sep_char => $sep,
});
print "quote: ", DPeek ($csv->quote_char), "\n";
print "sep: ", DPeek ($csv->sep_char), "\n";
open my $fh, "<", "42642.csv" or die "42642.csv: $!\n";
while (my $row = $csv->getline ($fh)) {
print "Row $.\n";
print " ", DPeek ($_), "\n" for @$row;
}
$csv->eof or $csv->error_diag;
$ perl 42642.pl
quote: PV("\303\276"\0) [UTF8 "\x{fe}"]
sep: PV("\24"\0) [UTF8 "\x{14}"]
quote: PV("\376"\0)
sep: PV("\24"\0)
Row 1
PV("DOG"\0)
PV("CAT"\0)
PV("WOMBAT"\0)
PV("BANDERSNATCH"\0)
Row 2
PV("0"\0)
PV("1"\0)
PV("2"\0)
PV("3"\0)
$
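A possibly more direct alternative to the substr ()/utf8::decode () combination is core utf8::downgrade (). It drops the UTF8 flag and dies if a character does not fit in a single byte, which matches the documented "multibyte characters are not allowed" restriction. This is a sketch that has not been run against Text::CSV_XS here, and as_byte_char () is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a plain (non-UTF8) byte-string copy of $char.
# utf8::downgrade () croaks on characters above 0xFF.
sub as_byte_char {
    my ($char) = @_;
    my $copy = $char;          # leave the caller's string untouched
    utf8::downgrade ($copy);   # dies with "Wide character" if not representable
    return $copy;
}

my $quo = as_byte_char (substr ("\xfe\x{20ac}", 0, 1));
my $sep = as_byte_char (substr ("\x14\x{20ac}", 0, 1));
printf "quote UTF8 flag: %d, sep UTF8 flag: %d\n",
    utf8::is_utf8 ($quo) ? 1 : 0, utf8::is_utf8 ($sep) ? 1 : 0;

# The results would then be passed on as in the script above, e.g.
#   Text::CSV_XS->new ({ binary => 1, quote_char => $quo, sep_char => $sep });
```

Dying on a wide character is a deliberate choice in this sketch: silently passing something like U+201C on would only reproduce the original confusing behaviour.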
--
H.Merijn Brand
http://tux.nl Perl Monger
http://amsterdam.pm.org/
using & porting perl 5.6.2, 5.8.x, 5.10.x, 5.11.x on HP-UX 10.20, 11.00,
11.11, 11.23, and 11.31, SuSE 10.1, 10.3, and 11.0, AIX 5.2, and Cygwin.
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/