Subject: Text::CSV_XS should support UTF-8 strings in non-binary mode
Hiya,
I realize Text::CSV_XS was written before Perl had any sane utf-8 framework, but I thought I'd point out that its lack of easy (i.e. non-binary) utf-8 support is frustrating at times (other than this, I've found it's a great tool!).
The first issue is that you need to handle all utf-8 strings with a binary csv parser -- if you don't set the 'binary' flag, you get no output:
perl
use utf8;
use Encode qw( decode_utf8 is_utf8 );
use Text::CSV_XS;
my $in = "chaussée straße";
print "in: $in\tis utf8: ", is_utf8( $in ) ? 1 : 0, "\n";
my $csv = Text::CSV_XS->new({ sep_char => ',' });
$csv->combine( $in );
my $out = $csv->string;
print "out: $out\tis utf8: ", is_utf8( $out ) ? 1 : 0, "\n";
__END__
in: chauss?e stra?e is utf8: 1
out: is utf8: 0
But when you create a new csv parser with the 'binary' flag set, you have to remember to decode the results as utf-8, which is annoying:
perl
use utf8;
use Encode qw( decode_utf8 is_utf8 );
use Text::CSV_XS;
my $in = "chaussée straße";
print "in: $in\tis utf8: ", is_utf8( $in ) ? 1 : 0, "\n";
my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ',' });
$csv->combine( $in );
my $out = $csv->string;
print "out: $out\tis utf8: ", is_utf8( $out ) ? 1 : 0, "\n";
my $utf8_out = decode_utf8( $csv->string, Encode::FB_PERLQQ );
print "utf8 out: $utf8_out\tis utf8: ", is_utf8( $utf8_out ) ? 1 : 0, "\n";
__END__
in: chauss?e stra?e is utf8: 1
out: "chaussée straße" is utf8: 0
utf8 out: "chauss?e stra?e" is utf8: 1
This is usable, but not ideal -- is there any chance we'll see some direct support for utf-8 strings?
-Steve