
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 14683
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: HMBRAND [...] cpan.org
Requestors: steve [...] purkis.ca
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: (no value)
Fixed in: 0.51



Subject: Text::CSV_XS should support UTF-8 strings in non-binary mode
Hiya,

I realize Text::CSV_XS was written before Perl had any sane utf-8 framework, but I thought I'd point out that its lack of easy (i.e. non-binary) support is frustrating at times (other than this, I've found it's a great tool!).

The first issue is that you need to parse all utf-8 strings with a binary csv parser -- if you don't set the 'binary' flag, you get no output:

    perl
    use utf8;
    use Encode qw( decode_utf8 is_utf8 );
    use Text::CSV_XS;

    my $in = "chaussée straße";
    print "in: $in\tis utf8: ", is_utf8( $in ) ? 1 : 0, "\n";

    my $csv = Text::CSV_XS->new({ sep_char => ',' });
    $csv->combine( $in );
    my $out = $csv->string;
    print "out: $out\tis utf8: ", is_utf8( $out ) ? 1 : 0, "\n";
    __END__
    in: chauss?e stra?e is utf8: 1
    out: is utf8: 0

But when you create a new csv parser with the 'binary' flag set, you have to remember to decode the results as utf-8, which is annoying:

    perl
    use utf8;
    use Encode qw( decode_utf8 is_utf8 );
    use Text::CSV_XS;

    my $in = "chaussée straße";
    print "in: $in\tis utf8: ", is_utf8( $in ) ? 1 : 0, "\n";

    my $csv = Text::CSV_XS->new({ binary => 1, sep_char => ',' });
    $csv->combine( $in );
    my $out = $csv->string;
    print "out: $out\tis utf8: ", is_utf8( $out ) ? 1 : 0, "\n";

    my $utf8_out = decode_utf8( $csv->string, Encode::FB_PERLQQ );
    print "utf8 out: $utf8_out\tis utf8: ", is_utf8( $utf8_out ) ? 1 : 0, "\n";
    __END__
    in: chauss?e stra?e is utf8: 1
    out: "chaussée straße" is utf8: 0
    utf8 out: "chauss?e stra?e" is utf8: 1

This is usable, but not ideal -- is there any chance we'll see some direct support for utf-8 strings?

-Steve
Fixed. You still need Text::CSV_XS->new ({ binary => 1 }) though.
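
For reference, a minimal sketch of what the reply above seems to describe, assuming a Text::CSV_XS release that includes the fix (0.51 or later, per this ticket). It is the requestor's second script with the manual decode_utf8 step dropped; the binmode call and the error check are additions for illustration and are not part of the original report. With the fix in place, the combined string should come back with the utf8 flag already set, but the exact behaviour depends on the installed version.

```perl
use strict;
use warnings;
use utf8;                       # the source file itself contains UTF-8 literals
use Text::CSV_XS;

my $in = "chaussée straße";

# binary => 1 is still required, as the maintainer notes
my $csv = Text::CSV_XS->new ({ binary => 1, sep_char => ',' });
$csv->combine ($in) or die "combine failed on: " . $csv->error_input;

# No decode_utf8 here: the flag is checked on the string as returned
my $out = $csv->string;

binmode STDOUT, ":encoding(UTF-8)";   # illustrative: avoids "Wide character" warnings
print "out: $out\tis utf8: ", utf8::is_utf8 ($out) ? 1 : 0, "\n";
```

If the flag still comes back as 0, the installed module likely predates the fix and the decode_utf8 workaround from the report is still needed.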