
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 120655
Status: resolved
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: felix.ostmann [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 0.91
Fixed in: 1.28



Subject: bind_columns with strange behavior / length() from old value
This is perl 5, version 18, subversion 1 (v5.18.1) built for x86_64-linux

----

Hello,

here is a small script that produces strange behavior when using length(). The problem only occurs when all of the following apply:
* bind_columns is used
* a field is empty
* the same field contained a Unicode character in the previous row

----
SCRIPT:
----

#!/usr/bin/env perl
$|++;

use strict;
use File::Temp qw(tempfile);
use Text::CSV_XS;

my $temp_fh = IO::File->new_tmpfile;
$temp_fh->print(<<CSV);
field1,field2
pröblem,ignore
,ignore
CSV
$temp_fh->seek(0, 0);
$temp_fh->binmode(':utf8');

my $order_csv = Text::CSV_XS->new();

my $row;
{
    my $row_header = $order_csv->getline($temp_fh);
    $order_csv->bind_columns(\@{$row}{@$row_header});
}
while ($order_csv->getline($temp_fh)) {
    printf(
        "STRING: >%s< ; LENGTH: %d ; HOTFIX LENGTH: %d\n",
        $row->{field1},
        length($row->{field1}),
        length("".$row->{field1}),
    );
}

----
OUTPUT:
----

STRING: >pröblem< ; LENGTH: 7 ; HOTFIX LENGTH: 7
STRING: >< ; LENGTH: 7 ; HOTFIX LENGTH: 0
Subject: Re: [rt.cpan.org #120655] bind_columns with strange behavior / length() from old value
Date: Sun, 19 Mar 2017 19:30:07 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sun, 19 Mar 2017 09:55:26 -0400, "Felix Antonius Wilhelm Ostmann via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> here is a small script, which produce a strange behavior using
> length (). The problem only kicks in when using:
> * bind_columns
> * a empty field
> * a unicode character in the previous row for the empty field

Your problem most likely lies in the strongly discouraged use of ":utf8"

See:
--8<---
use strict;
use warnings;

use File::Temp qw(tempfile);
use Text::CSV_XS;

my $temp_fh = IO::File->new_tmpfile;
$temp_fh->print (<<"CSV");
field1,field2
pröblem,ignore
,ignore
CSV
$temp_fh->seek (0, 0);
$temp_fh->binmode (":utf8");

my $order_csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

my $row;
{
    my $row_header = $order_csv->getline ($temp_fh);
    $order_csv->bind_columns (\@{$row}{@$row_header});
}
while ($order_csv->getline ($temp_fh)) {
    printf "STRING: >%s< ; LENGTH: %d ; HOTFIX LENGTH: %d\n",
        $row->{field1},
        length $row->{field1},
        length "".$row->{field1};
}
-->8---

=>

utf8 "\xF6" does not map to Unicode at xx.pl line 25, <_GEN_0> line 2.
Wide character in printf at xx.pl line 29, <_GEN_0> line 2.
STRING: >pr�blem< ; LENGTH: 4 ; HOTFIX LENGTH: 4
STRING: >< ; LENGTH: 4 ; HOTFIX LENGTH: 0

but with the correct use of encoding:

$temp_fh->binmode (":encoding(utf-8)");

=>

utf8 "\xF6" does not map to Unicode at xx.pl line 22.
STRING: >pr\xF6blem< ; LENGTH: 10 ; HOTFIX LENGTH: 10
STRING: >< ; LENGTH: 0 ; HOTFIX LENGTH: 0

I'd suggest you stop using ":utf8" right away and start using the safe
way with ":encoding(utf-8)".

Anyway, this doesn't look like a CSV_XS problem.

--
H.Merijn Brand  http://tux.nl  Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.25  porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/  http://www.test-smoke.org/
http://qa.perl.org  http://www.goldmark.org/jeff/stupid-disclaimers/
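
A minimal core-Perl sketch of the strict layer recommended in this reply. It assumes an in-memory filehandle in place of the temp file and a Latin-1 byte \xF6 standing in for the mis-encoded "ö"; the variable names are illustrative only. The thread reports the decoded field from the strict layer as the escaped text pr\xF6blem with length 10, so invalid input becomes visible rather than producing a malformed string.

```perl
use strict;
use warnings;

# Assumption: the source bytes are Latin-1, so "ö" is the single byte
# \xF6, which is not valid UTF-8 on its own.
my $bytes = "pr\xF6blem\n";

my @warnings;
my $line;
{
    # Collect decoding warnings instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # The strict layer validates its input while decoding.
    open my $fh, "<:encoding(utf-8)", \$bytes
        or die "cannot open in-memory handle: $!";
    $line = <$fh>;
    close $fh;
}
chomp $line;

printf "decoded: >%s< ; length: %d ; warnings: %d\n",
    $line, length $line, scalar @warnings;
```

The same open call with ":utf8" in place of ":encoding(utf-8)" is the variant the reply discourages, because that layer does not validate the bytes it reads.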

From: felix.ostmann [...] gmail.com
On Sun 19 Mar 2017, 14:31:42, h.m.brand@xs4all.nl wrote:

> Your problem most likely lies in the strongly discouraged use of ":utf8"
> [...]
> I'd suggest you stop using ":utf8" per direct and start using the safe
> way with ":encoding(utf-8)".
>
> Anyway, this doesn't look like a CSV_XS problem

Sorry, my script was of course misleading without the charset information for the script.

----

I can use 'binary => 1' from Text::CSV_XS or 'binmode(":encoding(utf-8)")' from IO::File; both result in the error.

If this is not a bug in Text::CSV_XS, please point me to the right place.

----

#!/usr/bin/env perl

use IO::File;
use Text::CSV_XS;

my $temp_fh = IO::File->new_tmpfile;
$temp_fh->print(<<CSV);
field1
pr\x{C3}\x{96}blem

CSV
$temp_fh->seek(0, 0);

my $csv = Text::CSV_XS->new({binary => 1});

my $row;
{
    my $row_header = $csv->getline($temp_fh);
    $csv->bind_columns(\@{$row}{@$row_header});
}
while ($csv->getline($temp_fh)) {
    printf(
        "STRING: >%s< ; LENGTH: %d ; HOTFIX LENGTH: %d\n",
        $row->{field1},
        length($row->{field1}),
        length("".$row->{field1}),
    );
}

----

#!/usr/bin/env perl

use IO::File;
use Text::CSV_XS;

my $temp_fh = IO::File->new_tmpfile;
$temp_fh->print(<<CSV);
field1
pr\x{C3}\x{96}blem

CSV
$temp_fh->seek(0, 0);
$temp_fh->binmode(':encoding(utf-8)');

my $csv = Text::CSV_XS->new();

my $row;
{
    my $row_header = $csv->getline($temp_fh);
    $csv->bind_columns(\@{$row}{@$row_header});
}
while ($csv->getline($temp_fh)) {
    printf(
        "STRING: >%s< ; LENGTH: %d ; HOTFIX LENGTH: %d\n",
        $row->{field1},
        length($row->{field1}),
        length("".$row->{field1}),
    );
}
Subject: Re: [rt.cpan.org #120655] bind_columns with strange behavior / length() from old value
Date: Mon, 20 Mar 2017 08:32:05 +0100
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Sun, 19 Mar 2017 18:32:24 -0400, "Felix Antonius Wilhelm Ostmann via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> Sorry, my script was of course misleading without the charset
> information for the script.
>
> ----
>
> I can use 'binary => 1' from Text::CSV_XS or
> 'binmode(":encoding(utf-8)")' from IO::File, both result in the error:
>
> If this is not a bug for Text::CSV_XS, please guide me to the correct
> point.

So, it reduces to this reproducible case:

--8<---
use 5.18.2;
use warnings;
use Text::CSV_XS;

open my $fh, "<:encoding(utf-8)", \"c1\npr\x{c3}\x{b6}blem\n\n";
binmode STDOUT, ":encoding(utf-8)";
my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });
my %row;
$csv->bind_columns (\@row{@{$csv->getline ($fh)}});
while ($csv->getline ($fh)) {
    printf "STRING: >%s< ; LENGTH: %d ; HOTFIX LENGTH: %d\n",
        $row{c1}, length $row{c1}, length "".$row{c1};
}
-->8---

=>

STRING: >pröblem< ; LENGTH: 7 ; HOTFIX LENGTH: 7
STRING: >< ; LENGTH: 7 ; HOTFIX LENGTH: 0

I'll try to find the cause.
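
The "HOTFIX LENGTH" column in the output above comes from concatenating the bound field with an empty string, so that length() sees a fresh scalar rather than the bound one. Below is a minimal sketch of that workaround as a helper, for code that has to run against an affected Text::CSV_XS release. The name fresh_length is hypothetical and not part of Text::CSV_XS, and the sketch makes no claim about the root cause.

```perl
use strict;
use warnings;
use utf8;    # the literal below contains a non-ASCII character

# Hypothetical helper: measure a copy of the value rather than the
# (possibly bound) scalar itself, mirroring length("".$row->{field1}).
sub fresh_length {
    my ($value) = @_;
    return 0 unless defined $value;
    my $copy = "" . $value;    # concatenation produces a new scalar
    return length $copy;
}

binmode STDOUT, ":encoding(utf-8)";
for my $field ("pröblem", "") {
    printf "STRING: >%s< ; LENGTH: %d\n", $field, fresh_length ($field);
}
```

In the reduced case this would be called as fresh_length ($row{c1}) inside the getline loop.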

I've found a fix, but it looks like there might also be a bug in the core. I've asked the core people for comment.

The fix is pushed, but I want to make new tests for this before I release.

Thanks for pointing me to this mishap and taking the time to stay with me.

Feel free to pull and test the fix.
From: felix.ostmann [...] gmail.com
On Mon 20 Mar 2017, 04:45:39, HMBRAND wrote:

> I've found a fix, but it looks like there might also be a bug in the
> core. I've asked the core people for comment.
>
> The fix is pushed, but I want to make new tests for this before I
> release.
>
> Thanks for pointing me to this mishap and taking the time to stay with
> me.
>
> Feel free to pull and test the fix

1.28 fixed the bug! Thanks for your help!
> > Feel free to pull and test the fix
>
> 1.28 fixed the bug! Thanks for your help!

The release will have to wait, as tests now fail with perl-5.6.1. I will have to dig for the cause.
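
Since the ticket lists 1.28 as the fixed version, code that relies on bind_columns with empty fields may want to check for at least that release. Below is a minimal sketch, assuming the module may or may not be installed; csv_xs_has_fix is a hypothetical helper name, not part of Text::CSV_XS.

```perl
use strict;
use warnings;

# Hypothetical helper: true if Text::CSV_XS is installed and is at
# least the version this ticket lists as fixed (1.28 by default).
sub csv_xs_has_fix {
    my $min = shift // "1.28";
    return eval {
        require Text::CSV_XS;
        Text::CSV_XS->VERSION ($min);    # dies if the installed version is older
        1;
    } ? 1 : 0;
}

print csv_xs_has_fix ()
    ? "Text::CSV_XS is new enough\n"
    : "Text::CSV_XS is missing or older than 1.28\n";
```

Where a hard failure is acceptable, the shorter form "use Text::CSV_XS 1.28;" performs the same version check at compile time.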