
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 123147
Status: rejected
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: FANY [...] cpan.org
Cc: PR-62 [...] jira.noris.de
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: PR-62 [...] jira.noris.de
Subject: strange behaviour regarding BOM detection
The documentation states "The default behavior is to detect if the header line starts with a BOM." This, however, does not seem to work:

$ file test.csv
test.csv: UTF-8 Unicode (with BOM) text
$ cat test.csv
foo,bar
23,42
47,11
$ xxd test.csv
0000000: efbb bf66 6f6f 2c62 6172 0a32 332c 3432  ...foo,bar.23,42
0000010: 0a34 372c 3131 0a                        .47,11.
$ cat test.plx
#!/usr/bin/env perl

use 5.02;
use warnings;

use Text::CSV_XS qw(csv);

my $aoh = csv in => \*STDIN, headers => 'auto', @ARGV;

use Data::Dump qw(ddx);
ddx $aoh;
$ ./test.plx <test.csv
# test.plx:11: [
#   { "bar" => 42, "\x{FEFF}foo" => 23 },
#   { "bar" => 11, "\x{FEFF}foo" => 47 },
# ]

So the BOM is not removed.

When I explicitly turn detect_bom on, the first record gets lost, and the fields of the second record are taken as headers:

$ ./test.plx <test.csv detect_bom 1
# test.plx:11: [{ 23 => 47, 42 => 11 }]
$

I use the current version 1.32 of Text::CSV and could reproduce this bug running Perl 5.26.0 on openSUSE Linux 42.3 as well as with Perl 5.26.0 on Mac OS X 10.12.6.

Regards
fany
Subject: test.csv
foo,bar
23,42
47,11
Subject: Re: [rt.cpan.org #123147] strange behaviour regarding BOM detection
Date: Thu, 28 Sep 2017 12:09:35 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 28 Sep 2017 05:31:42 -0400, "FANY via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> The documentation states "The default behavior is to detect if the
> header line starts with a BOM."

Thanks for the report. All feedback is valued.

The documentation you quote is part of the $csv->header method, not of the headers *attribute* of the csv function or method:

=item detect_bom

  $csv->header ($fh, { detect_bom => 1 });

The default behavior is to detect if the header line starts with a BOM. If the header has a BOM, use that to set the encoding of C<$fh>. This default behavior can be disabled by passing a false value to C<detect_bom>.

See also the docs that explain this:

=head3 detect_bom
X<detect_bom>

If C<detect_bom> is given, the method L</header> will be invoked on the opened stream to check if there is a BOM and set the encoding accordingly.

C<detect_bom> can be abbreviated to C<bom>.

This is the same as setting L<C<encoding>|/encoding> to C<"auto">.

Note that as L</header> is invoked, its default is to also set the headers.

> This, however, does not seem to work:
>
> $ file test.csv
> test.csv: UTF-8 Unicode (with BOM) text

$ file rt123147.csv
rt123147.csv: UTF-8 Unicode (with BOM) text

> $ cat test.csv
> foo,bar
> 23,42
> 47,11

$ cat rt123147.csv
foo,bar
23,42
47,11

> $ xxd test.csv
> 0000000: efbb bf66 6f6f 2c62 6172 0a32 332c 3432  ...foo,bar.23,42
> 0000010: 0a34 372c 3131 0a                        .47,11.

$ xxd rt123147.csv
00000000: efbb bf66 6f6f 2c62 6172 0a32 332c 3432  ...foo,bar.23,42
00000010: 0a34 372c 3131 0a                        .47,11.

> $ cat test.plx
> #!/usr/bin/env perl
>
> use 5.02;
> use warnings;
>
> use Text::CSV_XS qw(csv);
>
> my $aoh = csv in => \*STDIN, headers => 'auto', @ARGV;
>
> use Data::Dump qw(ddx);
> ddx $aoh;

$ cat rt123147.pl
#!/pro/bin/perl

use 5.18.2;
use warnings;
use CSV;

DDumper csv (in => "rt123147.csv", headers => "auto");
DDumper csv (in => "rt123147.csv", encoding => "auto");
DDumper csv (in => "rt123147.csv", bom => 1);

> $ ./test.plx <test.csv
> # test.plx:11: [
> #   { "bar" => 42, "\x{FEFF}foo" => 23 },
> #   { "bar" => 11, "\x{FEFF}foo" => 47 },
> # ]

$ perl rt123147.pl
[
  { bar => '42', "\x{feff}foo" => '23' },
  { bar => '11', "\x{feff}foo" => '47' }
]
[
  { bar => '42', foo => '23' },
  { bar => '11', foo => '47' }
]
[
  { bar => '42', foo => '23' },
  { bar => '11', foo => '47' }
]

> So the BOM is not removed.

The second and third calls imply the invocation of the header *method*, as the docs clearly state. Do you agree that this is not a bug?

You might want to change this to be a feature request, but I am unsure if this will break other people's scripts and/or expectations.

> When I explicitly turn detect_bom on, the first record gets lost,
> and the fields of the second record are taken as headers:

That is because you now define the headers TWICE: once by using the headers attribute, and once implicitly by specifying an option that uses the header method. I somehow agree that this might be a bug, as a warning is in place here. You should not do that.

> $ ./test.plx <test.csv detect_bom 1
> # test.plx:11: [{ 23 => 47, 42 => 11 }]
> $
>
> I use the current version 1.32 of Text::CSV and could reproduce
> this bug running Perl 5.26.0 on openSUSE Linux 42.3 as well as
> with Perl 5.26.0 on Mac OS X 10.12.6.
>
> fany

-- 
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.27   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
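
For illustration, a minimal sketch of calling the header method directly, which is what the reply above says bom => 1 and encoding => "auto" do implicitly. The in-memory filehandle and the variable names ($data, @hdr, @rows) are assumptions made for this example and do not come from the ticket; the byte string mirrors the xxd dump of test.csv.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Text::CSV_XS;

# Same bytes as test.csv: a UTF-8 BOM followed by three lines.
# (Assumption: an in-memory filehandle stands in for the real file.)
my $data = "\xEF\xBB\xBFfoo,bar\n23,42\n47,11\n";
open my $fh, "<", \$data or die "open: $!";

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# header () reads the first line, strips a BOM if present, and by
# default also sets the column names. Do not combine it with a
# separate headers setting, or the header is consumed twice.
my @hdr = $csv->header ($fh);

# The remaining lines are data rows keyed by the detected headers.
my @rows;
while (my $row = $csv->getline_hr ($fh)) {
    push @rows, $row;
}
close $fh;

print join (",", @hdr), "\n";
print "$_->{foo},$_->{bar}\n" for @rows;
```

The column names come back without the BOM prefix, so the keys are plain "foo" and "bar" rather than "\x{feff}foo".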

Subject: Re: [rt.cpan.org #123147] strange behaviour regarding BOM detection
Date: Fri, 29 Sep 2017 14:59:06 +0200
To: "h.m.brand [...] xs4all.nl via RT" <bug-Text-CSV_XS [...] rt.cpan.org>
From: "Martin H. Sluka" <martin [...] sluka.de>
Hi H.Merijn,

thank you very much for your quick and helpful reply! Obviously I had not read the documentation carefully enough. Sorry for that!

Kind regards
fany