On Thu, 28 Sep 2017 05:31:42 -0400, "FANY via RT"
<bug-Text-CSV_XS@rt.cpan.org> wrote:
> The documentation states "The default behavior is to detect if the
> header line starts with a BOM."
Thanks for the report. All feedback is valued.
The documentation you quote belongs to the $csv->header method, not to
the headers *attribute* of the csv function or method:
=item detect_bom
$csv->header ($fh, { detect_bom => 1 });
The default behavior is to detect if the header line starts with a BOM. If
the header has a BOM, use that to set the encoding of C<$fh>. This default
behavior can be disabled by passing a false value to C<detect_bom>.
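A minimal sketch of that header method in use, assuming Text::CSV_XS 1.32 (the version from the report) and a temporary file that holds the same bytes as test.csv. The BOM is stripped from the first line, and the column names come out as plain foo and bar.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV_XS;

# Write the same bytes as test.csv: a UTF-8 BOM followed by three lines
my ($tmp, $file) = tempfile (UNLINK => 1);
binmode $tmp;
print $tmp "\xEF\xBB\xBFfoo,bar\n23,42\n47,11\n";
close $tmp;

open my $fh, "<", $file or die "$file: $!";

my $csv = Text::CSV_XS->new ({ binary => 1, auto_diag => 1 });

# header () reads the first line and strips the BOM (detect_bom is the
# default, passed explicitly here for clarity). It sets the encoding of
# $fh and stores the column names in $csv
my @hdr = $csv->header ($fh, { detect_bom => 1 });

# The remaining lines are read as hashes keyed by those column names
my @rows;
while (my $row = $csv->getline_hr ($fh)) {
    push @rows, $row;
    }
close $fh;

print "@hdr\n";
printf "%s,%s\n", $_->{foo}, $_->{bar} for @rows;
```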
See also the documentation of the detect_bom attribute of csv, which
explains this:
=head3 detect_bom
X<detect_bom>
If C<detect_bom> is given, the method L</header> will be invoked on the
opened stream to check if there is a BOM and set the encoding accordingly.
C<detect_bom> can be abbreviated to C<bom>.
This is the same as setting L<C<encoding>|/encoding> to C<"auto">.
Note that as L</header> is invoked, its default is to also set the headers.
> This, however, does not seem to work:
>
> $ file test.csv
> test.csv: UTF-8 Unicode (with BOM) text
$ file rt123147.csv
rt123147.csv: UTF-8 Unicode (with BOM) text
> $ cat test.csv
> foo,bar
> 23,42
> 47,11
$ cat rt123147.csv
foo,bar
23,42
47,11
> $ xxd test.csv
> 0000000: efbb bf66 6f6f 2c62 6172 0a32 332c 3432 ...foo,bar.23,42
> 0000010: 0a34 372c 3131 0a .47,11.
$ xxd rt123147.csv
00000000: efbb bf66 6f6f 2c62 6172 0a32 332c 3432 ...foo,bar.23,42
00000010: 0a34 372c 3131 0a .47,11.
> $ cat test.plx
> #!/usr/bin/env perl
>
> use 5.02;
> use warnings;
>
> use Text::CSV_XS qw(csv);
>
> my $aoh = csv in => \*STDIN, headers => 'auto', @ARGV;
>
> use Data::Dump qw(ddx);
> ddx $aoh;
$ cat rt123147.pl
#!/pro/bin/perl
use 5.18.2;
use warnings;
use CSV;
DDumper csv (in => "rt123147.csv", headers => "auto");
DDumper csv (in => "rt123147.csv", encoding => "auto");
DDumper csv (in => "rt123147.csv", bom => 1);
> $ ./test.plx <test.csv
> # test.plx:11: [
> # { "bar" => 42, "\x{FEFF}foo" => 23 },
> # { "bar" => 11, "\x{FEFF}foo" => 47 },
> # ]
$ perl rt123147.pl
[
{ bar => '42',
"\x{feff}foo" => '23'
},
{ bar => '11',
"\x{feff}foo" => '47'
}
]
[
{ bar => '42',
foo => '23'
},
{ bar => '11',
foo => '47'
}
]
[
{ bar => '42',
foo => '23'
},
{ bar => '11',
foo => '47'
}
]
> So the BOM is not removed.
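The first call only uses the headers attribute, which takes the first
line as-is, so the BOM stays in the first column name. The second and
third calls imply invoking the header *method*, as the docs state, and
that method strips the BOM.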
Do you agree that this is not a bug?
You might want to turn this into a feature request, but I am unsure
whether changing it would break other people's scripts and/or
expectations.
> When I explicitly turn detect_bom on, the first record gets lost,
> and the fields of the second record are taken as headers:
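That is because you now define the headers TWICE: once through the
headers attribute, and once implicitly by passing an option
(detect_bom) that invokes the header method. The first line (foo,bar)
is consumed as a header once, and the second line (23,42) is then used
as the header row as well, leaving only 47,11 as data. I partly agree
that this might be considered a bug, as a warning would be appropriate
here. Either way, you should not combine the two.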
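A minimal sketch of the reporter's script without the double header definition, assuming Text::CSV_XS 1.32. The temporary file is a stand-in for test.csv; in the original script, in => \*STDIN should be able to take its place. Only detect_bom is passed, so the header method alone reads the header line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Text::CSV_XS qw(csv);

# Same bytes as test.csv / rt123147.csv: UTF-8 BOM, header, two records
my ($tmp, $file) = tempfile (UNLINK => 1);
binmode $tmp;
print $tmp "\xEF\xBB\xBFfoo,bar\n23,42\n47,11\n";
close $tmp;

# No headers => "auto" here: detect_bom (or its abbreviation bom, or
# encoding => "auto") already invokes the header method, which reads
# the header line, strips the BOM and sets the column names
my $aoh = csv (in => $file, detect_bom => 1);

# One hash per data line, keyed by foo and bar
printf "%s,%s\n", $_->{foo}, $_->{bar} for @$aoh;
```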
> $ ./test.plx <test.csv detect_bom 1
> # test.plx:11: [{ 23 => 47, 42 => 11 }]
> $
>
> I use the current version 1.32 of Text::CSV and could reproduce
> this bug running Perl 5.26.0 on openSUSE Linux 42.3 as well as
> with Perl 5.26.0 on Mac OS X 10.12.6.
>
> fany
--
H.Merijn Brand
http://tux.nl Perl Monger
http://amsterdam.pm.org/
using perl5.00307 .. 5.27 porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/ http://www.test-smoke.org/
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/