Subject: | Unexpected results parsing tab-separated spaces |
Perhaps this issue is out-of-scope for Text::CSV_XS, because I am using
it to parse tab-separated data, and I do not know whether that is supported.
When parsing tab-separated data with allow_whitespace enabled, if the
input record contains fields that consist entirely of spaces (ASCII 32),
Text::CSV_XS appears to remove not only the spaces but also the
adjacent tab characters, which has the effect of removing actual
fields from the data record.
Further, if the record ends with one or more spaces, then the parse
fails completely with the error "EIF - Binary character in unquoted
field, binary off".
I have attached a sample program that provides eight simple test cases,
four with allow_whitespace on, four with it off. I think it is
self-explanatory.
Thank you,
Eric J. Roode
Subject: | csv_test.pl |
#!perl
# Test case that demonstrates surprising Text::CSV_XS behavior.
use strict;
use warnings;
use Text::CSV_XS;
my $csv = Text::CSV_XS->new({
    sep_char         => "\t",
    allow_whitespace => 0,
});
my $rec1 = qq{One\t\tThree\t\t\tSix};
my $rec2 = qq{One\t \tThree\t \t \tSix};
my $rec3 = qq{One\t \tThree\t \t \t };
my $rec4 = qq{ \t \tThree\t \t \tSix};
print "We expect 6 fields each time.\n";
# ---------------- Adjacent tabs, no allow_whitespace ----------------
if ($csv->parse($rec1))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec1\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec1: $diag\n";
}
# ---------------- Space-infested tabs, no allow_whitespace ----------------
if ($csv->parse($rec2))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec2\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec2: $diag\n";
}
# ---------------- Spaces at the end, no allow_whitespace ----------------
if ($csv->parse($rec3))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec3\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec3: $diag\n";
}
# ---------------- Spaces at the front, no allow_whitespace ----------------
if ($csv->parse($rec4))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec4\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec4: $diag\n";
}
# Now allow whitespace
$csv = Text::CSV_XS->new({
    sep_char         => "\t",
    allow_whitespace => 1,
});
# ---------------- Adjacent tabs, allow_whitespace ----------------
if ($csv->parse($rec1))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec1 (allow_whitespace)\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec1: $diag\n";
}
# ---------------- Space-infested tabs, allow_whitespace ----------------
if ($csv->parse($rec2))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec2 (allow_whitespace)\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec2: $diag\n";
}
# ---------------- Spaces at the end, allow_whitespace ----------------
if ($csv->parse($rec3))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec3 (allow_whitespace)\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec3: $diag\n";
}
# ---------------- Spaces at the front, allow_whitespace ----------------
if ($csv->parse($rec4))
{
my @fields = $csv->fields;
print 'There are ', scalar(@fields), " fields in rec4 (allow_whitespace)\n";
}
else
{
my $diag = $csv->error_diag;
print "Couldn't parse rec4: $diag\n";
}
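
As a baseline for the six-field expectation stated in the script above, the count can be checked independently of Text::CSV_XS with core Perl's split. This is only a sketch: the helper name count_tab_fields is hypothetical (it is not part of Text::CSV_XS), and split does no quoting or escaping, so it serves as a sanity check on the test data and not as a replacement parser.

```perl
#!perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::CSV_XS: count tab-separated
# fields using core split. The -1 limit keeps trailing empty fields,
# and the /\t/ pattern leaves space-only fields intact.
sub count_tab_fields {
    my ($record) = @_;
    my @fields = split /\t/, $record, -1;
    return scalar @fields;
}

# The same four records used in csv_test.pl.
my %records = (
    rec1 => qq{One\t\tThree\t\t\tSix},
    rec2 => qq{One\t \tThree\t \t \tSix},
    rec3 => qq{One\t \tThree\t \t \t },
    rec4 => qq{ \t \tThree\t \t \tSix},
);

for my $name (sort keys %records) {
    printf "%s: %d fields\n", $name, count_tab_fields($records{$name});
}
```

Each record contains five tab characters, so the naive split reports six fields for all four records, which matches the expectation the test script prints.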