
This queue is for tickets about the Text-CSV_XS CPAN distribution.

Report information
The Basics
Id: 104758
Status: rejected
Priority: 0/
Queue: Text-CSV_XS

People
Owner: Nobody in particular
Requestors: todd [...] xymmetrix.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: mis-handling of allow_loose_quotes?
Date: Thu, 28 May 2015 16:29:15 -0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "Todd Eigenschink" <todd [...] xymmetrix.com>
I ran into a situation with mid-field quotes in some data I was parsing. It seems like *almost* exactly the sort of thing that allow_loose_quotes was meant to handle. I can see how this might be a corner case because the first quote character is actually the first character of the field, but it was still surprising.

If you don't think it's a bug, so be it; I fixed it in my code by setting quote_char to an apostrophe (commented out in the example). See below.

Thanks for all your work; I use Text::CSV_XS almost daily!

----------------------------------------------------------------------
#!/usr/bin/perl

use Text::CSV_XS;
use Data::Dumper;

my $csv = Text::CSV_XS->new({sep_char => '|',
                             allow_loose_quotes => 1,
                             #quote_char => "'",
                             auto_diag => 1});

my $string = <<EOF
Field One|"d" Parcela #14 Barcelona|Field Three
EOF
;

$csv->parse($string);
my @f = $csv->fields();

print Dumper(\@f);
----------------------------------------------------------------------

Todd

--
Todd Eigenschink
Xymmetrix, LLC
todd@xymmetrix.com
http://www.xymmetrix.com/
Non ex transverso sed deorsum
260-407-1584
Subject: Re: [rt.cpan.org #104758] mis-handling of allow_loose_quotes?
Date: Thu, 28 May 2015 23:16:58 +0200
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "H.Merijn Brand" <h.m.brand [...] xs4all.nl>
On Thu, 28 May 2015 16:29:46 -0400, "todd@xymmetrix.com via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> my $csv = Text::CSV_XS->new({sep_char => '|',
>                              allow_loose_quotes => 1,
>                              #quote_char => "'",
>                              auto_diag => 1});
>
> my $string = <<EOF
> Field One|"d" Parcela #14 Barcelona|Field Three
> EOF
> ;
The "problem" here is that the first " is a valid quote that cannot be recognized as a loose-quote. By definition, this is the opening quote of the field (when " is the quotation character). The next quote is either the escape or a quote.

In this example, the second " could be skipped on allow_loose_quotes, *but* that won't help, as then it will run into the problem that the quoted field is not terminated and thus parsing will fail.

$ perl -MText::CSV_XS=csv -e'csv(in=>\q{Field One|"d" Parcela #14 Barcelona|Field Three
},sep=>"|",diag_verbose=>9)'
# CSV_XS ERROR: 2023 - EIQ - QUO character not allowed @ rec 0 pos 13 field 2
Field One|"d" Parcela #14 Barcelona|Field Three
'
            ^

Is the easiest way to find errors. I think I agree that

$ perl -MText::CSV_XS=csv -e'csv(in=>\q{Field One|"d" Parcela #14 Barcelona|Field Three
},sep=>"|",allow_loose_quotes=>1,diag_verbose=>9)'
# CSV_XS ERROR: 2023 - EIQ - QUO character not allowed @ rec 0 pos 13 field 2
Field One|"d" Parcela #14 Barcelona|Field Three
'
            ^

could be diagnosed better. I'll have a look later.

What you really want in *this* case is

    escape_char => undef,
    quote_char  => undef,

which will end up in being just the same as using

    split /\|/

Does this help?

--
H.Merijn Brand  http://tux.nl   Perl Monger  http://amsterdam.pm.org/
using perl5.00307 .. 5.21   porting perl5 on HP-UX, AIX, and openSUSE
http://mirrors.develooper.com/hpux/        http://www.test-smoke.org/
http://qa.perl.org   http://www.goldmark.org/jeff/stupid-disclaimers/
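A minimal sketch of the workaround suggested above, assuming the input never relies on quoting to protect an embedded separator. It uses only the documented sep_char, quote_char, escape_char and auto_diag attributes on the sample line from this ticket; the variable names are illustrative. The split comparison matches for this line, but note that split drops trailing empty fields, so the two are not interchangeable for every input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Sample line from the ticket, including its trailing newline.
my $line = qq{Field One|"d" Parcela #14 Barcelona|Field Three\n};

# With quoting and escaping disabled, '"' is treated as ordinary data.
my $csv = Text::CSV_XS->new ({
    sep_char    => '|',
    quote_char  => undef,
    escape_char => undef,
    auto_diag   => 1,
    });

$csv->parse ($line) or die "parse failed: " . $csv->error_diag;
my @csv_fields = $csv->fields;

# The plain-Perl equivalent mentioned in the reply.
chomp (my $copy = $line);
my @split_fields = split /\|/, $copy;

print join (" / ", @csv_fields),   "\n";
print join (" / ", @split_fields), "\n";
```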

Subject: Re: [rt.cpan.org #104758] mis-handling of allow_loose_quotes?
Date: Fri, 29 May 2015 15:09:09 -0400
To: bug-Text-CSV_XS [...] rt.cpan.org
From: "Todd Eigenschink" <todd [...] xymmetrix.com>
h.m.brand@xs4all.nl via RT writes:

>The "problem" here is that the first " is a valid quote that cannot be
>recognized as a loose-quote. By definition, this is the opening quote
>of the field (when " is the quotation character). The next quote is
>either the escape or a quote.

Yeah, I get that. I just don't happen to *like* it. :-)

I chased my tail on it for a bit before reporting, because a couple fields earlier in the real data there was a loose-quoted letter in the middle of the field. I wasn't paying attention to the position, so I was convinced that loose quoting wasn't working at all.

>What you really want in *this* case is
>
>    escape_char => undef,
>    quote_char  => undef,

That's basically what I wound up doing.

Anyway, whether you change something or don't, thanks for maintaining and improving a terrific package.

Todd

--
Todd Eigenschink
Xymmetrix, LLC
todd@xymmetrix.com
http://www.xymmetrix.com/
Non ex transverso sed deorsum
260-407-1584
I see no way out of this other than the status-quo(te): allow_loose_quotes is only active in unquoted fields.

e.g.

    1, "d" 2, 3

is valid with allow_loose_quotes, because the second field starts with a space and is therefore unquoted, but

    1,"d" 2,3

can never be made valid that way, because there the " opens a quoted field.

That is why I reject this ticket.
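The distinction can be checked with a short script. This is a sketch only: it feeds the two example lines to a parser with allow_loose_quotes enabled and reports whether parse succeeded, using error_diag in list context to show the error code and message for a failure. The @ok array and the output format are illustrative, and the exact error reported for the second line may vary between versions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# Default separator and quote character; only loose quotes are enabled.
my $csv = Text::CSV_XS->new ({ allow_loose_quotes => 1 });

my @ok;    # parse result per line, kept for inspection
for my $line (qq{1, "d" 2, 3}, qq{1,"d" 2,3}) {
    if ($csv->parse ($line)) {
        push @ok, 1;
        print "ok:     $line => [", join ("] [", $csv->fields), "]\n";
        }
    else {
        push @ok, 0;
        # In list context error_diag returns the code and message first.
        my ($code, $msg) = $csv->error_diag;
        print "failed: $line => $code $msg\n";
        }
    }
```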