Bug #132580 for Text-CSV_XS: How to ignore comment lines/blank lines?

Thu May 14 10:20:13 2020 SVW [...] cpan.org - Ticket created

Subject: How to ignore comment lines/blank lines?

We have a pipe-separated configuration file at our end that we want to manage with this module. In the end we want to manage it with DBD::CSV. Besides the real CSV-style data lines, comment lines and blank lines are present.

Is it possible to configure this module in a way that these lines are ignored "while parsing" but still dumped as-is at the original place "while writing"?

At the moment blank lines are interpreted and written as

||

if the CSV-style configuration file has, for example, 3 columns. For comment lines these 2 pipes would be appended to the end of each comment. Thx.

Fri May 15 02:54:50 2020 h.m.brand [...] xs4all.nl - Correspondence added

Subject:	Re: [rt.cpan.org #132580] How to ignore comment lines/blank lines?
Date:	Fri, 15 May 2020 08:53:20 +0200
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	"H.Merijn Brand" <h.m.brand [...] xs4all.nl>

On Thu, 14 May 2020 10:20:14 -0400, "Sven Willenbuecher via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> We have a pipe separated configuration file at our end that we want
> to manage with this module. In the end we want to manage it with
> DBD::CSV. Besides the real CSV style data lines comment lines and
> blank lines are present.
>
> Is it possible to configure this module in a way that these lines are
> ignored "while parsing" but still dumped as is at the original place
> "while writing"?
>
> At the moment blank lines are interpreted and written as
>
> ||
>
> if the CSV style configuration file has for example 3 columns. For
> comment lines these 2 pipes would be appended to the end of each
> comment.

As this is hardly "a bug", a better place to discuss this might be on github in an issue: https://github.com/Tux/Text-CSV_XS/issues

With what you write, I seriously have no idea what your requirements are exactly. Could you add an anonymized example of what you have as data, how you want it written, and how you want it read back?

--
H.Merijn Brand http://tux.nl Perl Monger http://amsterdam.pm.org/
using perl5.00307 .. 5.31 porting perl5 on HP-UX, AIX, and Linux
https://useplaintext.email https://tux.nl http://www.test-smoke.org
http://qa.perl.org http://www.goldmark.org/jeff/stupid-disclaimers/

Fri May 15 02:54:50 2020 The RT System itself - Status changed from 'new' to 'open'

Tue May 19 05:22:08 2020 SVW [...] cpan.org - Correspondence added

On Fri, 15 May 2020, 02:54:50, h.m.brand@xs4all.nl wrote:

> On Thu, 14 May 2020 10:20:14 -0400, "Sven Willenbuecher via RT"
> <bug-Text-CSV_XS@rt.cpan.org> wrote:
>
> > We have a pipe separated configuration file at our end that we want
> > to manage with this module. In the end we want to manage it with
> > DBD::CSV. Besides the real CSV style data lines comment lines and
> > blank lines are present.
> >
> > Is it possible to configure this module in a way that these lines are
> > ignored "while parsing" but still dumped as is at the original place
> > "while writing"?
> >
> > At the moment blank lines are interpreted and written as
> >
> > ||
> >
> > if the CSV style configuration file has for example 3 columns. For
> > comment lines these 2 pipes would be appended to the end of each
> > comment.
>
> As this is hardly "a bug", a better place to discuss this might be on
> github in an issue: https://github.com/Tux/Text-CSV_XS/issues
>
> With what you write, I seriously have no idea what your requirements
> are exactly. Could you add an anonymized example of what you have as
> data, how you want it written, and how you want it read back?

I thought changing the severity to "wishlist" makes "a bug" a feature request. Next time I will know when to use CPAN RT and when to use the GitHub issue tracker.

Here is an example of a pipe-separated CSV configuration file that we want to process (query, update) with DBD::CSV.

<file>

# --- description of the foo download scenario
foo|TARGET|/partnerA
foo|GENERICNAME|.*\.txt
foo|FTP_MODE|binary

# --- description of the bar download scenario
bar|TARGET|/partnerB
bar|GENERICNAME|.*
bar|SEM_SUFFIX|.ok
bar|DEL_INDICATOR|YES

<file>

Configuration sections are separated by zero or more empty lines and have multiple descriptive headers. Each section has three columns. When we query the file, the empty lines and the comment lines should be ignored, but writing the file should preserve these lines.

At the moment the written file looks like this:

<file>
||^M
# --- description of the foo download scenario||^M
foo|TARGET|/partnerA^M
foo|GENERICNAME|.*\.txt^M
foo|FTP_MODE|binary^M
||^M
# --- description of the bar download scenario||^M
bar|TARGET|/partnerB^M
bar|GENERICNAME|.*^M
bar|SEM_SUFFIX|.ok^M
bar|DEL_INDICATOR|YES^M
||^M
<file>
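
The read-side requirement described above (query only the data lines, skipping comments and blank lines) can be sketched by filtering lines before they reach the CSV parser. This is a minimal illustration in Python using only the standard library; it is not Text::CSV_XS or DBD::CSV code, and the names `is_data_line` and `read_records`, as well as the assumption that a comment is any line whose first non-blank character is `#`, are hypothetical.

```python
import csv

# Sample taken from the example file in this ticket, including its blank lines.
SAMPLE = "\n".join([
    "",
    "# --- description of the foo download scenario",
    "foo|TARGET|/partnerA",
    r"foo|GENERICNAME|.*\.txt",
    "foo|FTP_MODE|binary",
    "",
    "# --- description of the bar download scenario",
    "bar|TARGET|/partnerB",
    "bar|GENERICNAME|.*",
    "bar|SEM_SUFFIX|.ok",
    "bar|DEL_INDICATOR|YES",
    "",
]) + "\n"


def is_data_line(line):
    """Assumption: blank lines and lines starting with '#' are not data."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def read_records(text):
    """Parse only the data lines of a pipe-separated text into field lists."""
    data_lines = [line for line in text.splitlines() if is_data_line(line)]
    return list(csv.reader(data_lines, delimiter="|"))


records = read_records(SAMPLE)
```

With this filter in front of the parser, every record has exactly three fields and no padded `||` rows appear, but the skipped lines are lost, so it covers only the querying half of the request.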

Tue May 19 11:16:30 2020 h.m.brand [...] xs4all.nl - Correspondence added

Subject:	Re: [rt.cpan.org #132580] How to ignore comment lines/blank lines?
Date:	Tue, 19 May 2020 17:16:08 +0200
To:	bug-Text-CSV_XS [...] rt.cpan.org
From:	"H.Merijn Brand" <h.m.brand [...] xs4all.nl>

On Tue, 19 May 2020 05:22:09 -0400, "Sven Willenbuecher via RT" <bug-Text-CSV_XS@rt.cpan.org> wrote:

> I thought changing the severity to "wishlist" makes "a bug" a feature
> request. Next time I know when to use CPAN RT and when to use GitHub
> issue tracker.

You are probably right, but on GitHub, there is more room for others to chime in with ideas and feedback on feature requests.

In the case you described, I am positive that the request is out of scope of the parser itself, as it will for sure cause an unacceptable slowdown of the normal parsing: the procedure you describe will need a lookahead. This will require additional buffering and processing, which will be a strain on processing streams and will also slow down regular parsing.

My suggestion would be to add the steering data in meta-files, certainly if you are using this with DBD::CSV, as that supports multiple locations by the use of f_dir_search (alongside f_dir). In this setup, the meta-files are just additional CSV files in the set of locations that can be queried to fetch what would otherwise have been in the now clean data files.

So, I think I will reject this request for the parser itself, as it will be better to move this to a higher level. You might consider subclassing DBD::CSV or making your own version of it.
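
One possible reading of "move this to a higher level" is a thin layer above the parser that keeps comment and blank lines verbatim and writes them back at their original positions, while only data lines go through CSV parsing and formatting. The sketch below illustrates that idea in Python with the standard library only; it is not how DBD::CSV or Text::CSV_XS work, and the functions `load`, `rows`, and `dump` and the `("raw", ...)`/`("row", ...)` entry layout are hypothetical names chosen for illustration. It assumes `\n` line endings and `#` as the comment marker.

```python
import csv
import io


def load(text):
    """Split text into ordered entries: ('raw', line) or ('row', [fields])."""
    entries = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Comment or blank line: keep it untouched, never parse it.
            entries.append(("raw", line))
        else:
            fields = next(csv.reader([line], delimiter="|"))
            entries.append(("row", fields))
    return entries


def rows(entries):
    """Return the data rows only; the lists are shared, so edits persist."""
    return [value for kind, value in entries if kind == "row"]


def dump(entries):
    """Write entries back in order; raw lines are emitted exactly as read."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    for kind, value in entries:
        if kind == "raw":
            out.write(value + "\n")
        else:
            writer.writerow(value)
    return out.getvalue()
```

A caller would query or update through `rows(load(text))` and then call `dump` on the same entries, so the non-data lines never pass through the CSV writer and do not pick up trailing separators.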

Wed May 20 10:28:20 2020 SVW [...] cpan.org - Correspondence added

On Tue, 19 May 2020, 11:16:30, h.m.brand@xs4all.nl wrote:

> On Tue, 19 May 2020 05:22:09 -0400, "Sven Willenbuecher via RT"
> <bug-Text-CSV_XS@rt.cpan.org> wrote:
>
> > I thought changing the severity to "wishlist" makes "a bug" a feature
> > request. Next time I know when to use CPAN RT and when to use GitHub
> > issue tracker.
>
> You are probably right, but on GitHub, there is more room for others to
> chime in with ideas and feedback on feature requests.
>
> In the case you described, I am positive that the request is out of
> scope of the parser itself, as it will for sure cause an unacceptable
> slowdown of the normal parsing: the procedure you describe will need a
> lookahead. This will require additional buffering and processing, which
> will be a strain on processing streams and will also slowdown regular
> parsing.
>
> My suggestion would be to add the steering data in meta-files, certainly
> if you are using this with DBD::CSV, as that supports multiple locations
> by the use of f_dir_search (alongside f_dir). In this, meta-files are no
> more than other CSV files in the set of locations that can be queried to
> fetch what would have been in the now clean data-files.
>
> So, I think I will reject this request for the parser itself, as it will
> be better to move this to a higher level. You might consider subclassing
> DBD::CSV or make your own version of it.

Understood. Thanks for the feedback and suggestion.

Wed May 20 10:28:20 2020 SVW [...] cpan.org - Status changed from 'open' to 'resolved'