
This queue is for tickets about the Config-IniFiles CPAN distribution.

Report information
The Basics
Id: 59152
Status: resolved
Priority: 0/
Queue: Config-IniFiles

People
Owner: Nobody in particular
Requestors: meir [...] guttman.co.il
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



CC: shlomif [...] iglu.org.il
Subject: UTF-8 (and other Unicode encodings?) BOM cause the package to fail
Date: Wed, 07 Jul 2010 09:42:45 +0300
To: bug-Config-IniFiles [...] rt.cpan.org
From: Meir Guttman <meir [...] guttman.co.il>
Dear folks,

The other day I discovered the hard way that the Config::IniFiles package fails to process UTF-8 encoded INI files when the file also includes a BOM (Byte Order Mark) signature.

Attached are two INI files, one with a BOM and one without. Other than that, the two are identical. As anyone can see in a hex view, the 3-byte BOM at the very beginning of the BOM file is "EF BB BF".

Also attached is a small Perl script to demonstrate the result. (You of course have to edit it to switch between the BOM and the no-BOM versions.) With the BOM INI file, the output is:

Line 1 in file utf8_bom.ini is mal-formed:
∩�[┐[General]
2: parameter found outside a section

Please note the three "garbage" characters in my (Hebrew) cmd window.

As for a correcting patch, I am afraid I am too much of a newbie to offer one. But maybe all that is required is a "use encoding 'utf8';" statement?

Regards,
Meir
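To make the report concrete, here is a minimal sketch of how the three BOM bytes can be detected at the start of a byte string. The helper name `has_utf8_bom` and the sample strings are hypothetical and are not taken from the attached script; the sketch only illustrates why line 1 no longer begins with `[` when the BOM is present.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the byte string starts with the
# UTF-8 BOM (bytes EF BB BF), otherwise 0.
sub has_utf8_bom {
    my ($bytes) = @_;
    return substr($bytes, 0, 3) eq "\xEF\xBB\xBF" ? 1 : 0;
}

# Assumed sample data standing in for the first line of each attachment.
my $with_bom    = "\xEF\xBB\xBF[General]\n";
my $without_bom = "[General]\n";

printf "with BOM:    %d\n", has_utf8_bom($with_bom);
printf "without BOM: %d\n", has_utf8_bom($without_bom);

# With the BOM in place, the first byte of the line is 0xEF rather than "[",
# so a parser that expects a section header at the start of the line may
# reject it.
printf "first byte:  0x%02X\n", ord(substr($with_bom, 0, 1));
```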
Download utf8_bom.ini
application/octet-stream 196b

Message body not shown because it is not plain text.

Message body is not shown because sender requested not to inline it.

Download utf8_no-bom.ini
application/octet-stream 193b


On Wed Jul 07 02:43:03 2010, meir@guttman.co.il wrote:
> Dear folks,
>
> The other day I discovered the hard way that the Config::IniFiles package
> fails to process UTF-8 Unicode encoded INI files when the file also includes
> a BOM (Byte Order Marker) signature.
>
> Attached are two INI files, one with a BOM, another is without. Other than
> this the two are identical. As anyone can see (in a Hex view), the 3-byte
> BOM at the very beginning of the BOM file is "EF BB BF".
>
> Also attached is a small Perl Script to demonstrate the result. (You have of
> course to edit it to switch between the BOM and the no-BOM versions.) The
> outcome of it when using the BOM INI file is:
>
> Line 1 in file utf8_bom.ini is mal-formed:
> ∩�[┐[General]
> 2: parameter found outside a section
>
> Please note the three "garbage" characters on my (Hebrew) cmd window.
>
> As for a correcting patch, I am afraid I am too much of a newbie to offer
> that. But may be all which is required is a "use encoding 'utf8';"
> statement?
>
After playing a little with your script, I found that this version works fine:

{{{{{{{{{{{{{{{{
#!/usr/bin/perl

use strict;
use warnings;

# use encoding "utf8";
# use open IO => ":encoding(utf8)";

use Config::IniFiles;

my $cfg = Config::IniFiles->new(-file => "utf8_bom.ini") or do {
    my $err_message = join("\n", @Config::IniFiles::errors);
    die "$err_message\n";
};

my $cookie_jar = $cfg->val('General', 'cookie_jar');
print "Jar: $cookie_jar\n";

__END__
}}}}}}}}}}}}}}}}

What do you need the "use open" call for?

Regards,

-- Shlomi Fish

> Regards,
>
> Meir
>
Rejected due to lack of responsiveness from the reporter. If you wish to re-open, then comment.
On Fri Nov 19 09:02:18 2010, SHLOMIF wrote:
> Rejected due to lack of responsiveness from the reporter. If you wish to
> re-open, then comment.
Reopening per the responsiveness of the reporter (Meir).
On Sat Jan 09 14:18:33 2016, SHLOMIF wrote:
> On Fri Nov 19 09:02:18 2010, SHLOMIF wrote:
> > Rejected due to lack of responsiveness from the reporter. If you wish to
> > re-open, then comment.
>
> Reopening per the responsiveness of the reporter (Meir).
Meir sent me a reproducing test case in private and I was able to fix it after referring to :

< QUOTE >

Thanks for the modified program - I was able to rework it into a usable, reproducing, condition.

Now to the solution: searching http://duckduckgo.com/ for https://duckduckgo.com/?q=perl%20utf8%20bom yielded this Perl Monks thread - http://www.perlmonks.org/?node_id=599720 - where https://metacpan.org/pod/File::BOM was recommended. After installing it and playing a little with it, I was able to create a Perl program that yields the same result with and without the BOM. I've attached it to this message in .7z format:

«««
shlomif@telaviv1:~$ perl EoD-shlomif.pl m-with-bom.ini
Node root: D:/Meir
Log DIR root: WorkLOGs
shlomif@telaviv1:~$ perl EoD-shlomif.pl m-without-bom.ini
Node root: D:/Meir
Log DIR root: WorkLOGs
shlomif@telaviv1:~$
»»»

Hope it helps. This problem is not specific to Config-IniFiles, but rather an issue with the way Perl 5 is implemented. And there's an easy solution on CPAN.

Regards,

Shlomi Fish

< /QUOTE >

I've attached what I sent to Meir, and people may refer to this answer here for more insights.

Resolving as NOT-A-BUG.
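For readers without the attachment, the following is a rough sketch of the general idea, namely reading a file so that a leading BOM does not reach the parser. It is not the attached program and it does not use File::BOM; it is a hand-rolled, core-modules-only stand-in that handles UTF-8 only. The helper names `slurp_without_bom` and `write_sample` and the sample INI content are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a UTF-8 file and drop a leading BOM (U+FEFF),
# if there is one. A simplified stand-in for what File::BOM automates.
sub slurp_without_bom {
    my ($path) = @_;
    open(my $fh, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my $text = do { local $/; <$fh> };
    close($fh);
    $text = '' unless defined $text;
    $text =~ s/\A\x{FEFF}//;    # after decoding, the BOM is one character
    return $text;
}

# Hypothetical helper: write raw bytes to a temporary file, return its path.
sub write_sample {
    my ($bytes) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh);
    print {$fh} $bytes;
    close($fh);
    return $path;
}

# Assumed sample content; only the section and key names echo the thread.
my $ini = "[General]\ncookie_jar=cookies.txt\n";

my $with_bom    = slurp_without_bom(write_sample("\xEF\xBB\xBF" . $ini));
my $without_bom = slurp_without_bom(write_sample($ini));

print $with_bom eq $without_bom ? "same\n" : "different\n";
```

Both inputs should yield the same text, which mirrors the identical output of the two runs shown above. For anything beyond UTF-8 (for example UTF-16 BOMs), the File::BOM module recommended in the thread is likely the better choice.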
Subject: Meir-Config-IniFiles-BOM-Shlomif.7z
Download Meir-Config-IniFiles-BOM-Shlomif.7z
application/x-7z-compressed 987b


resolving