This queue is for tickets about the Module-Metadata CPAN distribution.

Maintainer(s)' notes

Attention bug reporters: issues MUST include the version of Module::Metadata that you are running that exhibits the stated symptoms. Thank you!

Report information
The Basics
Id: 78434
Status: open
Priority: 0
Queue: Module-Metadata

People
Owner: Nobody in particular
Requestors: BBYRD [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: Rare BOM will mess up package detection
So, because of a certain set of situations:

1. Notepad++'s "UTF-8" defaults to putting a BOM in front of the file.
2. My package line is at the very first line.
3. I use OurVersion, so the version doesn't have the package name built-in.

M:M ended up not auto-detecting the package name. So, it looks like the RE just needs to detect a 0xEFBBBF at the beginning of the line, or look for it when the first line is read and strip it out.
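A minimal sketch of the "strip it out when the first line is read" idea from the report above. The helper name `strip_utf8_bom` and the pattern `$PKG_RE` are hypothetical and only illustrate the failure; the real package-detection regex in Module::Metadata is more elaborate.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Module::Metadata): drop a leading
# UTF-8 BOM (bytes EF BB BF) from a line that was read as raw bytes.
sub strip_utf8_bom {
    my ($line) = @_;
    $line =~ s/\A\xEF\xBB\xBF//;
    return $line;
}

# Simplified stand-in for a package-detection pattern (an assumption,
# not the module's actual regex).
my $PKG_RE = qr/\A\s*package\s+([\w:]+)/;

# First line of a file saved with a UTF-8 BOM, as seen in byte mode.
my $first_line = "\xEF\xBB\xBFpackage My::Module;\n";

# The BOM bytes sit before "package", so the anchored pattern fails.
my ($before) = $first_line =~ $PKG_RE;

# With the BOM removed, the same pattern finds the package name.
my ($after) = strip_utf8_bom($first_line) =~ $PKG_RE;

print defined $before ? "before: $before\n" : "before: (not detected)\n";
print "after: $after\n";
```

This only covers the UTF-8 case raised in the report; a file in another encoding would need more than removing a few leading bytes.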
Thanks for your report.

From perlunicode, perl is supposed to recognize UTF8, UTF16-LE and UTF16-BE BOMs at the beginning of a Perl source file, so I think Module::Metadata should decode the source file appropriately when it sees the BOM.

Thoughts?

Vincent
On Sun Jul 29 18:51:44 2012, VPIT wrote:
> Thanks for your report.
>
> From perlunicode, perl is supposed to recognize UTF8, UTF16-LE and
> UTF16-BE BOMs at the beginning of a Perl source file, so I think
> Module::Metadata should decode the source file appropriately when it
> sees the BOM.
Nope. I talked with some of the guys on IRC about it, including doy, and there's an important distinction: Perl will decode a source file that it's actually reading/parsing, but reading a file that happens to be Perl source is a different matter. In the latter case, Perl will merely follow what binmode is doing. In the case of Module::Metadata, I would say to detect the BOM at the beginning, and if it exists, remove it. Not even Encode::Guess seems to remove BOMs if they appear in UTF-8 code.
Subject: Re: [rt.cpan.org #78434] Rare BOM will mess up package detection
Date: Tue, 31 Jul 2012 19:48:43 +0100
To: bug-Module-Metadata [...] rt.cpan.org
From: David Leadbeater <dgl [...] dgl.cx>
On 31 July 2012 19:26, Brendan Byrd via RT <bug-Module-Metadata@rt.cpan.org> wrote:
> In the case of Module::Metadata, I would say to detect the BOM at the
> beginning, and if it exists, remove it. Not even Encode::Guess seems to
> remove BOMs if they appear in UTF-8 code.
I think to be correct it would have to decode UTF-16; note how this does actually work:

    echo "print 'hello world'" | iconv -t utf16 | perl -

However, just stripping the BOM would solve the reported issue.

(Aside: I don't handle UTF-16 in cpangrep; maybe I should, so it's possible to determine if anyone actually is insane enough to use UTF-16 to encode Perl source.)
On Tue Jul 31 14:26:48 2012, BBYRD wrote:
> On Sun Jul 29 18:51:44 2012, VPIT wrote:
> > Thanks for your report.
> >
> > From perlunicode, perl is supposed to recognize UTF8, UTF16-LE and
> > UTF16-BE BOMs at the beginning of a Perl source file, so I think
> > Module::Metadata should decode the source file appropriately when it
> > sees the BOM.
>
> Nope. I talked with some of the guys on IRC about it, including doy,
> and there's an important distinction: Perl will decode a source file
> that it's actually reading/parsing, but reading a file that happens to
> be Perl source is a different matter. In the latter case, Perl will
> merely follow what binmode is doing.
Except that Module::Metadata is also supposed to be able to extract POD, and handing back octet POD strings to the user is not really useful. For that reason, I think that Module::Metadata should also honour "use utf8" and "=encoding", but that's another matter.

> In the case of Module::Metadata, I would say to detect the BOM at the
> beginning, and if it exists, remove it. Not even Encode::Guess seems to
> remove BOMs if they appear in UTF-8 code.
Starting from version 1.000011, Module::Metadata->new_from_file and ->new_from_module look for a UTF-8/UTF-16LE/UTF-16BE BOM at the beginning of the file, skip it, then decode the rest of the input appropriately. Module::Metadata->new_from_handle is untouched. The decoding part is easily removable if deemed harmful.
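For illustration, the detect / skip / decode sequence described above could be sketched roughly as follows. This is not the code that shipped in 1.000011: the `@BOMS` table and the `decode_with_bom` helper are hypothetical names, and the sketch works on an in-memory byte string rather than a filehandle. Only `Encode::decode` and `Encode::encode` are real library calls.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# BOM byte sequences and the encoding each one announces.
my @BOMS = (
    [ "\xEF\xBB\xBF", 'UTF-8'    ],
    [ "\xFF\xFE",     'UTF-16LE' ],
    [ "\xFE\xFF",     'UTF-16BE' ],
);

# Hypothetical helper: if the octets start with a known BOM, skip it and
# decode the remainder; otherwise return the octets untouched.
sub decode_with_bom {
    my ($octets) = @_;
    for my $entry (@BOMS) {
        my ($bom, $encoding) = @$entry;
        if (index($octets, $bom) == 0) {
            my $rest = substr($octets, length $bom);
            return ($encoding, decode($encoding, $rest));
        }
    }
    return (undef, $octets);    # no BOM found
}

# Round-trip the same source line through each encoding.
for my $entry (@BOMS) {
    my ($bom, $encoding) = @$entry;
    my $source = "package My::Module;\n";
    my $octets = $bom . encode($encoding, $source);
    my ($detected, $text) = decode_with_bom($octets);
    print "$detected: $text";
}
```

Returning the input unchanged when no BOM is present keeps the no-BOM path byte-for-byte identical, which is one way to keep the decoding step "easily removable".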
Subject: Re: [rt.cpan.org #78434] Rare BOM will mess up package detection
Date: Tue, 21 Aug 2012 14:43:55 -0400
To: bug-Module-Metadata [...] rt.cpan.org
From: David Golden <dagolden [...] cpan.org>
On Tue, Aug 21, 2012 at 2:02 PM, Vincent Pit via RT <bug-Module-Metadata@rt.cpan.org> wrote:
> Except that Module::Metadata is also supposed to be able to extract POD,
> and handing back octet POD strings to the user is not really useful. For
> that reason, I think that Module::Metadata should also honour "use utf8"
> and "=encoding", but that's another matter.
+1 for =encoding. I'm not sure about "use utf8". What does 'perldoc' do?

-- David
Issue open on github for figuring out what to do here: https://github.com/Perl-Toolchain-Gang/Module-Metadata/issues/2