
This queue is for tickets about the Dist-Zilla-Plugins-CJM CPAN distribution.

Report information
The Basics
Id: 94145
Status: resolved
Priority: 0/
Queue: Dist-Zilla-Plugins-CJM

People
Owner: Nobody in particular
Requestors: TINITA [...] cpan.org
Cc: ether [...] cpan.org
AdminCc:

Bug Information
Severity: (no value)
Broken in: 4.21
Fixed in: 4.22



I have a module encoded in latin1, and VersionFromModule complains about utf8. I have the latest Dist::Zilla and VersionFromModule.

Here is the module:

    package Foo::Enctest;
    our $VERSION = 0.001;
    1;

    =pod

    =encoding latin-1

    =head1 SYNOPSIS

    äöü

    =cut

The dist.ini:

    name = Foo-Enctest
    [@Basic]
    [VersionFromModule]
    [Encoding]
    filename = lib/Foo/Enctest.pm
    encoding = Latin-1

The error message:

    Could not decode UTF-8 lib/Foo/Enctest.pm; encoded_content set by @Basic/GatherDir (Dist::Zilla::Plugin::GatherDir line 165); error was: utf8 "\xE4" does not map to Unicode
    ...
    Dist::Zilla::Role::ModuleInfo::get_module_info('Dist::Zilla::Plugin::VersionFromModule=HASH(0x4cdb2b0)', 'Dist::Zilla::File::OnDisk=HASH(0x4a2da40)')
The VersionFromModule issue should be fixed with version 4.22 (just released to CPAN). But your example still won't work because of bugs in the Dist::Zilla core. It can't extract an abstract from a Latin-1 encoded module, and dies when it tries.
Subject: Re: [rt.cpan.org #94145] VersionFromModule complains about utf8 in latin1 encoded file
Date: Sat, 5 Apr 2014 11:47:55 -0700
To: "Christopher J. Madsen via RT" <bug-Dist-Zilla-Plugins-CJM [...] rt.cpan.org>
From: Karen Etheridge <ether [...] cpan.org>
On Sat, Apr 05, 2014 at 01:11:07PM -0400, Christopher J. Madsen via RT wrote:
> The VersionFromModule issue should be fixed with version 4.22 (just released to CPAN). But your example still won't work because of bugs in the Dist::Zilla core. It can't extract an abstract from a Latin-1 encoded module, and dies when it tries.
What bug in core is this?
Well, just try running the example in this bug with VersionFromModule 4.22.

The bug is more of a design flaw. EncodingProviders aren't consulted until after the FileGatherer phase is complete. But some FileGatherers do things that cause other files to be inspected. This causes them to default to UTF-8, even if an EncodingProvider would have set a different encoding.

In this case, Readme calls $dist->abstract, which causes dzil to inspect the main_module for an ABSTRACT comment. The abstract_from_file method passes $file->encoded_content to Pod::Eventual's read_string method, which treats it as UTF-8 encoded text. But it's actually Latin-1 encoded, so the parser croaks.

So really, it's a bug that abstract_from_file assumes that the file must be UTF-8, and a design flaw that even if abstract_from_file were smarter, the file's encoding wouldn't have been set yet.
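To illustrate the underlying failure outside of Dist::Zilla, here is a minimal sketch using only the core Encode module. It is not the code path dzil actually runs; it only shows that the Latin-1 bytes for the SYNOPSIS text cannot be decoded as strict UTF-8, while decoding them with the declared encoding works. The variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# The bytes of "äöü" as stored in a Latin-1 encoded file.
my $bytes = "\xE4\xF6\xFC";

# Strict UTF-8 decoding dies: 0xE4 announces a multi-byte sequence,
# but the bytes that follow are not valid continuation bytes.
# decode() may modify its input when a CHECK flag is given, so use a copy.
my $copy    = $bytes;
my $utf8_ok = eval { decode('UTF-8', $copy, FB_CROAK); 1 };

# Decoding with the encoding the POD declares succeeds.
my $text = decode('iso-8859-1', $bytes);

printf "UTF-8 decode %s\n", $utf8_ok ? 'succeeded' : 'failed';
printf "Latin-1 decode gave %d characters\n", length $text;
```

Any consumer that assumes UTF-8 for this file will hit the first case, regardless of what an EncodingProvider would have said later.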
Subject: Re: [rt.cpan.org #94145] VersionFromModule complains about utf8 in latin1 encoded file
Date: Sat, 5 Apr 2014 23:27:54 -0700
To: "Christopher J. Madsen via RT" <bug-Dist-Zilla-Plugins-CJM [...] rt.cpan.org>
From: Karen Etheridge <ether [...] cpan.org>
On Sun, Apr 06, 2014 at 02:18:25AM -0400, Christopher J. Madsen via RT wrote:
> The bug is more of a design flaw. EncodingProviders aren't consulted until after the FileGatherer phase is complete. But some FileGatherers do things that cause other files to be inspected. This causes them to default to UTF-8, even if an EncodingProvider would have set a different encoding.
>
> In this case, Readme calls $dist->abstract, which causes dzil to inspect the main_module for an ABSTRACT comment.
Yes, and this is being called too soon. I addressed this in https://github.com/rjbs/Dist-Zilla/pull/288. ;) So, rjbs, can this be applied?
> So really, it's a bug that abstract_from_file assumes that the file must be UTF-8, and a design flaw that even if abstract_from_file were smarter, the file's encoding wouldn't have been set yet.
No, the bug is that abstract_from_file is being called before encodings are set. Technically, encodings shouldn't be set via a lazy builder at all, but should be required to be set by *something*, explicitly.
That pull request will stop Readme from calling $dist->abstract before the encodings are set, but it doesn't fix the problem that abstract_from_file ignores the file's encoding and just assumes UTF-8.

I'm starting to think that MutableFile should have a builder for encoding that consults the EncodingProviders (instead of them being a phase that runs after FileGatherers). Of course, this would require some way for the File object to know about the main $zilla object. (And require revising the EncodingProvider API.)
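As a rough sketch of what such a builder could look like, here is a plain-Perl mock-up. Everything in it is hypothetical: the Sketch::* packages, the encoding_for method, and the zilla attribute on the file are invented for illustration and are not part of Dist::Zilla or its current EncodingProvider API. It only shows the idea of a file asking the providers for its encoding on first access and falling back to UTF-8.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the main $zilla object; it only holds providers.
package Sketch::Zilla;
sub new { my ($class, %args) = @_; return bless { providers => $args{providers} || [] }, $class }
sub encoding_providers { return @{ $_[0]{providers} } }

# Hypothetical provider, loosely modelled on an [Encoding] section in dist.ini.
package Sketch::Provider;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
# Returns an encoding name for a file it knows about, undef otherwise.
sub encoding_for {
    my ($self, $file) = @_;
    return $file->name eq $self->{filename} ? $self->{encoding} : undef;
}

# Hypothetical file class with a lazily built encoding.
package Sketch::File;
sub new  { my ($class, %args) = @_; return bless {%args}, $class }
sub name { return $_[0]{name} }
sub encoding {
    my ($self) = @_;
    # Built on first access, so it no longer matters which phase asks first.
    $self->{encoding} //= $self->_build_encoding;
    return $self->{encoding};
}
sub _build_encoding {
    my ($self) = @_;
    for my $provider ($self->{zilla}->encoding_providers) {
        my $encoding = $provider->encoding_for($self);
        return $encoding if defined $encoding;
    }
    return 'UTF-8';    # default when no provider claims the file
}

package main;

my $zilla = Sketch::Zilla->new(
    providers => [
        Sketch::Provider->new(filename => 'lib/Foo/Enctest.pm', encoding => 'Latin-1'),
    ],
);
my $module = Sketch::File->new(name => 'lib/Foo/Enctest.pm', zilla => $zilla);
my $readme = Sketch::File->new(name => 'README',             zilla => $zilla);

print $module->name, ': ', $module->encoding, "\n";
print $readme->name, ': ', $readme->encoding, "\n";
```

The sketch makes the cost visible: each file needs a reference back to the object that knows the providers, and providers must answer per-file queries rather than run once as a phase.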