This queue is for tickets about the Pod-Simple CPAN distribution.

Report information
The Basics
Id: 79180
Status: rejected
Priority: 0/
Queue: Pod-Simple

People
Owner: Nobody in particular
Requestors: ANDK [...] cpan.org
Cc: DAMI [...] cpan.org
AdminCc:

Bug Information
Severity: Normal
Broken in: 3.23
Fixed in: (no value)



CC: DAMI [...] cpan.org
Subject: New warning about missing =encoding probably not justified
https://rt.cpan.org/Ticket/Display.html?id=79079

Here Laurent Dami cites from perlpod manpage and to me this reads like a clear unambiguous spec. I wish I would have read this earlier.

In case you did not notice, I wrote tickets against the following distros, all due to the new warning:

Mail-Transport-Dbx
Games-Pentago
Lingua-HU-Numbers
Math-Symbolic
Test-Regression
Finnigan
Finance-Currency-Convert-XE
DateTime-Event-Easter
Locale-Maketext-Lexicon-DBI
Net-Radius
Catalyst-View-PDF-Reuse
Lingua-StarDict-Gen
Math-Polynomial-Solve
SVG-Sparkline
LaTeX-Decode
Shell-Perl
WWW-3Taps-API
Music-Tag
Text-Capitalize
IO-Util
Config-Model-OpenSsh
XML-Compare
Geo-Postcodes
Lingua-LO-Romanize
CGI-Auth-Basic
Data-UUID-Base64URLSafe
Lingua-PT-Words2Nums
Regexp-Grammars
PGXN-Site
Device-TLSPrinter
DBIx-Connect-FromConfig
Perl6-Perldoc
Regexp-Log-Common
Lingua-PT-ProperNames
Pod-POM-Web

There will probably be many more users affected by the warning. Not really justified, I believe.
CC: Grant McLean <grant [...] catalyst.net.nz>, Ricardo Signes <rjbs [...] cpan.org>
Subject: Re: [rt.cpan.org #79180] New warning about missing =encoding probably not justified
Date: Thu, 23 Aug 2012 21:44:18 -0700
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
On Aug 23, 2012, at 9:18 PM, Andreas Koenig via RT wrote:

> Here Laurent Dami cites from perlpod manpage and to me this reads like a
> clear unambiguous spec. I wish I would have read this earlier.

FYI, the relevant bit for Grant and Rik, copied from https://rt.cpan.org/Ticket/Display.html?id=79079:

> (perlpod : "but if your encoding isn't US-ASCII or Latin-1, then put a
> =encoding encodingname command early in the document so that pod
> formatters will know how to decode the document.")

Overall though, I find the warning useful. I am not sure if there is a way to reliably detect that the encoding is Latin-1. Maybe complain only if it is not Latin-1? I think that might be a bit too permissive.

Or we can just suck it up, let people complain for a few more months, and then just live with it from here on in, after everybody has already fixed their Pod.

Thoughts?

David
CC: bug-Pod-Simple [...] rt.cpan.org, Ricardo Signes <rjbs [...] cpan.org>
Subject: Re: [rt.cpan.org #79180] New warning about missing =encoding probably not justified
Date: Fri, 24 Aug 2012 18:22:58 +1200
To: "David E. Wheeler" <dwheeler [...] cpan.org>
From: Grant McLean <grant [...] catalyst.net.nz>
On Thu, 2012-08-23 at 21:44 -0700, David E. Wheeler wrote:

> On Aug 23, 2012, at 9:18 PM, Andreas Koenig via RT wrote:
>
> > Here Laurent Dami cites from perlpod manpage and to me this reads like a
> > clear unambiguous spec. I wish I would have read this earlier.
>
> FYI, the relevant bit for Grant and Rik, copied from
> https://rt.cpan.org/Ticket/Display.html?id=79079:
>
> > (perlpod : "but if your encoding isn't US-ASCII or Latin-1, then put
> > a =encoding encodingname command early in the document so that pod
> > formatters will know how to decode the document.")

The perlpodspec also has this to say:

"=encoding encodingname"

This command, which should occur early in the document (at least before any non-US-ASCII data!).

Which quietly omits any reference to Latin-1. But using non-ASCII data in any case without declaring an encoding seems unwise.

> Overall though, I find the warning useful. I am not sure if there is a way
> to reliably detect that the encoding is Latin-1. Maybe complain only if it is
> not Latin-1? I think that might be a bit too permissive.

No, it's not really possible. The heuristic which is applied on encountering non-ASCII bytes will choose UTF-8 only if it sees a valid UTF-8 byte sequence (which is extremely unlikely to occur in actual Latin-1 text) and defaults to Latin-1 otherwise.

The old behaviour of Pod::Simple was to return Perl character strings for any POD source which declared an encoding (even if that encoding was Latin-1) and raw bytes if no encoding was declared. This inconsistent behaviour is what caused POD rendering issues on metacpan.org, which in turn triggered my involvement.

The new behaviour is to consistently return Perl character strings unless encoding processing is disabled with the new parse_characters option.

> Or we can just suck it up, let people complain for a few more months, and
> then just live with it from here on in, after everybody has already fixed
> their Pod.

Yeah I vote for this :-)

It sounds like you (David) have copped a bit of flak over this and I certainly never meant for that to happen. If I'd been able to foresee all the issues that eventually arose I would certainly have wanted some more discussion on the POD mailing list to get buy-in. I'd hope however that the new consistent behaviour would still have been the result.

So sorry for the pain and where possible please blame me.

Regards
Grant
On Fri 24 Aug 2012 02:23:16, grant@catalyst.net.nz wrote:

> The perlpodspec also has this to say:
>
> "=encoding encodingname"
>
> This command, which should occur early in the document (at least
> before any non-US-ASCII data!).
>
> Which quietly omits any reference to Latin-1. But using non-ASCII data
> in any case without declaring an encoding seems unwise.

OK, but speaking of perlpodspec, there is also this bit, where Latin-1 is mentioned as a kind of implicit default:

"Since Perl recognizes a Unicode Byte Order Mark at the start of files as signaling that the file is Unicode encoded as in UTF-16 (whether big-endian or little-endian) or UTF-8, Pod parsers should do the same. Otherwise, the character encoding should be understood as being UTF-8 if the first highbit byte sequence in the file seems valid as a UTF-8 sequence, or otherwise as Latin-1."
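The guessing rule in the perlpodspec passage quoted above can be sketched in a few lines. This is a minimal illustration in Python, not Pod::Simple's actual code: the names `guess_pod_encoding` and `_starts_valid_utf8` are hypothetical, and "seems valid" is read here as "the first high-bit byte begins one well-formed UTF-8 sequence", which is an assumption about the intended meaning.

```python
import codecs


def _starts_valid_utf8(data: bytes, i: int) -> bool:
    """Return True if data[i] begins one well-formed UTF-8 sequence."""
    lead = data[i]
    # The lead byte determines how long the sequence has to be.
    if 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        return False  # stray continuation byte or invalid lead byte
    try:
        # Strict decoding rejects truncated, overlong and surrogate sequences.
        data[i:i + length].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_pod_encoding(data: bytes) -> str:
    """Guess an encoding for undeclared input (illustrative sketch only)."""
    # Step 1: a byte order mark settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # Step 2: inspect only the first high-bit byte sequence.
    for i, byte in enumerate(data):
        if byte >= 0x80:
            return "utf-8" if _starts_valid_utf8(data, i) else "latin-1"
    # No high-bit bytes at all: plain US-ASCII, nothing to guess.
    return "us-ascii"
```

Under this reading, `"café"` stored as Latin-1 ends in the single byte 0xE9, which cannot start a complete UTF-8 sequence there, so the guess falls back to Latin-1; the same word stored as UTF-8 ends in 0xC3 0xA9 and is recognised as UTF-8.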
RT-Send-CC: rjbs [...] cpan.org, grant [...] catalyst.net.nz, dwheeler [...] cpan.org
Sorry, I messed up with the "Cc" fields in RT, so here is my comment again.

On Fri 24 Aug 2012 03:28:51, DAMI wrote:

> On Fri 24 Aug 2012 02:23:16, grant@catalyst.net.nz wrote:
>
> > The perlpodspec also has this to say:
> >
> > "=encoding encodingname"
> >
> > This command, which should occur early in the document (at least
> > before any non-US-ASCII data!).
> >
> > Which quietly omits any reference to Latin-1. But using non-ASCII data
> > in any case without declaring an encoding seems unwise.
>
> OK, but speaking of perlpodspec, there is also this bit, where Latin-1
> is mentioned as a kind of implicit default:
>
> "Since Perl recognizes a Unicode Byte Order Mark at the start of files
> as signaling that the file is Unicode encoded as in UTF-16 (whether
> big-endian or little-endian) or UTF-8, Pod parsers should do the same.
> Otherwise, the character encoding should be understood as being UTF-8
> if the first highbit byte sequence in the file seems valid as a UTF-8
> sequence, or otherwise as Latin-1."
CC: rjbs [...] cpan.org, dwheeler [...] cpan.org
Subject: Re: [rt.cpan.org #79180] New warning about missing =encoding probably not justified
Date: Sat, 25 Aug 2012 14:43:56 +1200
To: bug-Pod-Simple [...] rt.cpan.org
From: Grant McLean <grant [...] catalyst.net.nz>
On Fri, 2012-08-24 at 03:53 -0400, Laurent Dami via RT wrote:

> OK, but speaking of perlpodspec, there is also this bit, where
> Latin-1 is mentioned as a kind of implicit default:
>
> "Since Perl recognizes a Unicode Byte Order Mark at the start of
> files as signaling that the file is Unicode encoded as in UTF-16
> (whether big-endian or little-endian) or UTF-8, Pod parsers should
> do the same. Otherwise, the character encoding should be understood
> as being UTF-8 if the first highbit byte sequence in the file seems
> valid as a UTF-8 sequence, or otherwise as Latin-1."

Indeed, that is the heuristic that Pod::Simple was missing which is now implemented. But those are simply the rules that the parser should use to guess the encoding if none is specified in the POD source.

I agree that in the absence of evidence of UTF-* then Latin-1 is a good fallback (although experience has shown that CP1252 would have been a better choice than ISO8859-1). I'm not sure what is to be gained by elevating the last guess in the absence of other evidence to "implicit default".

The warning seems useful for people who want their POD to be interpreted in an unambiguous way, and those who don't care for warnings can turn them off.

Test::Pod does elevate all warnings to fatal test failures and I did offer to write a patch to make Test::Pod downgrade this particular message to a warning only. The overwhelming response on the POD mailing list was that this was not necessary and that people should simply declare encodings.

Regards
Grant
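Grant's aside about CP1252 versus ISO8859-1 can be made concrete with a short sketch. It uses Python's standard codecs rather than anything from Pod::Simple, and the sample bytes are invented for illustration. The two encodings agree on bytes 0xA0 to 0xFF, but Latin-1 maps 0x80 to 0x9F to invisible C1 control characters, while CP1252 maps most of that range to printable punctuation such as curly quotes.

```python
# Sample bytes as a Windows editor might write them (invented for illustration):
# 0x93/0x94 are curly double quotes, 0x96 an en dash, 0x80 the euro sign.
raw = b"\x93quoted\x94 \x96 \x80"

# ISO-8859-1 never fails, but yields C1 control characters (U+0080..U+009F).
as_latin1 = raw.decode("latin-1")

# CP1252 turns the same bytes into the punctuation the author meant.
as_cp1252 = raw.decode("cp1252")

print([hex(ord(ch)) for ch in as_latin1 if ord(ch) >= 0x80])
print([hex(ord(ch)) for ch in as_cp1252 if ord(ch) >= 0x80])

# From 0xA0 upwards the two encodings are identical.
assert b"\xe9".decode("latin-1") == b"\xe9".decode("cp1252")
```

A text guessed as Latin-1 therefore never raises a decoding error, but any Windows-style punctuation in it silently turns into control characters.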
(1) this is not about Test::Pod; Test::Pod is involved, yes, but it does not deserve attention in this ticket because it was not changed, it is working as it always has been

(2) this is not about changing the specs; my impression is the specs are good and they are implemented correctly as far as I can tell; kudos to the implementors!

(3) CPAN already has 35 distros broken and every day I discover new ones; few of them are really broken, they just are broken due to a newly invented warning that has no justification; those that are really broken do no harm anyway; they can be analysed offline and get their bug report from those who care enough

(4) the new warning is of the type lint; people who want to do it 105% right are happy to add an =encoding line even if the specs do not mandate it. In such a context it is useful. It should be only visible in a context chosen specifically to get some *extra* warnings

(5) the new warning violates RFC 1958, Architectural Principles of the Internet: 3.9 Be strict when sending and tolerant when receiving

(6) I do not think that 'just sucking it up' is a polite way of causing people worthless extra work

(7) one answer to "those who don't care for warnings can turn them off". Yes, "they" can. No, "they" can't. Depends on who "they" are. In any case it is extra work and makes the CPAN a less comfortable place to work with

(99) educative link, a noop release: http://search.cpan.org/diff?from=Regexp-Grammars-1.020&to=Regexp-Grammars-1.021
Subject: Re: [rt.cpan.org #79180] New warning about missing =encoding probably not justified
Date: Mon, 27 Aug 2012 09:56:30 -0700
To: bug-Pod-Simple [...] rt.cpan.org
From: "David E. Wheeler" <dwheeler [...] cpan.org>
On Aug 25, 2012, at 1:41 AM, Andreas Koenig via RT wrote:

> (4) the new warning is of the type lint; people who want to do it 105%
> right are happy to add an =encoding line even if the specs do not
> mandate it. In such a context it is useful. It should be only visible
> in a context chosen specifically to get some *extra* warnings

There is no such context in Pod::Simple. I am loath to introduce one. Folks can set `no_whining` if they don't want warnings.

> (5) the new warning violates RFC 1958, Architectural Principles of the
> Internet: 3.9 Be strict when sending and tolerant when receiving

It's a warning, not an error.

> (6) I do not think that 'just sucking it up' is a polite way of
> causing people worthless extra work

It is not worthless, IMO. Encodings are more important every day. I, for one, have been grateful for the reports you sent me for modules that included non-ASCII characters I had not intended to be there.

> (7) one answer to "those who don't care for warnings can turn them
> off". Yes, "they" can. No, "they" can't. Depends on who "they" are. In
> any case it is extra work and makes the CPAN a less comfortable place
> to work with

Yeah, but who is this hurting?

I personally did not release a new version of my module that fixed this one warning. And I got no complaint from the Regexp::Grammars maintainers about this, either.

Look, new versions of Perl introduce new warnings, too. I had to go through all my code and replace instances of `for qw(...) {}` with `for (qw(...)) {}` when 5.14 came out. Those were no-op fixes, but I was glad to have been made aware of the issue so I could fix it.

So, I have seen some complaints about the warning from Test::Pod users, but those have dwindled to nothing. I have seen no complaints about Pod::Simple (other than legitimate bugs) until this weekend (RT#79232), and it has been months. I, personally, am not convinced that this warning was a mistake.

I am going to close this ticket, as RT#79232 now dupes it, and that one is more specific about the issue to be addressed.

Best,
David