This queue is for tickets about the podlators CPAN distribution.

Report information
The Basics
Id: 102631
Status: open
Priority: 0/
Queue: podlators

People
Owner: Nobody in particular
Requestors: ether [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: wide char warnings
in perl 5.21.6, with App::Ack 2.14 installed and podlators 3.18:

: [ether@jaeger ~]$; perldoc ack
Wide character in print at /Volumes/amaretto/Users/ether/perl5/perlbrew/perls/21.6/lib/5.21.6/Pod/Text.pm line 286.

~/.perlbrew/libs/21.6@std/bin/ack does not actually contain any literal wide characters - I scanned the file and it's all ascii (not even any latin1). However, it does contain lines like this:

E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason,

...which I suspect are being converted to their unicode equivalents, and then being printed, without the encoding being set on the filehandle.
Subject: Re: [rt.cpan.org #102631] wide char warnings
Date: Sun, 08 Mar 2015 17:56:43 -0700
To: "Karen Etheridge via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"Karen Etheridge via RT" <bug-podlators@rt.cpan.org> writes:

> in perl 5.21.6, with App::Ack 2.14 installed and podlators 3.18:

> : [ether@jaeger ~]$; perldoc ack
> Wide character in print at /Volumes/amaretto/Users/ether/perl5/perlbrew/perls/21.6/lib/5.21.6/Pod/Text.pm line 286.

> ~/.perlbrew/libs/21.6@std/bin/ack does not actually contain any literal
> wide characters - I scanned the file and it's all ascii (not even any
> latin1). However, it does contain lines like this:

> E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason,

> ...which I suspect are being converted to their unicode equivalents, and
> then being printed, without the encoding being set on the filehandle.

Are you using perldoc with Pod::Text, and does this POD document have an =encoding command somewhere other than at the top of the document? If both of those are the case, this is a bug that's already fixed in Git. I just need to get a new release out.

-- 
#!/usr/bin/perl -- Russ Allbery, Just Another Perl Hacker
$^=q;@!>~|{>krw>yn{u<$$<[~||<Juukn{=,<S~|}<Jwx}qn{<Yn{u<Qjltn{ > 0gFzD gD, 00Fz, 0,,( 0hF 0g)F/=, 0> "L$/GEIFewe{,$/ 0C$~> "@=,m,|,(e 0.), 01,pnn,y{ rw} >;,$0=q,$,,($_=$^)=~y,$/ C-~><@=\n\r,-~$:-u/ #y,d,s,(\$.),$1,gee,print
RT-Send-CC: RRA [...] cpan.org
On 2015-03-08 17:56:54, RRA wrote:
> Are you using perldoc with Pod::Text

From the error message, I'd assume so. I have no idea how/where this is configured.

> , and does this POD document have an
> =encoding command somewhere other than at the top of the document? If
> both of those are the case, this is a bug that's already fixed in Git.
> I just need to get a new release out.

I don't think there is any =encoding declaration at all in this file (the 'ack' executable, in the latest App::Ack release). Why would one be needed when there are no non-ascii characters in the document? There are wide unicode characters in escaped form, but that's not UTF-8 - that's unicode.

The document may be being *output* in UTF-8 format, but that has nothing to do with the format of the input file itself. Once it's unicode characters in a memory buffer, the input format is irrelevant.

On 2015-03-08 18:42:56, ETHER wrote:
> On 2015-03-08 17:56:54, RRA wrote:
> > Are you using perldoc with Pod::Text
>
> From the error message, I'd assume so. I have no idea how/where this
> is configured.
>
> > , and does this POD document have an
> > =encoding command somewhere other than at the top of the document?
> > If both of those are the case, this is a bug that's already fixed in
> > Git. I just need to get a new release out.
>
> I don't think there is any =encoding declaration at all in this file
> (the 'ack' executable, in the latest App::Ack release). Why would one
> be needed when there are no non-ascii characters in the document?
> There are wide unicode characters in escaped form, but that's not
> UTF-8 - that's unicode.
>
> The document may be being *output* in UTF-8 format, but that has
> nothing to do with the format of the input file itself. Once it's
> unicode characters in a memory buffer, the input format is irrelevant.

Following up on some old tickets -- this problem still exists in perl 5.25.3 and the latest podlators (Pod::Text is 4.07) -- the error message is

Wide character in print at /Volumes/amaretto/Users/ether/perl5/perlbrew/perls/25.3/lib/5.25.3/Pod/Text.pm line 287.

There are no non-ascii characters in bin/ack (version 2.15_02): the relevant pod seems to be:

=head1 ACKNOWLEDGEMENTS

...
SE<eacute>bastien FeugE<egrave>re,
RaE<uacute>l GundE<iacute>n,
RaE<aacute>l GundE<aacute>n,
GE<aacute>bor SzabE<oacute>,
E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason,
Ask BjE<oslash>rn Hansen,
Slaven ReziE<0x107>,
...
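
The warning itself can be reproduced without Pod::Text. What follows is only a minimal sketch in core Perl, assuming that E<0x107> ends up in memory as the character U+0107 and is then printed to a handle that has no encoding layer. The in-memory filehandles and the variable names are illustrative assumptions, not code taken from Pod::Text or perldoc.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# U+0107 is the character that E<0x107> names; it does not fit in one byte.
my $text = "Slaven Rezi\x{107}\n";

# Case 1: a handle with no encoding layer (assumed to mimic the failing case).
my $raw = '';
open(my $plain, '>', \$raw) or die "open failed: $!";
print {$plain} $text;
close($plain);

# Case 2: the same string printed through an explicit UTF-8 layer.
my $encoded = '';
open(my $utf8, '>:encoding(UTF-8)', \$encoded) or die "open failed: $!";
print {$utf8} $text;
close($utf8);

printf "warnings collected: %d\n", scalar @warnings;
print "first warning: $warnings[0]" if @warnings;
```

Only the first print should warn. Characters such as E<AElig> or E<ouml> fit in a single byte, so they would not be expected to trigger the warning on their own.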
Subject: Re: [rt.cpan.org #102631] wide char warnings
Date: Mon, 05 Sep 2016 20:37:23 -0700
To: "Karen Etheridge via RT" <bug-podlators [...] rt.cpan.org>
From: Russ Allbery <rra [...] cpan.org>
"Karen Etheridge via RT" <bug-podlators@rt.cpan.org> writes:
> On 2015-03-08 18:42:56, ETHER wrote:

>> I don't think there is any =encoding declaration at all in this file
>> (the 'ack' executable, in the latest App::Ack release). Why would one
>> be needed when there are no non-ascii characters in the document?
>> There are wide unicode characters in escaped form, but that's not
>> UTF-8 - that's unicode.

>> The document may be being *output* in UTF-8 format, but that has
>> nothing to do with the format of the input file itself. Once it's
>> unicode characters in a memory buffer, the input format is irrelevant.

> Following up on some old tickets --
> this problem still exists in perl 5.25.3 and the latest podlators (Pod::Text is 4.07) -- the error message is

> Wide character in print at /Volumes/amaretto/Users/ether/perl5/perlbrew/perls/25.3/lib/5.25.3/Pod/Text.pm line 287.

> There are no non-ascii characters in bin/ack (version 2.15_02): the
> relevant pod seems to be:

> =head1 ACKNOWLEDGEMENTS

> ...
> SE<eacute>bastien FeugE<egrave>re,
> RaE<uacute>l GundE<iacute>n,
> RaE<aacute>l GundE<aacute>n,
> GE<aacute>bor SzabE<oacute>,
> E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason,
> Ask BjE<oslash>rn Hansen,
> Slaven ReziE<0x107>,

Ah, okay. I think I understand the problem, although I'm not sure what the best solution is. Thank you; that last line was exactly the context I needed.

Pod::Text is defined to (and has always) use the same encoding for output as for its input. This isn't the greatest of choices, but it's a tricky compromise and I couldn't come up with a better approach. Obviously, specifying an encoding overrides that, but unless you do so, that's what it tries to do.

The default encoding per perlpodspec is CP-1252. So by default, if you don't provide an =encoding or any other special options (pod2text -u, for instance), it tries to output CP-1252. But E<0x107> specifies a character that is not representable in CP-1252. (Obviously, this is a crappy error message for that problem, although I'm not entirely sure how to go about providing a better one.)

If Pod::Text blindly generated UTF-8 when no other information was available (which is what the -u option to pod2text forces), that would make this problem go away on most systems, but there's a reason for the conservative choice of default character sets: it's been the default for a very long time, and not all systems default to UTF-8. Unfortunately, there's no good way to determine, on a Linux system, whether the output device is UTF-8 (plus Pod::Text output may be saved for some other purpose, moved to another system, etc.).

The simplest, although somewhat unsatisfying, approach would be for this POD document to add =encoding UTF-8 (and optionally get rid of all the E<> formatting codes in favor of just writing the document in Unicode, although that wouldn't be required). Then Pod::Text would default to outputting Unicode, and the expected thing would happen, except on systems where Unicode output can't be handled (but those systems already can't display this document correctly because of the E<0x107> character).

Another option is to just always use the utf8 option to Pod::Text or use pod2text -u, but that can be tricky to arrange in the build system.

I could also coerce Pod::Text output to UTF-8 by default, instead of the current documented behavior:

    By default, Pod::Text uses the same output encoding as the input encoding of the POD source (provided that Perl was built with PerlIO; otherwise, it doesn't encode its output).

But I'm a little nervous about changing such a long-standing default.
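
The utf8 option mentioned above can be exercised directly from Perl. This is a hedged sketch rather than a recommended fix: it assumes an installed Pod::Text that accepts the documented utf8 option, and the one-paragraph POD string is a cut-down stand-in for the ack documentation, not the real file.

```perl
use strict;
use warnings;
use Pod::Text;

# A tiny stand-in document (an assumption for illustration). It has no
# =encoding line but does contain the E<0x107> escape from the report.
my $pod = join "\n",
    '=head1 ACKNOWLEDGEMENTS',
    '',
    'Slaven ReziE<0x107>',
    '',
    '=cut',
    '';

# Ask for UTF-8 output explicitly instead of relying on the default,
# which follows the input encoding.
my $parser = Pod::Text->new(utf8 => 1);

# Format into a scalar rather than a filehandle.
my $output = '';
$parser->output_string(\$output);
$parser->parse_string_document($pod);

printf "formatted text has length %d\n", length $output;
```

The command-line counterpart named in the message is pod2text -u. Adding =encoding UTF-8 to the POD source is the other route, and it needs no change on the caller's side.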
On 2016-09-05 20:37:34, RRA wrote:
> Pod::Text is defined to (and has always) use the same encoding for
> output as for its input. This isn't the greatest of choices, but it's a
> tricky compromise and I couldn't come up with a better approach.
> Obviously, specifying an encoding overrides that, but unless you do so,
> that's what it tries to do.

Aha, thanks! I'll patch the file in question for now, as it looks like there are no quick and absolutely correct solutions on your end.

On 2016-09-06 12:01:00, ETHER wrote:
> On 2016-09-05 20:37:34, RRA wrote:
> > Pod::Text is defined to (and has always) use the same encoding for
> > output as for its input. This isn't the greatest of choices, but it's a
> > tricky compromise and I couldn't come up with a better approach.
> > Obviously, specifying an encoding overrides that, but unless you do so,
> > that's what it tries to do.
>
> Aha, thanks! I'll patch the file in question for now, as it looks
> like there are no quick and absolutely correct solutions on your end.

Patch submitted as https://github.com/petdance/ack2/pull/609.