"Karen Etheridge via RT" <bug-podlators@rt.cpan.org> writes:
> On 2015-03-08 18:42:56, ETHER wrote:
>> I don't think there is any =encoding declaration at all in this file
>> (the 'ack' executable, in the latest App::Ack release). Why would one
>> be needed when there are no non-ascii characters in the document?
>> There are wide unicode characters in escaped form, but that's not UTF-
>> 8 - that's unicode.
>> The document may be being *output* in UTF-8 format, but that has
>> nothing to do with the format of the input file itself. Once it's
>> unicode characters in a memory buffer, the input format is irrelevant.
> Following up on some old tickets --
> this problem still exists in perl 5.25.3 and the latest podlators (Pod::Text is 4.07) -- the error message is
> Wide character in print at /Volumes/amaretto/Users/ether/perl5/perlbrew/perls/25.3/lib/5.25.3/Pod/Text.pm line 287.
> There are no non-ascii characters in bin/ack (version 2.15_02): the
> relevant pod seems to be:
> =head1 ACKNOWLEDGEMENTS
> ...
> SE<eacute>bastien FeugE<egrave>re,
> RaE<uacute>l GundE<iacute>n,
> RaE<aacute>l GundE<aacute>n,
> GE<aacute>bor SzabE<oacute>,
> E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason,
> Ask BjE<oslash>rn Hansen,
> Slaven ReziE<0x107>,
Ah, okay. I think I understand the problem, although I'm not sure what
the best solution is. Thank you; that last line was exactly the context I
needed.
Pod::Text is defined to use (and always has used) the same encoding for output
as for its input. This isn't the greatest of choices, but it's a tricky
compromise and I couldn't come up with a better approach. Obviously,
specifying an encoding overrides that, but unless you do so, that's what
it tries to do.
The default encoding per perlpodspec is CP-1252. So by default, if you
don't provide an =encoding or any other special options (pod2text -u, for
instance), it tries to output CP-1252. But E<0x107> specifies a character
that is not representable in CP-1252. (Obviously, this is a crappy error
message for that problem, although I'm not entirely sure how to go about
providing a better one.)
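
To make the representability point concrete, here is a minimal sketch using the core Encode module. The helper name representable() is hypothetical and exists only for this illustration; it is not part of Pod::Text, and it only shows which characters CP-1252 can hold, not how Pod::Text itself produces the warning.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: true if $char can be encoded to $encoding
# without loss. FB_CROAK makes encode() die on unmappable characters.
sub representable {
    my ($char, $encoding) = @_;
    my $copy = $char;    # encode() may modify its input when CHECK is set
    return eval { encode($encoding, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
}

# Two of the characters from the quoted POD, as Unicode code points.
my %chars = (
    'E<eacute>' => "\x{E9}",
    'E<0x107>'  => "\x{107}",
);

for my $code (sort keys %chars) {
    for my $encoding ('cp1252', 'UTF-8') {
        printf "%-10s %-7s %s\n", $code, $encoding,
            representable($chars{$code}, $encoding)
            ? 'ok'
            : 'not representable';
    }
}
```

E<eacute> maps in both encodings, while E<0x107> maps only in UTF-8, which is why only that last line of the ACKNOWLEDGEMENTS section triggers the problem.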
If Pod::Text blindly generated UTF-8 when no other information was
available (which is what the -u option to pod2text forces), that would
make this problem go away on most systems, but there's a reason for the
conservative choice of default character sets: it's been the default for
a very long time, and not all systems default to UTF-8. Unfortunately,
there's no good way to determine, on a Linux system, whether the output
device is UTF-8 (plus Pod::Text output may be saved for some other
purpose, moved to another system, etc.).
The simplest, although somewhat unsatisfying, approach would be for this
POD document to add =encoding UTF-8 (and optionally get rid of all the E<>
formatting codes in favor of just writing the document in Unicode,
although that wouldn't be required). Then Pod::Text would default to
outputting Unicode, and the expected thing would happen, except on systems
where Unicode output can't be handled (but those systems already can't
display this document correctly because of the E<0x107> character).
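
As a sketch of what that would look like, assuming the declaration goes ahead of the first POD heading in bin/ack (the exact placement in that file is an assumption, and the "..." stands for the unchanged text in between):

```pod
=encoding UTF-8

...

=head1 ACKNOWLEDGEMENTS

...

Slaven ReziE<0x107>,
```

The E<> codes can stay exactly as they are; the only change that matters here is the =encoding line.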
Another option is to just always use the utf8 option to Pod::Text or use
pod2text -u, but that can be tricky to arrange in the build system.
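
For reference, a minimal sketch of the utf8 option when calling Pod::Text directly. The POD string is a cut-down stand-in for the ack documentation, and collecting the result with output_string (inherited from Pod::Simple) is just a convenience for the example; whether the collected string holds encoded bytes or characters may vary between podlators versions, so the sketch does not rely on that.

```perl
use strict;
use warnings;
use Pod::Text;

# Cut-down stand-in for the relevant part of bin/ack (illustration only).
my $pod = "=head1 ACKNOWLEDGEMENTS\n\nSlaven ReziE<0x107>,\n";

# utf8 => 1 asks Pod::Text for UTF-8 output regardless of the input
# encoding; this is what pod2text -u sets.
my $parser = Pod::Text->new(utf8 => 1);

# Collect the formatted text in a scalar instead of writing to STDOUT.
my $output = '';
$parser->output_string(\$output);
$parser->parse_string_document($pod);

# $output now holds the formatted ACKNOWLEDGEMENTS section.
```

In a build system the equivalent is to run pod2text -u on the file instead of plain pod2text.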
I could also coerce Pod::Text output to UTF-8 by default, instead of the
current documented behavior:
    By default, Pod::Text uses the same output encoding as the input
    encoding of the POD source (provided that Perl was built with PerlIO;
    otherwise, it doesn't encode its output).
But I'm a little nervous about changing such a long-standing default.
--
#!/usr/bin/perl -- Russ Allbery, Just Another Perl Hacker
$^=q;@!>~|{>krw>yn{u<$$<[~||<Juukn{=,<S~|}<Jwx}qn{<Yn{u<Qjltn{ > 0gFzD gD,
00Fz, 0,,( 0hF 0g)F/=, 0> "L$/GEIFewe{,$/ 0C$~> "@=,m,|,(e 0.), 01,pnn,y{
rw} >;,$0=q,$,,($_=$^)=~y,$/ C-~><@=\n\r,-~$:-u/ #y,d,s,(\$.),$1,gee,print