This queue is for tickets about the podlators CPAN distribution.

Report information
The Basics
Id: 50333
Status: rejected
Priority: 0/
Queue: podlators

People
Owner: Nobody in particular
Requestors: jari.aalto [...] cante.net
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: pod2man does not handle POD E<> unicode sequences
OS : Debian testing (perl: 5.10.0-25)
Version: This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi

The pod2man does not translate the E<> sequences that are Unicode. There
are also no warnings that the translation is not taking in effect.

The POD:

    # See attached file for complete POD
    Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

Example run:

    pod2man unicode.pod > /dev/null
    <no warnings>

The [gn]roff output does not look correct:

    Unicode quotes XsingleX and XdoubleX.

SUGGESTION

At least substitute the unicode sequences with typical ASCII single(')
and double (") equivalents until there is support for groff(1) unicode.
Subject: unicode.pod
=pod

=encoding utf8

=head1 TEST

Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

=cut
The Unicode glyphs are explained in the groff_char(7) manual, which covers the quotes mentioned. So groff(1) does handle those.

For compatibility it may be advisable to produce '\(' escape sequences and not groff-only '\[]'. See footnote 2 at page http://www.gnu.org/software/groff/manual/html_node/Using-Symbols.html#fn-2

Jari Aalto

Quotes

    „   \[Bq]   quotedblbase     u201E   low double comma quote
    ‚   \[bq]   quotesinglbase   u201A   low single comma quote
    “   \[lq]   quotedblleft     u201C
    ”   \[rq]   quotedblright    u201D
    ‘   \[oq]   quoteleft        u2018   single open quote
    ’   \[cq]   quoteright       u2019   single closing quote
    '   \[aq]   quotesingle      u0027   apostrophe quote (ASCII 39)
    "   \[dq]   quotedbl         u0022   double quote (ASCII 34)
    «   \[Fo]   guillemotleft    u00AB
    »   \[Fc]   guillemotright   u00BB
    ‹   \[fo]   guilsinglleft    u2039
    ›   \[fc]   guilsinglright   u203A
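The glyph table above could be turned into a lookup along these lines. This is only a sketch in Python, not anything pod2man does: the names GLYPH_NAMES, glyph_escape and translate_named are hypothetical, the mapping is copied from the groff_char(7) excerpt in this comment, and whether a non-groff *roff recognizes each two-letter name is not guaranteed.

```python
import re

# Code point -> two-letter groff glyph name, copied from the table above.
GLYPH_NAMES = {
    0x201E: "Bq",
    0x201A: "bq",
    0x201C: "lq",
    0x201D: "rq",
    0x2018: "oq",
    0x2019: "cq",
    0x0027: "aq",
    0x0022: "dq",
    0x00AB: "Fo",
    0x00BB: "Fc",
    0x2039: "fo",
    0x203A: "fc",
}

# Matches POD escapes written as hexadecimal code points, e.g. E<0x2018>.
POD_HEX_ESCAPE = re.compile(r"E<0x([0-9A-Fa-f]+)>")


def glyph_escape(codepoint, portable=True):
    """Return a roff escape for a known code point, or None if unmapped."""
    name = GLYPH_NAMES.get(codepoint)
    if name is None:
        return None
    # '\(xx' is the older two-character form; '\[xx]' is the groff-only form.
    return "\\(" + name if portable else "\\[" + name + "]"


def translate_named(text, portable=True):
    """Replace E<0x...> escapes that have a glyph name; leave others as-is."""
    def repl(match):
        escape = glyph_escape(int(match.group(1), 16), portable)
        return escape if escape is not None else match.group(0)
    return POD_HEX_ESCAPE.sub(repl, text)


print(translate_named("E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>"))
```

Code points without an entry in the table are left untouched, so the sketch never silently drops an escape it does not know.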
It appears that the groff manual mentions a sequence that can be used in translation of E<>:

    E<unicode value> => \N'<decimal value>'

An example:

    E<0x2018> => \N'8216'

----------------------------------
http://www.gnu.org/software/groff/manual/html_node/Using-Symbols.html#fn-2

Escape: \N'n'

    Typeset the glyph with code n in the current font (n is not the input character code). The number n can be any non-negative decimal integer. Most devices only have glyphs with codes between 0 and 255; the Unicode output device uses codes in the range 0–65535. If the current font does not contain a glyph with that code, special fonts are not searched. The \N escape sequence can be conveniently used in conjunction with the char request:
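Returning to the E<unicode value> => \N'<decimal value>' mapping proposed in this comment, the conversion itself is just hexadecimal to decimal. A minimal Python sketch follows; the function name to_numbered_glyph is hypothetical and this is not how pod2man works. Per the manual excerpt, n is a glyph code in the current font rather than an input character code, so the result would only be meaningful on the Unicode output device.

```python
import re

# Matches POD escapes written as hexadecimal code points, e.g. E<0x2018>.
POD_HEX_ESCAPE = re.compile(r"E<0x([0-9A-Fa-f]+)>")


def to_numbered_glyph(text):
    """Rewrite E<0xHHHH> as \\N'<decimal>'; other E<> forms are left alone."""
    def repl(match):
        # int(..., 16) parses the hex digits; 0x2018 is 8216 in decimal.
        return "\\N'%d'" % int(match.group(1), 16)
    return POD_HEX_ESCAPE.sub(repl, text)


print(to_numbered_glyph("E<0x2018>singleE<0x2019>"))
```

Named POD escapes such as E<gt> do not match the pattern and pass through unchanged.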
Subject: Re: [rt.cpan.org #50333] pod2man does not handle POD E<> unicode sequences
Date: Thu, 08 Oct 2009 14:29:26 -0700
To: bug-podlators [...] rt.cpan.org
From: Russ Allbery <rra [...] stanford.edu>
"JARIAALTO via RT" <bug-podlators@rt.cpan.org> writes:

> The pod2man does not translate the E<> sequences that are Unicode. There
> are also no warnings that the translation is not taking in effect.

> The POD:

>     # See attached file for complete POD
>     Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

> Example run:

>     pod2man unicode.pod > /dev/null
>     <no warnings>

> The [gn]roff output does not look correct:

>     Unicode quotes XsingleX and XdoubleX.

If you want pod2man to generate UTF-8 output, you need to pass the -u option to pod2man. See the man page for pod2man and its documentation of -u for all of the details of why this is the case.

Since current versions of groff handle UTF-8 characters properly, there doesn't seem to be any point in adding an additional mode that would use groff-specific escapes.

For compatibility with non-groff *roff implementations, -u is not the default. UTF-8 characters can even cause segfaults in some old vendor *roff implementations. The output of pod2man is intended to be portable and usable on systems other than the one on which it was generated.

> SUGGESTION

> At least substitute the unicode sequences with typical ASCII single(')
> and double (") equivalents until there is support for groff(1) unicode.

I don't immediately see the justification for treating those characters specially.

--
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>

As discussed, I don't think it makes that much sense to special-case handling of specific Unicode characters, so I'm going to mark this bug rejected.  To produce Unicode output from pod2man, use pod2man -u.