Bug #50333 for podlators: pod2man does not handle POD E<> unicode sequences

Thu Oct 08 09:17:38 2009 JARIAALTO [...] cpan.org - Ticket created

Subject:

pod2man does not handle POD E<> unicode sequences

OS : Debian testing (perl: 5.10.0-25)
Version: This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi

The pod2man does not translate the E<> sequences that are Unicode. There are also no warnings that the translation is not taking in effect.

The POD:

    # See attached file for complete POD
    Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

Example run:

    pod2man unicode.pod > /dev/null
    <no warnings>

The [gn]roff output does not look correct:

    Unicode quotes XsingleX and XdoubleX.

SUGGESTION

At least substitute the unicode sequences with typical ASCII single(') and double (") equivalents until there is support for groff(1) unicode.

Subject:

unicode.pod

=pod

=encoding utf8

=head1 TEST

Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

=cut

Thu Oct 08 10:18:40 2009 JARIAALTO [...] cpan.org - Correspondence added

The Unicode glyphs are explained in the groff_char(7) manual, which covers the quotes mentioned. So groff(1) does handle those.

For compatibility it may be advisable to produce '\(' escape sequences and not the groff-only '\[]' form. See footnote 2 at http://www.gnu.org/software/groff/manual/html_node/Using-Symbols.html#fn-2

Jari Aalto

Quotes

    „   \[Bq]   quotedblbase     u201E   low double comma quote
    ‚   \[bq]   quotesinglbase   u201A   low single comma quote
    “   \[lq]   quotedblleft     u201C
    ”   \[rq]   quotedblright    u201D
    ‘   \[oq]   quoteleft        u2018   single open quote
    ’   \[cq]   quoteright       u2019   single closing quote
    '   \[aq]   quotesingle      u0027   apostrophe quote (ASCII 39)
    "   \[dq]   quotedbl         u0022   double quote (ASCII 34)
    «   \[Fo]   guillemotleft    u00AB
    »   \[Fc]   guillemotright   u00BB
    ‹   \[fo]   guilsinglleft    u2039
    ›   \[fc]   guilsinglright   u203A
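
The substitution suggested here could be sketched roughly as follows. This is only an illustration in Python of a hypothetical post-processing step, not anything pod2man does: it maps the quote characters from the groff_char(7) list to the two-character '\(xx' form. The names GROFF_QUOTES and to_portable_escapes are made up for this example, and whether a given non-groff *roff knows every one of these glyph names is not checked.

```python
# Hypothetical helper, not part of podlators: replace Unicode quote
# characters with the two-character \(xx escapes listed in groff_char(7).
GROFF_QUOTES = {
    "\u201E": r"\(Bq",  # low double comma quote
    "\u201A": r"\(bq",  # low single comma quote
    "\u201C": r"\(lq",  # left double quote
    "\u201D": r"\(rq",  # right double quote
    "\u2018": r"\(oq",  # single open quote
    "\u2019": r"\(cq",  # single closing quote
    "\u00AB": r"\(Fo",  # left guillemet
    "\u00BB": r"\(Fc",  # right guillemet
    "\u2039": r"\(fo",  # left single guillemet
    "\u203A": r"\(fc",  # right single guillemet
}

# str.maketrans accepts a dict of single characters to replacement strings.
_QUOTE_TABLE = str.maketrans(GROFF_QUOTES)


def to_portable_escapes(text):
    """Return text with the quote characters above replaced by escapes."""
    return text.translate(_QUOTE_TABLE)


print(to_portable_escapes("\u2018single\u2019 and \u201Cdouble\u201D"))
# prints: \(oqsingle\(cq and \(lqdouble\(rq
```

Characters outside the table pass through unchanged, so this would only cover the quotes discussed in this ticket.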

Thu Oct 08 10:18:41 2009 The RT System itself - Status changed from 'new' to 'open'

Thu Oct 08 10:30:46 2009 JARIAALTO [...] cpan.org - Correspondence added

It appears that the groff manual mentions a sequence that can be used in translation of E<>:

    E<unicode value> => \N'<decimal value>'

An example:

    E<0x2018> => \N'8216'

----------------------------------
http://www.gnu.org/software/groff/manual/html_node/Using-Symbols.html#fn-2

Escape: \N'n'

Typeset the glyph with code n in the current font (n is not the input character code). The number n can be any non-negative decimal integer. Most devices only have glyphs with codes between 0 and 255; the Unicode output device uses codes in the range 0–65535. If the current font does not contain a glyph with that code, special fonts are not searched. The \N escape sequence can be conveniently used in conjunction with the char request:
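
The translation proposed here amounts to a hex-to-decimal conversion (0x2018 is 8216 in decimal, 0x201C is 8220). A minimal sketch in Python, assuming the E<> escapes are written in the 0x form used in unicode.pod; the helper name pod_hex_to_groff_n is hypothetical and this is not how pod2man works. Per the manual excerpt, n is a glyph code in the current font, so the result only matches the intended character on an output device whose glyph codes are Unicode code points.

```python
import re

# Matches POD numeric escapes written in hex, e.g. E<0x2018>.
_HEX_ESCAPE = re.compile(r"E<0x([0-9A-Fa-f]+)>")


def pod_hex_to_groff_n(text):
    # Hypothetical helper: rewrite each E<0xHHHH> as \N'decimal'.
    # Assumes the device's glyph codes are Unicode code points.
    def replace(match):
        return "\\N'%d'" % int(match.group(1), 16)

    return _HEX_ESCAPE.sub(replace, text)


print(pod_hex_to_groff_n("E<0x2018>singleE<0x2019>"))
# prints: \N'8216'single\N'8217'
```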

Thu Oct 08 17:29:58 2009 rra [...] stanford.edu - Correspondence added

Subject:	Re: [rt.cpan.org #50333] pod2man does not handle POD E<> unicode sequences
Date:	Thu, 08 Oct 2009 14:29:26 -0700
To:	bug-podlators [...] rt.cpan.org
From:	Russ Allbery <rra [...] stanford.edu>

"JARIAALTO via RT" <bug-podlators@rt.cpan.org> writes:

> The pod2man does not translate the E<> sequences that are Unicode. There
> are also no warnings that the translation is not taking in effect.

> The POD:

> # See attached file for complete POD
> Unicode quotes E<0x2018>singleE<0x2019> and E<0x201C>doubleE<0x201D>.

> Example run:

> pod2man unicode.pod > /dev/null
> <no warnings>

> The [gn]roff output does not look correct:

> Unicode quotes XsingleX and XdoubleX.

If you want pod2man to generate UTF-8 output, you need to pass the -u option to pod2man. See the man page for pod2man and its documentation of -u for all of the details of why this is the case.

Since current versions of groff handle UTF-8 characters properly, there doesn't seem to be any point in adding an additional mode that would use groff-specific escapes.

For compatibility with non-groff *roff implementations, -u is not the default. UTF-8 characters can even cause segfaults in some old vendor *roff implementations. The output of pod2man is intended to be portable and usable on systems other than the one on which it was generated.

> SUGGESTION

> At least substitute the unicode sequences with typical ASCII single(')
> and double (") equivalents until there is support for groff(1) unicode.

I don't immediately see the justification for treating those characters specially.

-- 
Russ Allbery (rra@stanford.edu) <http://www.eyrie.org/~eagle/>

Tue Dec 29 00:21:42 2009 RRA [...] cpan.org - Correspondence added

As discussed, I don't think it makes that much sense to special-case handling of specific Unicode characters, so I'm going to mark this bug rejected. To produce Unicode output from pod2man, use pod2man -u.
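
For anyone scripting this, the pod2man -u invocation could be wrapped roughly as below. This is a sketch only, assuming pod2man is on PATH and that unicode.pod is the attached test file; the function names are hypothetical and the output is not verified here.

```python
import shutil
import subprocess


def pod2man_utf8_command(pod_path):
    # -u asks pod2man for UTF-8 output, as described in this ticket.
    return ["pod2man", "-u", pod_path]


def run_pod2man_utf8(pod_path):
    # Hypothetical wrapper: returns the generated *roff source as text.
    if shutil.which("pod2man") is None:
        raise FileNotFoundError("pod2man not found on PATH")
    result = subprocess.run(
        pod2man_utf8_command(pod_path), check=True, capture_output=True
    )
    return result.stdout.decode("utf-8")
```

Calling run_pod2man_utf8("unicode.pod") would then return the man page source with the quote characters emitted as UTF-8 rather than as X placeholders, per the explanation above.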

Tue Dec 29 00:21:44 2009 RRA [...] cpan.org - Status changed from 'open' to 'rejected'