Subject: nextstep encoding is broken - missing ASCII characters
Date: Sat, 24 Jul 2010 02:47:46 +0200
To: bug-Encode [...] rt.cpan.org
From: Tom Hageman <tom [...] trh.demon.nl>
Hi Dan,
When I tried to use the nextstep encoding (in order to convert some
old NeXT-era .rtf files to Mac OS X) it failed with a lot of error
messages like:
$ ./recode.pl ~/@basil/Mailboxes/Dev/EnhanceMail.mbox/NEXTTOYOU-Enhancemen.attach/index.rtf test-recode-NEXTTOYOU-Enhancemen.rtf
nextstep "\x7B" does not map to Unicode at ./recode.pl line 8.
nextstep "\x5C" does not map to Unicode at ./recode.pl line 8.
nextstep "\x72" does not map to Unicode at ./recode.pl line 8.
nextstep "\x74" does not map to Unicode at ./recode.pl line 8.
nextstep "\x66" does not map to Unicode at ./recode.pl line 8.
nextstep "\x30" does not map to Unicode at ./recode.pl line 8.
[...]
This suggests that the nextstep encoding does not define a mapping
for the regular ASCII characters (range \x20 - \x7f).
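One way to check this hunch without going through a file is to decode each byte on its own and note which ones fail. The following is only a sketch: `unmapped_bytes` is a hypothetical helper name, and it relies on `find_encoding` and the `FB_QUIET` check mode from Encode, which makes `decode` return what it has processed so far instead of warning or dying.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper: return the byte values in [$lo, $hi] that the
# named encoding cannot decode to Unicode.
sub unmapped_bytes {
    my ($name, $lo, $hi) = @_;
    my $enc = find_encoding($name) or die "Unknown encoding: $name";
    my @missing;
    for my $byte ($lo .. $hi) {
        my $octets = chr $byte;
        # With FB_QUIET, decode stops quietly at the first byte it
        # cannot map, so an empty result means this byte is unmapped.
        my $text = $enc->decode($octets, Encode::FB_QUIET());
        push @missing, $byte unless defined $text && length $text;
    }
    return @missing;
}

my @missing = unmapped_bytes('nextstep', 0x20, 0x7F);
printf "nextstep: %d unmapped byte(s) in \\x20-\\x7F\n", scalar @missing;
printf "  \\x%02X\n", $_ for @missing;
```

The result depends on the installed Encode version, so no particular output is claimed here.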
A quick look at
http://cpansearch.perl.org/src/DANKOGAI/Encode-2.39/ucm/nextstep.ucm
($Id: nextstep.ucm,v 2.0 2004/05/16 20:55:28 dankogai Exp $) seems to
confirm that hunch.
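The same check can be made against the .ucm file itself by collecting the single-byte codes listed between CHARMAP and END CHARMAP. This is a rough sketch, not part of Encode: `ucm_mapped_bytes` is a hypothetical helper, the path to a local copy of nextstep.ucm is expected as the first argument, and entries are assumed to have the usual `<Uxxxx> \xNN |n` form.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given the text of a .ucm file, return a hash ref
# whose keys are the single-byte codes that have a CHARMAP entry.
sub ucm_mapped_bytes {
    my ($ucm_text) = @_;
    my %mapped;
    my $in_charmap = 0;
    for my $line (split /\n/, $ucm_text) {
        if ($line =~ /^\s*CHARMAP\b/)       { $in_charmap = 1; next; }
        if ($line =~ /^\s*END\s+CHARMAP\b/) { $in_charmap = 0; next; }
        next unless $in_charmap;
        # Match entries such as "<U0041> \x41 |0"; multi-byte entries
        # and comment lines do not match and are skipped.
        if ($line =~ /^\s*<U[0-9A-Fa-f]{4,6}>\s+\\x([0-9A-Fa-f]{2})\s+\|\d/) {
            $mapped{ hex $1 } = 1;
        }
    }
    return \%mapped;
}

# Only runs when a path is given, e.g.: ./check-ucm.pl nextstep.ucm
if (@ARGV) {
    my $path = shift @ARGV;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    my $mapped  = ucm_mapped_bytes($text);
    my @missing = grep { !$mapped->{$_} } 0x20 .. 0x7F;
    printf "%d byte(s) in \\x20-\\x7F have no CHARMAP entry\n", scalar @missing;
}
```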
Tested with Perl 5.8.6 (shipped with Mac OS X 10.4.11 PPC),
Encode->VERSION 2.08, Encode::Byte->VERSION 2.00, but apparently still
an issue in the most recent version, see above.
Test script (a straightforward adaptation of the example in the POD):
=== recode.pl ===
#! /usr/bin/perl
use strict; use warnings;
# via PerlIO
my $infile = $ARGV[0];
my $outfile = $ARGV[1];
open my $in, "<:encoding(nextstep)", $infile or die "Cannot read $infile: $!";
open my $out, ">:encoding(MacRoman)", $outfile or die "Cannot write $outfile: $!";
while(<$in>) { print $out $_; }
===
=== input ===
{\rtf0\ansi{\fonttbl\f0\fswiss Helvetica;}
\margl120
\margr120
{\colortbl;\red204\green0\blue17;\red0\green0\blue0;\red88\green88
\blue88;\red85\green19\blue134;}
\pard\tx533\tx1067\tx1601\tx2135\tx2668\tx3202\tx3736\tx4270\tx4803
\tx5337\f0\b0\i0\ulnone\fs24\fc0\cf0 Hi Tom,\
[...]
===
=== output ===
\x7B\x5C\x72\x74\x66\x30\x5C\x61\x6E\x73\x69\x7B\x5C\x66\x6F\x6E\x74
\x74\x62\x6C\x5C\x66\x30\x5C\x66\x73\x77\x69\x73\x73\x20\x48\x65\x6C
\x76\x65\x74\x69\x63\x61\x3B\x7D
\x5C\x6D\x61\x72\x67\x6C\x31\x32\x30
\x5C\x6D\x61\x72\x67\x72\x31\x32\x30
\x7B\x5C\x63\x6F\x6C\x6F\x72\x74\x62\x6C\x3B\x5C\x72\x65\x64\x32\x30
\x34\x5C\x67\x72\x65\x65\x6E\x30\x5C\x62\x6C\x75\x65\x31\x37\x3B\x5C
\x72\x65\x64\x30\x5C\x67\x72\x65\x65\x6E\x30\x5C\x62\x6C\x75\x65\x30
\x3B\x5C\x72\x65\x64\x38\x38\x5C\x67\x72\x65\x65\x6E\x38\x38\x5C\x62
\x6C\x75\x65\x38\x38\x3B\x5C\x72\x65\x64\x38\x35\x5C\x67\x72\x65\x65
\x6E\x31\x39\x5C\x62\x6C\x75\x65\x31\x33\x34\x3B\x7D
\x5C\x70\x61\x72\x64\x5C\x74\x78\x35\x33\x33\x5C\x74\x78\x31\x30\x36
\x37\x5C\x74\x78\x31\x36\x30\x31\x5C\x74\x78\x32\x31\x33\x35\x5C\x74
\x78\x32\x36\x36\x38\x5C\x74\x78\x33\x32\x30\x32\x5C\x74\x78\x33\x37
\x33\x36\x5C\x74\x78\x34\x32\x37\x30\x5C\x74\x78\x34\x38\x30\x33\x5C
\x74\x78\x35\x33\x33\x37\x5C\x66\x30\x5C\x62\x30\x5C\x69\x30\x5C\x75
\x6C\x6E\x6F\x6E\x65\x5C\x66\x73\x32\x34\x5C\x66\x63\x30\x5C\x63\x66
\x30\x20\x48\x69\x20\x54\x6F\x6D\x2C\x5C
[...]
===
Best regards,
Tom Hageman.