This queue is for tickets about the perlindex CPAN distribution.

Report information
The Basics
Id: 16372
Status: rejected
Priority: 0/
Queue: perlindex

People
Owner: ULPFR [...] cpan.org
Requestors: jpierce [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.500
Fixed in: (no value)



Subject: UTF-8 brokenness
On linux under 5.8.0 for several core pods including perldsc and perlclib I get a smattering of:

Malformed UTF-8 character (unexpected continuation byte 0xbd, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (unexpected continuation byte 0xbe, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (unexpected non-continuation byte 0xc1, immediately after start byte 0xc0) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (unexpected non-continuation byte 0xc2, immediately after start byte 0xc1) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (unexpected non-continuation byte 0xc3, immediately after start byte 0xc2) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (byte 0xfe) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (byte 0xff) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
Malformed UTF-8 character (overflow at 0xc41461c8, byte 0xc9, after start byte 0xbf) in transliteration (tr///) at /usr/bin/perlindex line 233, <IN> line 958.
Malformed UTF-8 character (overflow at 0xfb280832, byte 0x37, after start byte 0xff) in transliteration (tr///) at /usr/bin/perlindex line 233, <IN> line 958.
perlindex was written about 10 years ago. It does not support UTF-8. Patches are welcome.
On Thursday, 2005-12-08 20:44:14, JPIERCE wrote:
> On linux under 5.8.0 for several core pods including perldsc and perlclib
>
> I get a smattering of:
>
> Malformed UTF-8 character (unexpected continuation byte 0xbd, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (unexpected continuation byte 0xbe, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (unexpected continuation byte 0xbf, with no preceding start byte) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (unexpected non-continuation byte 0xc1, immediately after start byte 0xc0) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (unexpected non-continuation byte 0xc2, immediately after start byte 0xc1) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (unexpected non-continuation byte 0xc3, immediately after start byte 0xc2) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (byte 0xfe) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (byte 0xff) in transliteration (tr///) at /usr/bin/perlindex line 232, <IN> line 958.
> Malformed UTF-8 character (overflow at 0xc41461c8, byte 0xc9, after start byte 0xbf) in transliteration (tr///) at /usr/bin/perlindex line 233, <IN> line 958.
> Malformed UTF-8 character (overflow at 0xfb280832, byte 0x37, after start byte 0xff) in transliteration (tr///) at /usr/bin/perlindex line 233, <IN> line 958.
perlindex is not UTF-8 aware; it was written in the early nineties. I don't think that this can easily be fixed, and I will not work on it unless I need it. Patches welcome ;-)

I assume you are running Perl 5.8.0 on a Red Hat Linux system. Your locale is probably a UTF-8 one. With such a locale, this version of Perl reads all files, including binary ones, in UTF-8 mode. The problem is a general Perl one, not a perlindex one.

Ulrich
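
For anyone considering a patch, here is a minimal sketch of one possible direction: reading the input as raw bytes so that tr/// never attempts UTF-8 decoding. The helper name fold_lines_as_bytes and the tr/A-Z/a-z/ operation are hypothetical stand-ins; the actual code at lines 232 and 233 of perlindex is not shown in this ticket, and this sketch has not been tested on Perl 5.8.0 or against perlindex 1.500.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from perlindex: read a file as raw bytes
# and transliterate each line, so Perl does not try to decode UTF-8.
sub fold_lines_as_bytes {
    my ($path) = @_;

    open(my $in, '<', $path) or die "Cannot open $path: $!";
    binmode($in);    # request byte semantics on this handle

    my @lines;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ tr/A-Z/a-z/;    # stand-in for the real transliteration
        push @lines, $line;
    }
    close($in);

    return @lines;
}
```

Calling fold_lines_as_bytes on a pod file returns its lines lowercased, with bytes such as 0xbd or 0xff passed through unchanged. Running the script under a non-UTF-8 locale (for example with LC_ALL=C set in the environment) is another workaround often suggested for Perl 5.8.0; whether either approach is sufficient for perlindex is an open question.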