
This queue is for tickets about the Encode CPAN distribution.

Report information
The Basics
Id: 18567
Status: resolved
Priority: 0/
Queue: Encode

People
Owner: Nobody in particular
Requestors: michael [...] email4all.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 2.14
Fixed in: (no value)



Subject: gsm0338 encode malfunction
decode gsm0338 correctly translates Arabic alef (0B5F) to (D8A7), but encode gsm0338 translates (D8A7) to (0B).
----
Can you add support for Greek capitals?
----
Where can I find the spec that says which characters can be sent in gsm0338? Does this list change based upon language selection?
----
This is perl, v5.8.7 built for i486-linux-gnu-thread-multi (with 1 registered patch, see perl -V for more detail)
Linux 2.6.12-10-386 #1 Mon Feb 13 12:13:15 UTC 2006 i686 GNU/Linux
On Thu Apr 06 11:02:08 2006, guest wrote:
> decode gsm0338 correctly translates Arabic alef (0B5F) to (D8A7), but
> encode gsm0338 translates (D8A7) to (0B).
> ----
> Can you add support for Greek capitals?
> ----
> Where can I find the spec to say which characters can be sent in
> gsm0338? Does this list change based upon language selection?
> ----
> This is perl, v5.8.7 built for i486-linux-gnu-thread-multi
> (with 1 registered patch, see perl -V for more detail)
> Linux 2.6.12-10-386 #1 Mon Feb 13 12:13:15 UTC 2006 i686 GNU/Linux
Strange. Arabic alef does not even exist in gsm0338. The one used in Encode.pm is based upon
http://www.unicode.org/Public/MAPPINGS/ETSI/GSM0338.TXT
Would you explain in more detail?
Dan the Encode Maintainer
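Dan's point that Arabic alef has no gsm0338 code can be checked directly, and the same check answers the question of which characters can be sent. A minimal sketch, assuming a current Encode release in which `encode` with the `FB_CROAK` check flag dies on any character the gsm0338 table cannot represent; the helper name `gsm0338_can_encode` is hypothetical, not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns 1 if every character of $text has a
# gsm0338 mapping, 0 otherwise. FB_CROAK makes encode() die on the first
# unmappable character; LEAVE_SRC stops encode() from truncating $text.
sub gsm0338_can_encode {
    my ($text) = @_;
    my $ok = eval {
        encode('gsm0338', $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $ok ? 1 : 0;
}

# SECTION SIGN, GREEK CAPITAL LETTER OMEGA, ARABIC LETTER ALEF
for my $char ("\x{A7}", "\x{03A9}", "\x{0627}") {
    printf "U+%04X %s\n", ord $char,
        gsm0338_can_encode($char) ? 'encodable' : 'not in gsm0338';
}
```

Without a check flag, `encode` does not fail on such characters; it substitutes a fallback instead, which is easy to mistake for a wrong mapping.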
Subject: Re: [rt.cpan.org #18567] gsm0338 encode malfunction
Date: Fri, 07 Apr 2006 12:43:11 +0300
To: bug-Encode [...] rt.cpan.org
From: Michael Virgo <michael [...] email4all.org>
Hi Dan,

It gets more puzzling the more I investigate, but first, to isolate the bug specifically:

    $msgtxt = chr(0xa7);
    $msgtxt = Encode::encode ("gsm0338", $msgtxt);

$msgtxt is then "", not chr(0x5F) as I would expect from the table you sent:

    0x5F 0x00A7 # SECTION SIGN

Now about the Arabic. My goal is to write an SMS gateway that will cope with as many languages as possible, and especially Arabic. So I got a friend to send me an Arabic GSM message. He started with one containing a single alef. This arrived in GSM 7-bit format as 0x0b5f. Decoding that with gsm0338, it becomes 0xd8a7. When printed to the terminal this displays as a vertical bar (the correct shape for the letter alef). I spent ages trying to work out how that was an alef (not being familiar with utf8), and finally discovered that if I ran

    $msgtxt = Encode::decode_utf8 ($msgtxt);

it was translated to \x0627, an Arabic alef.

Fine, but there was a bug in the version of Encode I was using, so I installed version 2.14. Now the process above no longer works. It only works if the utf8 flag is off. I don't understand this. If perl's internal format is utf8, I would not necessarily expect Encode::decode_utf8 to do anything. But since it used to translate 0xd8a7 from utf8 to \x0627, which I think is correctly called ucs2, I would expect it to be equivalent to

    $msgtxt = Encode::encode ("UCS2", $msgtxt);

but that doesn't make any change to my utf8 string 0xd8a7, whether or not the utf8 flag is set. Please can you explain this?

What I want is to be able to translate the GSM into utf8, then translate the utf8 to ucs2 (0x0b5f -> 0xd8a7 -> 0x0627). Shouldn't there be a perl way of doing this without having to adjust the utf8 flag? I would also like to be able to do the reverse translation.

Thanks for your help,
Michael
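The chain Michael describes (0x0b5f -> 0xd8a7 -> 0x0627) and its reverse can be written with explicit conversions, without touching the utf8 flag. A minimal sketch, assuming unpacked septets (one per byte) and a current Encode release. Treating the two decoded code points as UTF-8 bytes is an assumption that only reproduces this particular message, not a general rule for GSM text; big-endian `UCS-2BE` is likewise an assumed choice for "ucs2".

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Forward: raw GSM septets -> Perl characters.
my $septets   = "\x0B\x5F";
my $gsm_chars = decode('gsm0338', $septets);   # "\x{D8}\x{A7}": O WITH STROKE, SECTION SIGN

# Assumption: those two code points are really the two bytes of a UTF-8
# sequence. latin1 maps code points 0-255 one-to-one onto bytes, so use it
# to turn the characters back into bytes, then decode the bytes as UTF-8.
my $utf8_bytes = encode('latin1', $gsm_chars); # bytes D8 A7
my $text       = decode('UTF-8', $utf8_bytes); # "\x{0627}" ARABIC LETTER ALEF

# UCS-2 is just another byte encoding of the same character string.
my $ucs2 = encode('UCS-2BE', $text);           # bytes 06 27

# Reverse: characters -> UTF-8 bytes -> latin1 characters -> GSM septets.
my $back = encode('gsm0338', decode('latin1', encode('UTF-8', $text)));

printf "%s -> U+%04X -> %s -> %s\n",
    unpack('H*', $septets), ord($text), unpack('H*', $ucs2), unpack('H*', $back);
```

Every step takes bytes to characters (`decode`) or characters to bytes (`encode`), so the internal flag never has to be inspected or set by hand.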
On Fri Apr 07 05:42:32 2006, michael@email4all.org wrote:
> It gets more puzzling the more I investigate, but first to isolate the
> bug specifically:
>
> $msgtxt = chr(0xa7);
> $msgtxt = Encode::encode ("gsm0338", $msgtxt);
encode()? Not decode()? Unless you 'use utf8', chr(0xa7) will be treated as ISO-Latin, not UTF8.
I've got a feeling you misused Encode and perl unicode rather than found a bug.
I'll close this ticket for the time being. Please read perlunicode and perluniintro (and perlunitut if you have bleedperl handy). If you still encounter the bug, send me a mail BEFORE opening a ticket via RT.
Dan the Encode Maintainer
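In Encode, `decode` turns bytes in a named encoding into Perl characters and `encode` goes the other way. `chr(0xA7)` is the character U+00A7 SECTION SIGN, which the GSM0338.TXT table maps to septet 0x5F. A minimal check of both directions, assuming a current Encode release (the ticket reports Encode 2.14 returning an empty string for the `encode` call):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# encode: Perl characters -> bytes of the named encoding.
my $septet = encode('gsm0338', chr(0xA7));    # SECTION SIGN -> GSM 0x5F
# decode: bytes of the named encoding -> Perl characters.
my $char   = decode('gsm0338', "\x5F");       # GSM 0x5F -> SECTION SIGN

# GSM 0x5F is not ASCII '_'; the GSM table puts LOW LINE at 0x11.
my $underscore = encode('gsm0338', '_');

printf "encode: %02X  decode: U+%04X  '_' -> %02X\n",
    ord($septet), ord($char), ord($underscore);
```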
Subject: Re: [rt.cpan.org #18567] gsm0338 encode malfunction
Date: Thu, 5 Oct 2006 10:25:59 -0000 (GMT)
To: bug-Encode [...] rt.cpan.org
From: michael [...] email4all.org
Hi Dan,

Sorry for the delay, I've been away for a while.

Please try this code. Why doesn't encode do the reverse operation from decode for these 6 characters?

    #!/usr/bin/perl
    #
    # Simple test program that executes encode and decode gsm338
    #
    # ******************************************
    require Encode;

    for ($i=0;$i<128;$i++)
    { $gsmtxt = chr($i);
      $msgtxt = Encode::decode("gsm0338", $gsmtxt);
      $msgord = ord($msgtxt);
      $ngsmtxt = Encode::encode("gsm0338", $msgtxt);
      $ngsmord = ord($ngsmtxt);
      if (($i != $ngsmord) and ($i != 0x1b))
      { printf "%4x%4x", $i,$msgord;
        #print " $msgtxt";
        printf "%4x\n", $ngsmord;
      }
    }

Michael
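Michael's round-trip test can also be written with `use strict`, lexical variables, and a function that returns the mismatches instead of only printing them. A sketch, assuming a current Encode release; `gsm0338_mismatches` is a hypothetical name, and which septets (if any) fail depends on the installed Encode version, so no particular result is claimed here.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decode every single septet 0x00-0x7F from gsm0338
# and encode the result back, skipping 0x1B (the escape, only meaningful
# as a prefix). Returns [septet, decoded code point, re-encoded byte]
# for each septet that does not come back unchanged.
sub gsm0338_mismatches {
    my @bad;
    for my $septet (0x00 .. 0x7F) {
        next if $septet == 0x1B;
        my $char  = decode('gsm0338', chr $septet);
        my $again = encode('gsm0338', $char);
        push @bad, [ $septet, ord $char, (length $again ? ord $again : undef) ]
            unless $again eq chr $septet;
    }
    return @bad;
}

my @bad = gsm0338_mismatches();
printf "%4x%6x  %s\n", $_->[0], $_->[1],
    defined $_->[2] ? sprintf('%x', $_->[2]) : '(empty)' for @bad;
print scalar(@bad), " septet(s) did not round-trip\n";
```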
On Thu Oct 05 06:26:36 2006, michael@email4all.org wrote:
> Hi Dan,
>
> Sorry for the delay, I've been away for a while.
>
> Please try this code. Why doesn't encode do the reverse operation from
> decode for these 6 characters?
>
> #!/usr/bin/perl
> #
> # Simple test program that executes encode and decode gsm338
> #
> # ******************************************
> require Encode;
>
> for ($i=0;$i<128;$i++)
> { $gsmtxt = chr($i);
> $msgtxt = Encode::decode("gsm0338", $gsmtxt);
decode? Not encode? Here you are treating $i as a GSM character, not a UTF-8 character. This code does not make sense to me.
> $msgord = ord($msgtxt);
> $ngsmtxt = Encode::encode("gsm0338", $msgtxt);
> $ngsmord = ord($ngsmtxt);
> if (($i != $ngsmord) and ($i != 0x1b))
> { printf "%4x%4x", $i,$msgord;
> #print " $msgtxt";
> printf "%4x\n", $ngsmord;
> }
> }
>
> Michael
Until you convince me it's Encode's bug, not your misunderstanding, I'll close this ticket. Please open a new ticket if you find new evidence.
Dan the Encode Maintainer
Subject: Re: [rt.cpan.org #18567] gsm0338 encode malfunction
Date: Sat, 7 Apr 2007 08:18:22 -0000 (GMT)
To: bug-Encode [...] rt.cpan.org
From: michael [...] email4all.org
Hi Dan,

Yes, I do want to start from incoming SMS messages, and therefore from GSM characters. If it's my misunderstanding, that means I can't convert GSM characters into Perl format and back again without having to tweak internal flags in perl, and I think it's time I found a more sane language to write in!

Seriously, please will you tell me what information I need to add to my GSM characters to tell decode to convert them in such a way that encode can reverse the process?

Thanks,
Michael
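For the question of what must be added so that `encode` reverses `decode`: with a current Encode, the septets can be passed to `decode` as bytes and its result passed straight back to `encode`, with no extra information and no flag handling. A minimal sketch, assuming unpacked septets (one per byte); `gsm0338_round_trip` is a hypothetical helper name. `FB_CROAK` makes an unmappable value raise an error instead of being silently replaced, which makes a real mapping bug much easier to spot.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: GSM septets -> Perl characters -> GSM septets again.
# FB_CROAK makes either step die on an unmappable value instead of
# substituting a fallback character; LEAVE_SRC keeps Encode from
# consuming the input variables.
sub gsm0338_round_trip {
    my ($septets) = @_;
    my $flags = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $text  = decode('gsm0338', $septets, $flags);
    my $again = encode('gsm0338', $text, $flags);
    return ($text, $again);
}

my $incoming = "Hello \x5F";    # 0x5F is SECTION SIGN in GSM, not '_'
my ($text, $again) = gsm0338_round_trip($incoming);
printf "last char U+%04X, round trip %s\n",
    ord substr($text, -1), $again eq $incoming ? 'ok' : 'FAILED';
```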