Bug #85547 for MIME-Base64: Documentation about Unicode strings is vague

Wed May 22 13:32:43 2013 victor [...] vsespb.ru - Ticket created

Subject:

Documentation about Unicode strings is vague


> Perl v5.8 and better allow extended Unicode characters in strings. Such strings cannot be encoded directly, as the base64 encoding is only defined for single-byte characters. The solution is to use the Encode module to select the byte encoding you want.

In fact, Unicode characters > 127 and < 255 can be encoded with MIME::Base64; they will be autoconverted to Latin-1 single-byte encoding. The Digest::SHA module has the same behaviour with Unicode as MIME::Base64, and its documentation explains a bit better how it works: http://search.cpan.org/~mshelor/Digest-SHA-5.84/lib/Digest/SHA.pm#UNICODE_AND_SIDE_EFFECTS
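A minimal sketch of the behaviour described above, assuming a stock MIME::Base64 install. The variable names are illustrative only. It shows a string whose code points are all below 256 being encoded as Latin-1 bytes even when Perl stores it internally as UTF-8, and a string with a code point above 255 being rejected.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# "caf" followed by U+00E9; every code point is below 256.
my $text = "caf\x{e9}";

# Force Perl's internal UTF-8 representation, to show that the
# internal storage does not change the result.
utf8::upgrade($text);

# The string is treated as the bytes 63 61 66 E9 (Latin-1).
my $encoded = encode_base64($text, "");
print "$encoded\n";    # Y2Fm6Q==

# A code point above 255 has no single-byte form, so the call dies.
my $wide = "\x{263A}";
my $ok   = eval { encode_base64($wide, ""); 1 };
print "wide string rejected: $@" unless $ok;
```

The second argument to encode_base64 is the end-of-line string; passing "" only suppresses the trailing newline in the output.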

Wed May 22 17:11:08 2013 GAAS [...] cpan.org - Correspondence added

On Wed May 22 13:32:43 2013, vsespb wrote:

> > Perl v5.8 and better allow extended Unicode characters in strings.
> > Such strings cannot be encoded directly, as the base64 encoding is
> > only defined for single-byte characters. The solution is to use the
> > Encode module to select the byte encoding you want.
>
> In fact Unicode characters > 127 and < 255 can be encoded with
> MIME::Base64, they will be autoconverted to Latin-1 single-byte
> encoding.

The way I interpret this, chars in the 127..255 range are not "extended Unicode characters", so in that way the statement is correct. You obviously interpret it to mean non-ASCII, so I should probably rephrase to make it clearer.


> Digest::SHA module has same behaviour with Unicode, as MIME::Base64,
> and it's documentation is a bit better explains how it will work
> http://search.cpan.org/~mshelor/Digest-SHA-5.84/lib/Digest/SHA.pm#UNICODE_AND_SIDE_EFFECTS

I think that explanation is a bit too much. I don't always think more documentation is better. I don't want to contribute to the misconception that the internal encoding of chars has semantic meaning.

Wed May 22 17:11:08 2013 The RT System itself - Status changed from 'new' to 'open'

Wed May 22 17:37:58 2013 victor [...] vsespb.ru - Correspondence added

From:

victor [...] vsespb.ru


> The way I interpret this chars in the 127..255 range are not "extended Unicode characters"

OK, got it. But IMHO, characters in the range 127..255 are not something special. They are special for Perl, yes (Latin-1 is the default encoding), and maybe for English speakers. For me they are just Latin-1 characters, which have code points < 255 in Unicode (which, however, does not mean that they fit in 1 byte in most commonly used Unicode encodings).

> so I should probably rephrase to make it clearer

Ok, thanks!

> I think that explanation is a bit too much.

Maybe.

> I don't always think more documentation is better

Yes, agree.

> misconception that the internal encoding of chars has semantic meaning

Maybe it has, for functions which work with bytes. For me, the misconception is that a character with code 200 (Unicode) is 1 octet with code 200. That is true only if the character is Latin-1 encoded.
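The point about the character with code 200 can be illustrated with a short sketch that uses the Encode module, as the quoted documentation suggests. The variable names are illustrative. The same character becomes one octet under Latin-1 and two octets under UTF-8, so the base64 output depends on which byte encoding is selected.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# One character with code point 200 (U+00C8).
my $char = chr(200);

# Choose the byte encoding explicitly before base64-encoding.
my $as_latin1 = encode("ISO-8859-1", $char);    # one octet:  C8
my $as_utf8   = encode("UTF-8",      $char);    # two octets: C3 88

my $b64_latin1 = encode_base64($as_latin1, "");
my $b64_utf8   = encode_base64($as_utf8,   "");

printf "Latin-1: %d octet(s), base64 %s\n", length($as_latin1), $b64_latin1;
printf "UTF-8:   %d octet(s), base64 %s\n", length($as_utf8),   $b64_utf8;
# Latin-1: 1 octet(s), base64 yA==
# UTF-8:   2 octet(s), base64 w4g=
```

Encoding explicitly in this way makes the result independent of how Perl happens to store the string internally.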

Mon Jan 12 15:19:55 2015 GAAS [...] cpan.org - Correspondence added

I'll reconsider if you provide a patch :-)

Mon Jan 12 15:20:01 2015 GAAS [...] cpan.org - Status changed from 'open' to 'rejected'