> The way I interpret this chars in the 127..255 range are not "extended Unicode characters"
OK, I get it. But IMHO characters in the range 127..255 are nothing special. They are special for Perl, yes (Latin-1 is the default encoding), and maybe for English speakers.
For me they are just Latin-1 characters, which have code points below 256 in Unicode (which, however, does not mean that they fit in 1 byte in most commonly used Unicode encodings).
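To illustrate the point about byte length, here is a minimal sketch using the core Encode module. The variable names are only for illustration; the character chr(200) is picked because it is the example used later in this reply.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00C8, a character inside the Latin-1 range.
my $char = chr(200);

# The same character serialized with two different encodings.
my $latin1 = encode('ISO-8859-1', $char);
my $utf8   = encode('UTF-8',      $char);

printf "Latin-1: %d octet(s)\n", length $latin1;   # 1
printf "UTF-8:   %d octet(s)\n", length $utf8;     # 2
```

So the code point is below 256, but how many octets it takes depends entirely on the encoding chosen.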
> so I should probably rephrase to make it clearer
Ok, thanks!
> I think that explanation is a bit too much.
Maybe.
> I don't always think more documentation is better
Yes, agreed.
> misconception that the internal encoding of chars has semantic meaning
Maybe it does, for functions that work with bytes.
For me the misconception is that the character with code 200 (Unicode) is 1 octet with code 200. That is true only if the character is Latin-1 encoded.
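A small sketch of what I mean, assuming MIME::Base64 behaves as its documentation describes (code points up to 255 are taken as single octets, anything higher croaks). The variable names are hypothetical, and the utf8::upgrade step is only there to show that the internal representation does not change the result.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

my $char = chr(200);   # U+00C8

# No explicit encoding: the character is treated as the single octet 0xC8.
my $implicit = encode_base64($char, '');

# Explicit encodings chosen by the caller.
my $as_latin1 = encode_base64(encode('ISO-8859-1', $char), '');
my $as_utf8   = encode_base64(encode('UTF-8',      $char), '');

# Same character, but stored internally as UTF-8 by Perl.
my $upgraded = $char;
utf8::upgrade($upgraded);
my $from_upgraded = encode_base64($upgraded, '');

# A code point above 255 cannot be mapped to one octet, so this croaks.
my $wide_ok    = eval { encode_base64("\x{263A}", ''); 1 };
my $wide_error = $wide_ok ? '' : $@;

print "implicit:      $implicit\n";        # yA==
print "as Latin-1:    $as_latin1\n";       # yA==
print "as UTF-8:      $as_utf8\n";         # w4g=
print "from upgraded: $from_upgraded\n";   # yA==
print "wide char:     $wide_error";
```

The implicit result matches the Latin-1 one, not the UTF-8 one, which is the behaviour I would like the documentation to mention.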
On Thu May 23 01:11:08 2013, GAAS wrote:
> On Wed May 22 13:32:43 2013, vsespb wrote:
> > > Perl v5.8 and better allow extended Unicode characters in strings.
> > > Such strings cannot be encoded directly, as the base64 encoding is
> > > only defined for single-byte characters. The solution is to use the
> > > Encode module to select the byte encoding you want.
> >
> > In fact Unicode characters > 127 and < 255 can be encoded with
> > MIME::Base64, they will be autoconverted to Latin-1 single-byte
> > encoding.
>
> The way I interpret this chars in the 127..255 range are not "extended
> Unicode characters", so in that way the statement is correct. You
> obviously interpret it to mean non-ASCII, so I should probably rephrase
> to make it clearer.
>
> > Digest::SHA module has same behaviour with Unicode, as MIME::Base64,
> > and it's documentation is a bit better explains how it will work
> > http://search.cpan.org/~mshelor/Digest-SHA-5.84/lib/Digest/SHA.pm#UNICODE_AND_SIDE_EFFECTS
>
> I think that explanation is a bit too much. I don't always think more
> documentation is better. I don't want to contribute to the misconception
> that the internal encoding of chars has semantic meaning.