On 2011-12-27 01:31:33, gypark@gmail.com wrote:
> I looked into the error message and I found that the original username
> is changed to be an illegal byte sequence:
> * from 0x ba ce be fb c0 cc (3character, 6bytes, encoded with cp949)
> * to 0x ba ce be 5f c0 cc
...
> or, equivalent C-code in mktmpdir.c:
>
>     /* replace all non-alphanumeric letters with '_' */
>     for ( c = username ; *c != '\0' ; c++ ) {
>         if ( !isalnum(*c) ) {
>             *c = '_';
>         }
>     }
Excellent analysis.
There's actually a problem with the C version: the type of c is char*
and char is signed on Intel i386, so bytes >= 0x80 reach isalnum()
as negative values. At least the Mingw32 implementation of isalnum()
doesn't handle that correctly and marks only one of the 6 bytes
above (0xfb) as !isalnum. It should've marked them all, resulting
in "______". That wouldn't have caused an error when creating the
cache directory, but it has a high probability of collision with
another username (though that wouldn't have shown in a default
Windows environment, where the temp directory is per-user anyway).
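
For illustration only, here is a minimal sketch of the same loop with the
argument cast to unsigned char, which is the usual way to keep negative
char values away from the <ctype.h> functions. The function name
sanitize_username is made up for this example and is not taken from
mktmpdir.c; the sketch also assumes the default "C" locale, where no
byte >= 0x80 counts as alphanumeric.

```c
#include <ctype.h>

/* Hypothetical helper, not the actual mktmpdir.c code.
 * Replaces every byte that isalnum() rejects with '_'.
 * The cast to unsigned char ensures that bytes >= 0x80 are passed
 * as values in 0..255 rather than as negative ints. */
void sanitize_username(char *username)
{
    char *c;

    for (c = username; *c != '\0'; c++) {
        if (!isalnum((unsigned char)*c)) {
            *c = '_';
        }
    }
}
```

With the six CP949 bytes from the report as input, this variant replaces
every byte, which gives the "______" result described above.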
> Would you please check it? I think that it would be good idea to use
> "%-encoding" for all non-latin characters, so that the username in the
> screenshot would be changed into "par-%ba%ce%be%fb%c0%cc". ('%' may be
> removed)
Yeah, but what is a "non-latin" character if we don't consider
the charset? I think we shouldn't make any assumptions about it
and should simply encode _all_ bytes in the username unconditionally
as two hex digits each. The only thing we lose is easy recognizability
of which cache directory belongs to which user.
From a cursory look at CP949 (or EUC-KR) I believe that the
sequence of bytes "par-bacebefbc0cc" is a legal string in CP949, right?
That should also work with ASCII and all ISO Latin encodings,
as well as EUC-CN, EUC-JP and UTF-8.
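
To make the idea concrete, here is a rough sketch of encoding every byte
of the username as two hex digits. It is not the code from the attached
patches: the function name hex_encode_username, the lowercase digits and
the malloc-based interface are all assumptions made for this example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not taken from the attached patches.
 * Returns a newly allocated string with two lowercase hex digits per
 * input byte, or NULL if the allocation fails. The caller frees it. */
char *hex_encode_username(const char *username)
{
    static const char digits[] = "0123456789abcdef";
    size_t len = strlen(username);
    char *out = malloc(2 * len + 1);
    size_t i;

    if (out == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        /* Go through unsigned char so bytes >= 0x80 index correctly. */
        unsigned char b = (unsigned char)username[i];
        out[2 * i]     = digits[b >> 4];
        out[2 * i + 1] = digits[b & 0x0f];
    }
    out[2 * len] = '\0';
    return out;
}
```

Because the output consists only of the characters 0-9 and a-f, it does
not depend on the charset of the original username. For the six bytes
from the report it yields "bacebefbc0cc".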
Could you please try the two attached patches (the first is for PAR,
the second for PAR::Packer)?
Cheers, Roderich