Bug #73491 for PAR-Packer: cache directory naming problem

Tue Dec 27 01:31:33 2011 gypark [...] gmail.com - Ticket created

Subject:

cache directory naming problem

Hello, I'm using: * MS Windows XP ServicePack3 32bit * Strawberry Perl 5.12.3 * PAR::Packer 1.012 I made a simple Perl script, and made a .exe file using pp -o app.exe app.pl This app.exe works very well when the username(of windows system) is in English. However, it happens to fail to create a cache directory if the username is a sort of string of Korean characters. I attached the screenshot, hoping it would be helpful to you. I looked into the error message and I found that the original username is changed to be an illegal byte sequence: * from 0x ba ce be fb c0 cc (3character, 6bytes, encoded with cp949) * to 0x ba ce be 5f c0 cc I guess this is because the following code in SetupTemp.pm: $username =~ s/\W/_/g; or, equivalent C-code in mktmpdir.c: /* replace all non-alphanumeric letters with '_' */ for ( c = username ; *c != '\0' ; c++ ) { if ( !isalnum(*c) ) { *c = '_'; } } doesn't consider Non-latin characters. Would you please check it? I think that it would be good idea to use "%- encoding" for all non-latin characters, so that the username in the screenshot would be changed into "par-%ba%ce%be%fb%c0%cc". ('%' may be removed) One other idea is to replace 'every bytes in non-ascii range' with '_', but this may change many different non-latin usernames of same length to same string, eg. "______". Thank you for this good module. Sincerly, Geunyoung Park from South Korea P.S. This problem is issued initialy in a Korean perl user community by "@owl0908", not me. His original blog post is here (written in Korean): http://www.dormouse.pe.kr/blogtool/article/410/

Subject:

packer.png

Download packer.png
image/x-png 25.5k

Wed Dec 28 03:18:53 2011 RSCHUPP [...] cpan.org - Status changed from 'new' to 'open'

Wed Dec 28 03:18:57 2011 RSCHUPP [...] cpan.org - Taken

Wed Dec 28 06:49:51 2011 RSCHUPP [...] cpan.org - Correspondence added

On 2011-12-27 01:31:33, gypark@gmail.com wrote: Show quoted text

> I looked into the error message and I found that the original username > is changed to be an illegal byte sequence: > * from 0x ba ce be fb c0 cc (3character, 6bytes, encoded with cp949) > * to 0x ba ce be 5f c0 cc

... Show quoted text

> or, equivalent C-code in mktmpdir.c: > > /* replace all non-alphanumeric letters with '_' */ > for ( c = username ; *c != '\0' ; c++ ) { > if ( !isalnum(*c) ) { > *c = '_'; > } > }

Excellent analysis. There's actually a problem with the C version: the type of c is char* and char is signed on Intel i386. At least the Mingw32 implementation of isalnum() doesn't correctly account for that and marks only one of the 6 bytes above as !isalnum. It should've marked them all, resulting in "______". That wouldn't have caused an error when creating the cache directory, but has a high probability for collision with another username (though that wouldn't have shown in a default Windows environment where the temp directory is per-user anyway). Show quoted text

> Would you please check it? I think that it would be good idea to use "%- > encoding" for all non-latin characters, so that the username in the > screenshot would be changed into "par-%ba%ce%be%fb%c0%cc". ('%' may be > removed)

Yeah, but what is a "non-latin" character if we don't consider the charset? I think we shouldn't make any assumption about it and simply encode _all_ bytes unconditionally in username as two hex bytes. The only thing we loose is easy recognizability of which cache directory belongs to which user. From cursory looking at CP949 (or EUC-KR) I believe that the sequence of bytes "par-bacebefbc0cc" is a legal string in CP949, right? That should also work with ASCII and all ISO Latin encodings, as well as EUC-CN and EUC-JN and UTF-8. Could you please try the two attached patches (the first is for PAR, the second for PAR::Packer)? Cheers, Roderich

Subject:

SetupTemp.patch

Index: PAR/lib/PAR/SetupTemp.pm =================================================================== --- PAR/lib/PAR/SetupTemp.pm (revision 1336) +++ PAR/lib/PAR/SetupTemp.pm (working copy) @@ -98,7 +98,11 @@ qw( C:\\TEMP /tmp . ) ) { next unless defined $path and -d $path and -w $path; - $temp_path = File::Spec->catdir($path, "par-$username"); + # create a temp directory that is unique per user + # NOTE: $username may be in an unspecified charset/encoding; + # use a name that hopefully works for all of them; + # also avoid problems with platform-specific meta characters in the name + $temp_path = File::Spec->catdir($path, "par-".unpack("H*", $username)); ($temp_path) = $temp_path =~ /^(.*)$/s; unless (mkdir($temp_path, 0700) || $!{EEXIST}) { warn "creation of private subdirectory $temp_path failed (errno=$!)"; @@ -140,7 +144,6 @@ else { $username = $ENV{USERNAME} || $ENV{USER} || 'SYSTEM'; } - $username =~ s/\W/_/g; return $username; }

Subject:

mktmpdir.patch

Index: PAR-Packer/myldr/mktmpdir.c =================================================================== --- PAR-Packer/myldr/mktmpdir.c (revision 1336) +++ PAR-Packer/myldr/mktmpdir.c (working copy) @@ -76,7 +76,6 @@ char *par_mktmpdir ( char **argv ) { int i; - char *c; const char *tmpdir = NULL; const char *key = NULL , *val = NULL; @@ -112,6 +111,7 @@ DWORD buflen = MAXPATHLEN; username = malloc(MAXPATHLEN); GetUserName((LPTSTR)username, &buflen); + // FIXME this is uncondifionally overwritten below - WTF? } #endif @@ -123,18 +123,17 @@ username = strdup(val); } } - - if ( username == NULL ) { + if ( username == NULL ) username = "SYSTEM"; + + /* sanitize username: encode all bytes as 2 hex digits */ + { + char *hexname = malloc(2 * strlen(username) + 1); + char *u, *h; + for ( u = username, h = hexname ; *u != '\0' ; u++, h += 2) + sprintf(h, "%02x", *(unsigned char*)u); + username = hexname; } - else { - /* replace all non-alphanumeric letters with '_' */ - for ( c = username ; *c != '\0' ; c++ ) { - if ( !isalnum(*c) ) { - *c = '_'; - } - } - } /* Try temp environment variables */ for ( i = 0 ; tmpdir == NULL && (key = temp_keys[i]); i++ ) {

Wed Dec 28 09:57:49 2011 gypark [...] gmail.com - Correspondence added

From:

gypark [...] gmail.com

On Wed Dec 28 06:49:51 2011, RSCHUPP wrote: Show quoted text

>

> > Would you please check it? I think that it would be good idea to

use "%- Show quoted text

> > encoding" for all non-latin characters, so that the username in the > > screenshot would be changed into "par-%ba%ce%be%fb%c0%cc". ('%' may

be Show quoted text

> > removed)

> > Yeah, but what is a "non-latin" character if we don't consider > the charset? I think we shouldn't make any assumption about it > and simply encode _all_ bytes unconditionally in username > as two hex bytes. The only thing we loose is easy recognizability > of which cache directory belongs to which user.

Hmm. When I was saying "non-latin", I was just considering "qr/[^A-Za- z]/" :-) I know about just the basic concept of 'encoding' or 'charset', not detail. So I'd like to agree with whatever you think about how to manage username. Show quoted text

> Could you please try the two attached patches (the first is for PAR, > the second for PAR::Packer)?

Your patch took effect in my environment that was same as my previous report. (same machine, same version of Perl, same username) I attached a screenshot. As it shows, app.exe runs well and I could see a cache folder "par-bacebefbc0cc" was created. :-D Thanks for your rapid reply and patch. Sincerely, Geunyoung Park

Subject:

packer2.png

Download packer2.png
image/x-png 24.5k

Wed Dec 28 10:31:03 2011 RSCHUPP [...] cpan.org - Correspondence added

On 2011-12-28 09:57:49, gypark@gmail.com wrote: Show quoted text

> I attached a screenshot. As it shows, app.exe runs well and I could see > a cache folder "par-bacebefbc0cc" was created. :-D

Thanks for testing. I committed the two patches, will be in the next release of PAR and PAR::Packer. Cheers, Roderich

Wed Dec 28 10:31:04 2011 RSCHUPP [...] cpan.org - Status changed from 'open' to 'resolved'