On Thu, 16 Aug 2018 05:11:14 -0400
"Roderich Schupp via RT" <bug-PAR-Packer@rt.cpan.org> wrote:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=126280 >
>
> On 2018-08-15 19:56:57, XENU wrote:
> > "\357\277\275" is a REPLACEMENT CHARACTER. It seems that when the
> > UTF-8 checkbox is enabled, bytes that aren't valid UTF-8 are being
> > replaced with that character. "\x{85}" obviously isn't a valid UTF-8
> > character.
>
> Nope, "\x{85}" is a valid Unicode code point (there's no such thing as a
> "UTF-8 character"), cf. http://www.unicode.org/charts/PDF/U0080.pdf
Of course U+0085 exists, but it's irrelevant because in this case we're
talking about raw bytes. And by "UTF-8 character" I meant "UTF-8 encoded
codepoint". "\xc2\x85" (or Encode::encode("UTF-8", "\x85")) would work
fine, I have tested that.
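
To make the byte-versus-code-point distinction concrete, here is a minimal sketch in Python (used purely for illustration). It assumes that whatever Windows does with invalid input behaves like Python's "replace" error handler; that equivalence is an assumption, not something I have verified in the Windows internals.

```python
# Assumption: Python's errors="replace" stands in for the substitution that
# Windows appears to perform when the UTF-8 option is enabled.

raw = b"\x85"  # a lone continuation byte, which is not valid UTF-8 on its own
decoded = raw.decode("utf-8", errors="replace")
reencoded = decoded.encode("utf-8")

print(repr(decoded))   # the substituted character, U+FFFD
print(reencoded)       # its UTF-8 bytes: EF BF BD, i.e. "\357\277\275" in octal

# The same code point, properly encoded as UTF-8, survives a round trip.
encoded = "\u0085".encode("utf-8")
print(encoded)         # C2 85, i.e. "\xc2\x85"
roundtrip = encoded.decode("utf-8")
```

The first half mirrors the failing case (raw byte 0x85), the second half the case that worked for me ("\xc2\x85").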
> For background information, we're in a murky Windows area here:
> when you call the C-level function (somewhere in the guts of PAR::Packer)
>
> spawnvp(P_WAIT, "some.exe", argv)
>
> you have to actually manipulate the strings in argv[] so that some.exe
> actually sees the original argv in its
>
> main(argc, argv)
>
> The most obvious gotcha is when some argv[i] contains blanks, e.g.
> "foo bar quux", which will arrive at some.exe as *three* separate elements of argv[],
> "foo", "bar", "quux". See Win32::ShellQuote for details, that's where I stole
> most of the test cases from.
>
> Anyway, a 100% solution is probably not possible and "\x{85}", while legal Unicode,
> isn't a very relevant test case - it's a control char ("NEXT LINE"). So there may
> be a reason why Microsoft treats it differently under "Use Unicode UTF-8 for worldwide language support".
> Let's replace this test case with some more relevant cases of strings
> with non-ASCII chars:
>
> [ qq[german umlaute \x{E4}\x{F6}\x{FC}] ],
> [ qq[chinese zhongwen \x{4E2D}\x{6587}] ],
>
> Can you rerun the failing test with these modifications under "Use Unicode..."?
Both of them fail:
ok 110 - successfully ran "C:\Users\xenu\AppData\Local\Temp\qn5gz65wHX\packed.exe german umlaute "
not ok 111
# Failed test at t\90-rt122949.t line 79.
# got: '$VAR1 = [
# "german umlaute \357\277\275\357\277\275\357\277\275"
# ];
# '
# expected: '$VAR1 = [
# "german umlaute \344\366\374"
# ];
# '
Wide character in print at C:/Strawberry/perl/lib/Test2/Formatter/TAP.pm line 144.
ok 112 - successfully ran "C:\Users\xenu\AppData\Local\Temp\qn5gz65wHX\packed.exe chinese zhongwen ??"
not ok 113
# Failed test at t\90-rt122949.t line 79.
# got: '$VAR1 = [
# "chinese zhongwen \344\270\255\346\226\207"
# ];
# '
# expected: '$VAR1 = [
# "chinese zhongwen \x{4e2d}\x{6587}"
# ];
# '
# Looks like you failed 2 tests of 113.
However, if I replace them with the UTF-8 encoded byte strings
qq[german umlaute \xc3\xa4\xc3\xb6\xc3\xbc] and
qq[chinese zhongwen \xe4\xb8\xad\xe6\x96\x87], the test passes.
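
The "got" values above are consistent with the arguments being treated as UTF-8 bytes. A small Python sketch (illustration only) reproduces both octal dumps. It assumes that the umlaut string is handed over as the Latin-1 bytes E4 F6 FC and that invalid bytes are substituted the way Python's "replace" handler does; `octal_escapes` is a hypothetical helper, not part of the test suite.

```python
def octal_escapes(data: bytes) -> str:
    """Render printable ASCII as-is and every other byte as a \\ooo escape."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"\\{b:03o}" for b in data)

# Assumption: the umlauts reach the child process as Latin-1 bytes E4 F6 FC.
german_latin1 = "\u00e4\u00f6\u00fc".encode("latin-1")

# None of those bytes forms a valid UTF-8 sequence, so each one is substituted.
german_seen = german_latin1.decode("utf-8", errors="replace").encode("utf-8")

# U+4E2D U+6587 encoded as UTF-8, which is what the child process reported.
chinese_utf8 = "\u4e2d\u6587".encode("utf-8")

print(octal_escapes(b"german umlaute " + german_seen))
print(octal_escapes(b"chinese zhongwen " + chinese_utf8))
```

If those assumptions hold, the first line matches the "got" of test 111 and the second the "got" of test 113. That would mean the Chinese argument arrives intact as UTF-8 and the test's expected value is simply not written as bytes.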
>
> Cheers, Roderich