This queue is for tickets about the libintl-perl CPAN distribution.

Report information
The Basics
Id: 49744
Status: resolved
Priority: 0/
Queue: libintl-perl

People
Owner: GUIDO [...] cpan.org
Requestors: STEFFENW [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.20
Fixed in: (no value)



Subject: gettext_pp.pm can not parse Russian plural forms
The Russian plural forms:
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
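For readers unfamiliar with the rule, here is a minimal sketch of what this plural expression computes when written out by hand in Perl. The helper name russian_plural_index is hypothetical and is not part of libintl-perl; it only transcribes the expression from the header above, with n written as $n.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of libintl-perl): the plural expression
# from the header above, transcribed by hand with n written as $n.
sub russian_plural_index {
    my ($n) = @_;
    return $n % 10 == 1 && $n % 100 != 11 ? 0
         : $n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20) ? 1
         : 2;
}

# Print the selected plural form for a few sample counts.
for my $n (1, 2, 5, 11, 21, 22, 112) {
    printf "%3d -> form %d\n", $n, russian_plural_index($n);
}
```

Counts ending in 1 (except those ending in 11) select form 0, counts ending in 2 to 4 (except 12 to 14) select form 1, and everything else selects form 2.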
To fix this problem, change gettext_pp.pm at lines 695 .. 696 from:
----------------------------------------------------------------------
$code =~ s/([^_a-zA-Z0-9]|\A)([_a-z][_A-Za-z0-9]*)([^_a-zA-Z0-9])/$1\$$2$3/g;
----------------------------------------------------------------------
to:
----------------------------------------------------------------------
$code =~ s/\b n \b/\$n/xg;
----------------------------------------------------------------------
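The difference between the two substitutions can be seen on a shortened header. This is a sketch under assumptions: the wrapper subs sigils_original and sigils_proposed are hypothetical names, the input is a simplified plural expression rather than the full Russian one, and it is assumed that the substitution runs over the whole "nplurals=...; plural=..." string.

```perl
use strict;
use warnings;

# The substitution quoted from gettext_pp.pm in this ticket.
sub sigils_original {
    my ($code) = @_;
    $code =~ s/([^_a-zA-Z0-9]|\A)([_a-z][_A-Za-z0-9]*)([^_a-zA-Z0-9])/$1\$$2$3/g;
    return $code;
}

# The replacement proposed in this ticket: only a standalone n gets a sigil.
sub sigils_proposed {
    my ($code) = @_;
    $code =~ s/\b n \b/\$n/xg;
    return $code;
}

# Shortened, hypothetical header value without outer parentheses.
my $header   = 'nplurals=3; plural=n%10==1 ? 0 : 1;';
my $original = sigils_original($header);
my $proposed = sigils_proposed($header);

print "$original\n";   # $nplurals=3; $plural=n%10==1 ? 0 : 1;
print "$proposed\n";   # nplurals=3; plural=$n%10==1 ? 0 : 1;
```

In the original pattern the "=" after "plural" is consumed as the trailing delimiter of the previous match, so it is no longer available as the leading delimiter for the "n" that follows, and that "n" stays bare. The proposed pattern uses zero-width boundaries and therefore does not have this problem.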
Oh, I realize that:
----------------------------------------------------------------------
$code =~ s/ ( \b (?: nplurals | plural | n ) \b ) /\$$1/xg;
----------------------------------------------------------------------
was not a good idea because of tainted code. So I experimented and now run this:
----------------------------------------------------------------------
my $code = $domain->{po_header}->{plural_forms};
PARSE: {
    $code =~ s{
        ( [^a-z] )
        | ( \b or \b )
        | ( \b (?: nplurals | plural | n ) \b )
        | (.)
    }{defined $1 ? $1 : defined $2 ? $2 : defined $3 ? "\$$3" : last PARSE}xmsge;
    # untaint
    $code =~ m{\A (.*) \z}xms;
    $code = $1;
}
----------------------------------------------------------------------
I have no idea why my full substitution does not reset the taint flag while your code did. So I invented the $4 workaround.
I want to explain my regex:

[^a-z]
Does not match barewords like 'open' or anything similar. It matches all characters like '=', '%', '!=', '?', ':' and so on. Maybe a positive regex would be better.

\b or \b
'or' should be allowed, see: http://translate.sourceforge.net/wiki/l10n/pluralforms

\b (?: nplurals | plural | n ) \b
The \b (word boundary) makes the pattern match e.g. the word 'n' exactly, so it does not match the 'n' in 'not'.
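The "positive regex" mentioned above could look roughly like the following. This is only a sketch of the idea, not code from the ticket or from libintl-perl: the sub name is_plain_plural_expression is hypothetical, it checks a bare plural expression (without the "nplurals=...; plural=" prefix), and it deliberately does not accept 'or' or a trailing semicolon.

```perl
use strict;
use warnings;

# Hypothetical whitelist check: accept an expression only if it consists
# entirely of whitespace, ASCII integers, the variable n, and operators.
sub is_plain_plural_expression {
    my ($expr) = @_;
    return $expr =~ m{
        \A
        (?:
            \s+                  # whitespace
          | [0-9]+               # integer literals
          | \b n \b              # the variable n as a whole word
          | [-+*/%()?:<>=!&|]    # operator and parenthesis characters
        )+
        \z
    }xms ? 1 : 0;
}

my $russian = 'n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2';
print is_plain_plural_expression($russian), "\n";   # accepted
print is_plain_plural_expression('open'),   "\n";   # rejected: bareword
print is_plain_plural_expression('not n'),  "\n";   # rejected: 'not' is not the word 'n'
```

A whitelist like this rejects anything it does not recognize, which is the opposite of the [^a-z] alternative above, where everything that is not a lowercase letter passes through.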
The easiest fix is to use the syntax normally used for Russian:

nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)

gettext_pp.pm has also been fixed (see commit 227729165fe63201c4144898784cbecec0dd6e2a). I didn't use your proposed patch because \b is locale-dependent.

BTW, sorry for the "delay". I couldn't reproduce the bug because I always tested with the alternative form with the additional parentheses (see above). And then it slipped through between other activities.
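Why the parenthesized form works with the old substitution can be sketched as follows, together with one possible way to avoid \b. Both are assumptions for illustration: the sub names are hypothetical, the input is a shortened expression, and the lookaround variant is not necessarily what the commit mentioned above does.

```perl
use strict;
use warnings;

# The substitution quoted earlier in this ticket as the old gettext_pp.pm code.
sub sigils_original {
    my ($code) = @_;
    $code =~ s/([^_a-zA-Z0-9]|\A)([_a-z][_A-Za-z0-9]*)([^_a-zA-Z0-9])/$1\$$2$3/g;
    return $code;
}

# Hypothetical alternative to \b: explicit ASCII lookarounds, so the result
# does not depend on what the locale considers a word character.
sub sigils_ascii_boundary {
    my ($code) = @_;
    $code =~ s/(?<![_A-Za-z0-9])n(?![_A-Za-z0-9])/\$n/g;
    return $code;
}

# Shortened, hypothetical header values.
my $parenthesized   = 'nplurals=3; plural=(n%10==1 ? 0 : 1);';
my $unparenthesized = 'nplurals=3; plural=n%10==1 ? 0 : 1;';

my $with_parens = sigils_original($parenthesized);
my $lookaround  = sigils_ascii_boundary($unparenthesized);

print "$with_parens\n";   # $nplurals=3; $plural=($n%10==1 ? 0 : 1);
print "$lookaround\n";    # nplurals=3; plural=$n%10==1 ? 0 : 1;
```

With the extra parentheses, the "(" sits between "=" and "n", so the old pattern still finds a non-identifier character in front of "n" after the preceding match has consumed the "=".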