Bug #49744 for libintl-perl: gettext_pp.pm can not parse Russian plural forms

Wed Sep 16 11:49:51 2009 STEFFENW [...] cpan.org - Ticket created

Subject:

gettext_pp.pm can not parse Russian plural forms

The Russian plural forms: "Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
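For reference, the plural expression from the header can be written directly as Perl to see which of the three forms it selects. This is only an illustrative sketch; the function name russian_plural_index is hypothetical and is not part of libintl-perl.

```perl
use strict;
use warnings;

# Hypothetical helper: the Russian plural expression from the header,
# transcribed to Perl. Returns the plural form index (0, 1 or 2).
sub russian_plural_index {
    my ($n) = @_;
    return $n % 10 == 1 && $n % 100 != 11 ? 0
         : $n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20) ? 1
         : 2;
}

# Print the selected form for a few sample counts.
printf "%3d -> form %d\n", $_, russian_plural_index($_) for 1, 2, 5, 11, 21, 22, 112;
```

Counts such as 1 and 21 select form 0, 2 and 22 select form 1, and 5, 11 and 112 select form 2.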

To fix this problem change gettext_pp.pm at line 695 .. 696 from:
----------------------------------------------------------------------
$code =~ s/([^_a-zA-Z0-9]|\A)([_a-z][_A-Za-z0-9]*)([^_a-zA-Z0-9])/$1\$$2$3/g;
----------------------------------------------------------------------
to:
----------------------------------------------------------------------
$code =~ s/\b n \b/\$n/xg;
----------------------------------------------------------------------
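A small sketch of what the original substitution appears to do with the Russian header. It assumes the substitution runs on the whole "nplurals=...; plural=..." string; the wrapper old_prefix and the shortened sample headers are hypothetical. The third capture group consumes the "=" after "plural", so the "n" that follows has no separator left in front of it and stays a bareword. With the extra parentheses the "(" serves as that separator.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution quoted above.
sub old_prefix {
    my ($code) = @_;
    $code =~ s/([^_a-zA-Z0-9]|\A)([_a-z][_A-Za-z0-9]*)([^_a-zA-Z0-9])/$1\$$2$3/g;
    return $code;
}

# Shortened sample headers (assumption: same shape as the real ones).
my $bare  = 'nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : 2;';
my $paren = 'nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : 2);';

print old_prefix($bare),  "\n";   # first "n" after "plural=" keeps no "$"
print old_prefix($paren), "\n";   # every "n" gets its "$"
```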

Wed Sep 16 18:00:06 2009 STEFFENW [...] cpan.org - Correspondence added

Oh, I realize that:
----------------------------------------------------------------------
$code =~ s/ ( \b (?: nplurals | plural | n ) \b ) /\$$1/xg;
----------------------------------------------------------------------
was not a good idea because of tainted code. So I have experimented and run this now:
----------------------------------------------------------------------
my $code = $domain->{po_header}->{plural_forms};
PARSE: {
    $code =~ s{
        ( [^a-z] )
        |
        ( \b or \b )
        |
        ( \b (?: nplurals | plural | n ) \b )
        |
        (.)
    }{defined $1 ? $1 : defined $2 ? $2 : defined $3 ? "\$$3" : last PARSE}xmsge;
    # untaint
    $code =~ m{\A (.*) \z}xms;
    $code = $1;
}
----------------------------------------------------------------------
I have no idea why my full substitution does not reset the taint flag while your code did. So I invented the $4 workaround.

Wed Sep 16 18:00:10 2009 STEFFENW [...] cpan.org - Status changed from 'new' to 'open'

Thu Sep 17 00:59:19 2009 STEFFENW [...] cpan.org - Correspondence added

I want to explain my regex:

[^a-z]
Does not match barewords like 'open' or anything similar. It matches all characters like '=', '%', '!=', '?', ':' and the like. Maybe a positive regex would be better.

\b or \b
'or' should be allowed, see: http://translate.sourceforge.net/wiki/l10n/pluralforms

\b (?: nplurals | plural | n ) \b
The \b (word boundary) allows a match for e.g. the word 'n' exactly and does not match inside 'not'.
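The whitelist idea can be sketched as follows. This is a variant written for illustration, not the code from the ticket: instead of leaving the block with "last PARSE" it sets a flag, and the function name prefix_plural_vars is hypothetical. Known variables get a "$" prefix, 'or' is passed through, and any other lowercase word causes the whole header to be rejected.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the header with variables prefixed by '$',
# or undef if a lowercase word outside the whitelist is found.
sub prefix_plural_vars {
    my ($code) = @_;
    my $ok = 1;
    $code =~ s{
        ( \b (?: nplurals | plural | n ) \b )   # known variables
        | ( \b or \b )                          # allowed keyword
        | ( [a-z]+ )                            # any other lowercase word
    }{
        defined $1 ? "\$$1" : defined $2 ? $2 : do { $ok = 0; $3 }
    }xmsge;
    return $ok ? $code : undef;
}

for my $header ('nplurals=2; plural=n != 1;', 'plural=not n', 'plural=open(F)') {
    my $result = prefix_plural_vars($header);
    print "$header => ", (defined $result ? $result : '(rejected)'), "\n";
}
```

The word boundary keeps 'n' from matching inside 'not', so 'not' falls through to the last alternative and is rejected like 'open'.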

Thu Sep 17 03:22:55 2009 GUIDO [...] cpan.org - Taken

Wed Jan 11 12:51:17 2012 GUIDO [...] cpan.org - Correspondence added

The easiest fix is to use the syntax normally used for Russian:

nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2)

gettext_pp.pm has also been fixed (see commit 227729165fe63201c4144898784cbecec0dd6e2a). I didn't use your proposed patch because \b is locale-dependent.

BTW, sorry for the "delay". I couldn't reproduce the bug because I always tested with the alternative form with the additional parentheses (see above). And then it slipped through between other activities.
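One way to avoid \b is to spell out the boundary with explicit ASCII character classes in zero-width lookarounds. The sketch below only illustrates that idea and is not necessarily what the referenced commit does; the function name prefix_vars and the shortened sample header are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: prefix the known variables with '$' without \b.
# The lookarounds consume nothing, so a variable directly after '=' is found too.
sub prefix_vars {
    my ($code) = @_;
    $code =~ s/(?<![_A-Za-z0-9])(nplurals|plural|n)(?![_A-Za-z0-9])/\$$1/g;
    return $code;
}

# Shortened sample header without the extra parentheses.
print prefix_vars('nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : 2;'), "\n";
```

Because nothing is consumed around the match, both the parenthesized and the unparenthesized header forms are handled the same way.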

Wed Jan 11 12:51:18 2012 GUIDO [...] cpan.org - Status changed from 'open' to 'resolved'