Subject: Strings that have + in UTF-7 encoding are not encoded properly
The character '+' is in the base64 alphabet, so it can legitimately appear inside a UTF-7 encoded string. For example, the Cyrillic string \x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433} is represented in UTF-7 as +BD8EQAQ1BDQEOwQ+BDM-
Note the second plus sign, 4 characters before the end. IMAPUtf7 encodes the above string as +BD8EQAQ1BDQEOwQ&BDM- which is not valid modified UTF-7: the ampersand and the plus are swapped. The expected result is &BD8EQAQ1BDQEOwQ+BDM-
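To illustrate why the two signs end up swapped, here is a minimal sketch that isolates the final substitution of _imap_utf7_encode before and after the patch. The names shift_old and shift_new are hypothetical and exist only for this illustration. The character class in shift_new is written as A-Za-z, which is assumed to be the intended range.

```perl
use strict;
use warnings;

# Substitution used before the patch: the shifted run may not contain '+',
# so the match can only begin at the last '+' before the closing '-'.
sub shift_old {
    my ($s) = @_;
    $s =~ s/\+([^+\-]+)?\-/&$1\-/g;
    return $s;
}

# Substitution in the spirit of the patch: the run is one or more characters
# of the base64 alphabet (which includes '+') or ',', terminated by '-'.
sub shift_new {
    my ($s) = @_;
    $s =~ s/\+([A-Za-z0-9+,]+)-/&$1-/g;
    return $s;
}

my $utf7 = '+BD8EQAQ1BDQEOwQ+BDM-';
print "old: ", shift_old($utf7), "\n";   # old: +BD8EQAQ1BDQEOwQ&BDM-
print "new: ", shift_new($utf7), "\n";   # new: &BD8EQAQ1BDQEOwQ+BDM-
```

In the old pattern the leading '+' cannot start a match because the run up to '-' contains another '+', so only the inner '+' is rewritten to '&'.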
You can reproduce the bug with the following one-liner (note that it requires a UTF-8 aware terminal):
perl -le 'use Encode; use Unicode::IMAPUtf7; $c=Unicode::IMAPUtf7->new(); $_="\x{043f}\x{0440}\x{0435}\x{0434}\x{043b}\x{043e}\x{0433}";{printf "%s is encoded as %s and decoded back to %s\n", $_, $c->encode($_), Encode::decode("utf8",$c->decode($c->encode($_)))}'
The attached patch fixes this bug and is also a better solution to bug#6909 (single + and & signs are encoded as the same sequence).
--- /usr/lib/perl5/site_perl/5.8.3/Unicode/IMAPUtf7.pm 2004-08-29 23:23:47.000000000 +0300
+++ IMAPUtf7.pm 2005-01-20 21:28:01.543990064 +0200
@@ -113,11 +113,9 @@
# On remplace , par / dans les BASE 64 (, entre & et -)
# On remplace les &, non suivi d'un - par +
# On remplace les &- par &
- $s =~ s/\+/PLUSPLACEHOLDER/g;
$s =~ s/&([^,&\-]*),([^,\-&]*)\-/&$1\/$2\-/g;
$s =~ s/&(?!\-)/\+/g;
$s =~ s/&\-/&/g;
- $s =~ s/PLUSPLACEHOLDER/+-/g;
return $s;
}
@@ -135,11 +133,9 @@
sub _imap_utf7_encode {
my ($s) = @_;
- $s =~ s/\+\-/PLUSPLACEHOLDER/g;
$s =~ s/\+([^\/&\-]*)\/([^\/\-&]*)\-/\+$1,$2\-/g;
$s =~ s/&/&\-/g;
- $s =~ s/\+([^+\-]+)?\-/&$1\-/g;
- $s =~ s/PLUSPLACEHOLDER/+/g;
+ $s =~ s/\+([A-Za-z0-9\+,]+)\-/&$1\-/g;
return $s;
}
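As a rough check of the patched logic, the sketch below copies the substitutions that remain after the patch into two standalone functions and round-trips a few strings. The names to_imap and from_imap are hypothetical, since the diff does not show the name of the decoding routine. The character class is written as A-Za-z on the assumption that this is the intended range. The comments on the decoding steps paraphrase the French comments in the module.

```perl
use strict;
use warnings;

# UTF-7 -> modified UTF-7, following the patched _imap_utf7_encode.
sub to_imap {
    my ($s) = @_;
    # Replace '/' by ',' inside a base64 run (between '+' and '-').
    $s =~ s/\+([^\/&\-]*)\/([^\/\-&]*)\-/\+$1,$2\-/g;
    # A literal '&' becomes '&-'.
    $s =~ s/&/&\-/g;
    # The shift character '+' becomes '&'; the run may itself contain '+'.
    $s =~ s/\+([A-Za-z0-9\+,]+)\-/&$1\-/g;
    return $s;
}

# Modified UTF-7 -> UTF-7, following the patched decoding hunk.
sub from_imap {
    my ($s) = @_;
    # Replace ',' by '/' inside a base64 run (between '&' and '-').
    $s =~ s/&([^,&\-]*),([^,\-&]*)\-/&$1\/$2\-/g;
    # Replace every '&' that is not followed by '-' with '+'.
    $s =~ s/&(?!\-)/\+/g;
    # Replace '&-' with a literal '&'.
    $s =~ s/&\-/&/g;
    return $s;
}

for my $utf7 ('+BD8EQAQ1BDQEOwQ+BDM-', 'a&b', '+A/8-') {
    my $imap = to_imap($utf7);
    my $back = from_imap($imap);
    printf "%s -> %s -> %s (%s)\n", $utf7, $imap, $back,
        $back eq $utf7 ? 'ok' : 'MISMATCH';
}
```

The three inputs cover a '+' inside a base64 run, a literal '&', and a '/' inside a base64 run. This is only a sanity check of the regular expressions, not a substitute for testing the module itself.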