Subject: Weird utf8 bug breaks header parsing?
Dunno where/if this problem should be fixed, but it seems that
IO::Scalar does some weird stuff with utf-8 flagged strings, and it
manifests itself in MIME::Parser.
----------------------
#!/usr/bin/perl
use strict;
use MIME::Parser;
my $parser = MIME::Parser->new;
$parser->output_to_core('YES');
$parser->tmp_to_core('YES');
my $tst = <<'EOF';
Content-Transfer-Encoding: binary
Content-Type: multipart/alternative; boundary="badger"
MIME-Version: 1.0
From: from@example.com
To: to@example.com

This is a multi-part message in MIME format.
--badger
Content-Disposition: inline
Content-Length: 100
Content-Transfer-Encoding: binary
Content-Type: text/html

<html>
£
</html>
--badger--
EOF
{
my $entity = eval { $parser->parse_data( $tst ) };
my $error = ($@ || $parser->last_error);
print STDERR "error: ($error)\n" if $error;
$entity->dump_skeleton;
}
utf8::upgrade($tst);
{
my $entity = eval { $parser->parse_data( $tst ) };
my $error = ($@ || $parser->last_error);
print STDERR "error: ($error)\n" if $error;
$entity->dump_skeleton;
}
----------------------
output:
Content-type: multipart/alternative
Effective-type: multipart/alternative
Body-file: NONE
Num-parts: 1
--
Content-type: text/html
Effective-type: text/html
Body-file: NONE
--
error: (error: couldn't parse head; error near:
CCMFT
)
Content-type: text/plain
Effective-type: text/plain
Body-file: NONE
--
Basically, if the string scalar is flagged as utf8, each header line
seems to be read in as a single character, hence the "CCMFT" in the
error: C, C, M, F and T are the first characters of the five headers.
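If the utf8 flag really is the trigger, one possible workaround is to hand the parser a copy of the message with the flag turned off. The sketch below is only that: as_bytes is a made-up helper name (it is not part of MIME::Parser or IO::Scalar), and I have not confirmed that it avoids the error on this install. It only shows how to get a byte string back from a flagged one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the message whose utf8 flag
# is off, so that it can be passed around as plain bytes.
sub as_bytes {
    my ($msg) = @_;
    my $copy = $msg;
    # With a true second argument, utf8::downgrade returns false
    # instead of dying when the string holds characters above 0xFF.
    # In that case fall back to encoding the characters as UTF-8.
    utf8::encode($copy) unless utf8::downgrade($copy, 1);
    return $copy;
}

my $msg = "From: from\@example.com\n\n\xA3\n";
utf8::upgrade($msg);            # same trigger as in the test script
my $bytes = as_bytes($msg);

print utf8::is_utf8($msg)   ? "flagged\n" : "not flagged\n";
print utf8::is_utf8($bytes) ? "flagged\n" : "not flagged\n";
```

In the test script this would mean calling $parser->parse_data( as_bytes($tst) ) in the second block.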
Just thought it was worth reporting in case anyone else bumps into this.
As you can imagine, it was fun to track down :-)
The install is an up-to-date RHEL3. Previous experience points a big
finger at Red Hat, so it would be nice to see if anyone else gets the
same error when running the script.
perl -v:
This is perl, v5.8.0 built for i386-linux-thread-multi
(with 1 registered patch, see perl -V for more detail)
(-V gives
Compile-time options: DEBUGGING MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
Locally applied patches:
MAINT18379
)
uname -a
Linux 2.4.21-37.0.1.ELsmp #1 SMP Wed Jan 11 18:44:17 EST 2006 i686 i686
i386 GNU/Linux