
This queue is for tickets about the Mail-Box CPAN distribution.

Report information
The Basics
Id: 52754
Status: resolved
Priority: 0/
Queue: Mail-Box

People
Owner: Nobody in particular
Requestors: icestar [...] inbox.ru
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 2.092
Fixed in: (no value)



Subject: Cannot decode string with wide characters at Mail::Message::Body::Encode
I get this error:

    Cannot decode string with wide characters at /usr/lib64/perl5/site_perl/5.8.8/Mail/Message/Body/Encode.pm line 94.

Below is a simple example that reproduces the error:

    my $text = 'тест';
    $text = Encode::decode_utf8($text);
    # Now this is a real utf-8 string "\x{442}\x{435}\x{441}\x{442}"
    my $body = Mail::Message::Body->new(
        data      => $text,
        mime_type => 'text/plain',
        charset   => 'utf-8',
    );
    # trying to add signature
    $body = $body->concatenate($body, "-- \n", 'signature');

And here I get the error:

    Cannot decode string with wide characters at /usr/lib64/perl5/site_perl/5.8.8/Mail/Message/Body/Encode.pm line 94.
CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #52754] Cannot decode string with wide characters at Mail::Message::Body::Encode
Date: Mon, 14 Dec 2009 22:45:37 +0100
To: Dmitry Bigunyak via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Dmitry Bigunyak via RT (bug-Mail-Box@rt.cpan.org) [091214 12:13]:
> Mon Dec 14 07:13:22 2009: Request 52754 was acted upon.
> Transaction: Ticket created by Alien
> Queue: Mail-Box
> Subject: Cannot decode string with wide characters at Mail::Message::Body::Encode
> Below there is a simple example to reproduce the error:
> Cannot decode string with wide characters at /usr/lib64/perl5/
> site_perl/5.8.8/Mail/Message/Body/Encode.pm line 94.
The problem is again the same issue. The body contains message data, but not always in an encoding which Mail::Message accepts. Now you create a body from Perl data... so it is in "raw" form.

charset => 'utf-8' means that the data content of the body is utf-8 represented as bytes. The problem is that in Perl a string can either contain latin1 characters or binary bytes. A message body must contain bytes when the message is sent. A stand-alone body has more freedom: either "bytes in a charset" or a real (latin1 or utf8) PERL string.

So, when you change
    charset => 'utf-8'
into
    charset => 'PERL'
it works, because your data is in Perl's internal representation.

But you can also change
    $text = Encode::decode_utf8($text);
into
    $text = Encode::encode('utf-8', $text);
Now, $text is in bytes.

Encodings are horror.
-- 
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                                 MARKOV Solutions
Mark@Overmeer.net                           solutions@overmeer.net
http://Mark.Overmeer.net             http://solutions.overmeer.net
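The distinction between "bytes in a charset" and a decoded Perl string can be seen with the core Encode module alone. The following is a minimal sketch, not Mail::Box code: the variable names are made up for illustration, and the string is written with byte escapes so the result does not depend on the source file's encoding. It shows that decoding turns 8 UTF-8 bytes into 4 characters, that encoding turns them back into bytes, and that decoding a string which is already decoded fails, which is the situation a body with charset => 'utf-8' ends up in when it is handed character data.

```perl
use strict;
use warnings;
use Encode ();

# "тест" as raw UTF-8 bytes (illustrative data, same word as in the report).
my $bytes = "\xD1\x82\xD0\xB5\xD1\x81\xD1\x82";

# decode: bytes -> character string (code points above 255, "wide characters").
my $chars = Encode::decode('UTF-8', $bytes);

# encode: character string -> bytes again.
my $again = Encode::encode('UTF-8', $chars);

# Decoding something that is already a character string is an error.
my $ok  = eval { Encode::decode('UTF-8', $chars); 1 };
my $err = $ok ? '' : $@;

printf "bytes: %d, characters: %d, round trip %s\n",
    length($bytes), length($chars), ($again eq $bytes ? 'ok' : 'differs');
print "second decode failed: $err" unless $ok;
```

Under this reading, data passed together with charset => 'utf-8' should look like $bytes, while data passed with charset => 'PERL' should look like $chars.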
Subject: Re: [rt.cpan.org #52754] Cannot decode string with wide characters at Mail::Message::Body::Encode
Date: Tue, 15 Dec 2009 16:05:57 +0300
To: bug-Mail-Box [...] rt.cpan.org
From: Dmitry Bigunyak <icestar [...] inbox.ru>
> The problem is again the same issue. The body contains message data,
> but not always in an encoding which the Mail::Message accepts. Now you
> create a body from perl data... so it is in "raw" form.
>
> charset => 'utf-8' means that the data content of the body is utf-8
> represented as bytes. The problem is that in Perl a string can either
> contain latin1 characters or binary bytes. A message body must contain
> bytes when the message is sent. A stand-alone body has more freedom:
> either "bytes in a charset" or real (latin1 or utf8) PERL string
>
> So, when you change
>     charset => 'utf-8'
> into charset => 'PERL' it works,
> because your data is in Perl internal representation.
>
> But you can also change
>     $text = Encode::decode_utf8($text);
> into $text = Encode::encode('utf-8', $text);
> Now, $text is in bytes.
>
> Encodings are horror.
>
I agree with you, this is a real horror! For me this is the most difficult part of your project to understand, and I think it should be better described in the documentation.

But now I have another magic problem. I attach an example script to describe it.

I understand your solutions and they work fine, but in my project code I use the Config::General module to get the charset parameter, and I open the config file with the -UTF8 flag on. This is emulated in the example script with the line
    $charset = decode_utf8($charset);
In this situation something goes wrong and I get bad text in the message that arrives. If I comment out the $charset = decode_utf8($charset); line, everything works again. I can't explain what is happening...

To run the example script, change the from and to addresses and the SMTP server parameters.

-- 
Dima: Nosce te ipsum
e-mail: icestar@inbox.ru
#!/usr/bin/perl

use strict;
use warnings;

use Mail::Message;
use Mail::Message::Field::Full;
use Mail::Transport::SMTP;
use Data::Dumper;
use Encode;

my $subject = 'Это длинный заголовок тестового письма!';
my $charset = 'utf-8';
$charset = decode_utf8($charset);
$subject = decode_utf8($subject);

my $from_name = 'Дима';
$from_name = Encode::decode_utf8($from_name);
my $from_address = Mail::Message::Field::Address->new(
    address  => 'dima@mailbox.ru',
    phrase   => $from_name,
    charset  => $charset,
    encoding => 'B',
);

my $to_name = 'Your address';
$to_name = Encode::decode_utf8($to_name);
my $to_address = Mail::Message::Field::Address->new(
    address  => 'test@mailbox.ru',
    phrase   => $to_name,
    charset  => $charset,
    encoding => 'B',
);

my $head = Mail::Message::Head->build(
    Mail::Message::Field::Full->new('from' => $from_address),
    Mail::Message::Field::Full->new('to' => $to_address),
    Mail::Message::Field::Full->new('subject' => $subject, charset => $charset, encoding => 'B'),
);

my $text = "Тело письма на русском языке.";
$text = decode_utf8($text);
$text = encode_utf8($text);
my $body = Mail::Message::Body->new(
    data      => $text,
    mime_type => 'text/plain',
    charset   => $charset,
);
$body = $body->concatenate($body, "\n-- \n", 'signature');

my $data = "текст приложения\n";
$data = decode_utf8($data);
$data = encode_utf8($data);
my $attach = Mail::Message::Body->new(
    data        => $data,
    disposition => 'inline',
    mime_type   => 'text/plain',
    charset     => $charset,
);
$body = $body->attach($attach);

my $message = Mail::Message->build($body, head => $head);

my $sender = Mail::Transport::SMTP->new(
    hostname => 'smtp.server.ru',
    timeout  => 30,
);
$sender->send($message);

exit 0;
Subject: Re: [rt.cpan.org #52754] Cannot decode string with wide characters at Mail::Message::Body::Encode
Date: Wed, 16 Dec 2009 10:21:32 +0100
To: Dmitry Bigunyak via RT <bug-Mail-Box [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Dmitry Bigunyak via RT (bug-Mail-Box@rt.cpan.org) [091215 13:11]:
> Queue: Mail-Box
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=52754 >
>
> Agree with you, this is a real horror! For me this is the most difficult
> part for understanding in your project and I think it should be better
> described in documentation.
I'll try something for the next release.
> But now I have another magic problem. I attach the example script to
> describe the problem.
No, your code is still broken (in the Perl sense). If your source contains utf-8 characters, then your file should start with 'use utf8;'
> I understand your solutions and they work fine, but in my project code I
> use Config::General module to get parameter charset and open config file
> with -UTF8 flag on.
The module is wrong. It should not translate -UTF8 into
    open( $fh, "<:utf8", $file)
but use
    open( $fh, "<:encoding(utf-8)", $file)

Better to say in your code:
    use encoding 'utf-8';
and not specify the -UTF8 option for Config::General.

All my code is unicode aware, but I hardly ever need a decode_utf8().
See man perlunicode(1)
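As an illustration of the ":encoding(...)" layer mentioned above, here is a small sketch that reads a configuration-style line through that layer. It does not use Config::General; the config line, the key name and the in-memory filehandle are assumptions chosen so the example runs without any file on disk. The point it shows is that values read through a decoding layer arrive as character strings, so they need no further decode_utf8() call.

```perl
use strict;
use warnings;

# Hypothetical config content as raw UTF-8 bytes: "name = тест\n".
my $raw = "name = \xD1\x82\xD0\xB5\xD1\x81\xD1\x82\n";

# Open the bytes as an in-memory file and decode them while reading.
open my $fh, '<:encoding(UTF-8)', \$raw or die "cannot open: $!";
my $line = <$fh>;
close $fh;

chomp $line;
my (undef, $value) = split /\s*=\s*/, $line, 2;

# $value is now a character string: 4 characters, not 8 bytes.
printf "value has %d characters\n", length $value;
```

With a real file, the same open() call would take a file name in place of the scalar reference.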
> This is emulated with $charset = decode_utf8($charset);
> code in the example script. In this situation
> something goes wrong and I get bad text in arrived message. If I comment
> $charset = decode_utf8($charset); string everything is repaired. I can't
> explain what is happening...
Not in my environment. What's your $LANG setting? Mine is en_US.UTF-8

There is a difference between Perl's "utf8", and unicode "utf-8".
-- 
Regards,
MarkOv

------------------------------------------------------------------------
Mark Overmeer MSc                                 MARKOV Solutions
Mark@Overmeer.net                           solutions@overmeer.net
http://Mark.Overmeer.net             http://solutions.overmeer.net
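The difference between Perl's "utf8" and the standard "UTF-8" is visible in how the core Encode module resolves the two names. This short sketch only asks Encode which encoding object each name maps to; it is an illustration, not part of Mail::Box. The first name is Perl's lax internal encoding, the second is the strict one that rejects byte sequences the standard does not allow.

```perl
use strict;
use warnings;
use Encode ();

# Ask Encode which encoding object each spelling resolves to.
for my $name ('utf8', 'UTF-8') {
    my $enc = Encode::find_encoding($name);
    printf "%-6s resolves to %s\n", $name, $enc->name;
}
```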
This is not a problem of MailBox, but Perl's knowledge about its environment.