Subject: Non-ASCII characters mangled
HTML::HTMLDoc does not know how to handle non-ASCII characters correctly.
If you run the following script, you will see that the text in the PDF
file gets mangled, and this warning is emitted: "Wide character in print
at HTML/HTMLDoc.pm line 1057."
use strict;
use warnings;
use HTML::HTMLDoc;
my $html_character_string = "<html>
<head>
<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />
</head>
<body>
I \x{2661} caf\x{e9}s.
</body>
</html>\n";
my $htmldoc = HTML::HTMLDoc->new();
$htmldoc->set_html_content($html_character_string);
my $pdf = $htmldoc->generate_pdf();
$pdf->to_file('foo.pdf');
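
A possible explanation for the warning, offered as an assumption rather than something confirmed from the module source: the content is a Perl character string, and it is printed to a filehandle that has no encoding layer. The sketch below uses only core Perl and the core Encode module to show the difference between the character string and its UTF-8 octets. The in-memory filehandle is a stand-in for whatever handle HTML::HTMLDoc actually prints to, which is not known here.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string: one code point above 0xFF (U+2661) and one below (U+00E9).
my $chars = "I \x{2661} caf\x{e9}s.";

# Encode to UTF-8 octets. The result contains no code point above 0xFF,
# so printing it to a byte-oriented handle cannot trigger "Wide character".
my $bytes = encode('UTF-8', $chars);

# Stand-in (assumption) for a filehandle opened without an encoding layer.
my $buffer = '';
open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";

# Collect any warnings raised while printing the encoded octets.
my @warnings;
{
    local $SIG{__WARN__} = sub { push @warnings, @_ };
    print {$fh} $bytes;
}
close($fh);

print length($chars), "\n";    # 10 characters
print length($bytes), "\n";    # 13 octets
print scalar(@warnings), "\n"; # 0 warnings
```

If the assumption holds, passing encode('UTF-8', $html_character_string) to set_html_content() might silence the warning. Whether the characters then render correctly in the PDF depends on how the underlying htmldoc program treats UTF-8 input, which this sketch does not test.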