Subject: dump(*FH) method corrupts (just Hebrew?) utf8 wide characters
Date: Mon, 17 May 2010 19:15:23 +0300
To: bug-HTML-Tree [...] rt.cpan.org
From: Meir Guttman <meir [...] guttman.co.il>
Hi folks!
I am trying to use HTML::Element->dump to dump the tree of captured HTML
content that contains Hebrew Unicode characters, UTF-8 encoded.
When I view the Wireshark-captured stream in a Unicode-capable text editor,
I see all the Hebrew characters correctly.
But when I use the HTML::Element->dump(*FH) method as follows:
use HTML::TreeBuilder;

# $browser and $url_request are set up earlier (not shown)
my $response = $browser->get($url_request);
my $out_file = "spider.html";
open(OUTFILE, ">:encoding(utf8)", $out_file)
    or die "Cannot open $out_file: $!\n";
if ($response->is_success) {
    my $tree = HTML::TreeBuilder->new_from_content($response->content())
        or die "*** Could not process URL";
    $tree->dump(*OUTFILE);
    $tree->delete;
}
close(OUTFILE);
and then view the "spider.html" file in the very same Unicode-capable text
editor, all the Hebrew characters are garbled. Note, however, that the
string "300" is shown correctly.
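A minimal, network-free check of one possible cause, stated as an assumption rather than a diagnosis: if $response->content() returns the raw UTF-8 octets (undecoded), printing them through an :encoding(utf8) layer encodes them a second time, which would garble every non-ASCII character while leaving ASCII such as "300" intact. The sketch below uses Encode's encode/decode to mimic that path; the sample word is a hypothetical stand-in for the downloaded text, and HTTP::Response->decoded_content is mentioned only as the usual way to get characters instead of octets.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical sample text: the Hebrew word "shalom" as Perl characters.
my $chars = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";

# A raw HTTP body carries UTF-8 octets, not characters (2 bytes per letter here).
my $octets = encode('UTF-8', $chars);

# Printing octets through an :encoding(utf8) layer is like encoding them again:
# each byte >= 0x80 is treated as a Latin-1 character and becomes 2 bytes.
my $double = encode('UTF-8', $octets);

# Decoding first (what decoded_content aims to do) gives a clean single encoding.
my $single = encode('UTF-8', decode('UTF-8', $octets));

# prints "chars: 4, octets: 8, double-encoded: 16, single: 8"
printf "chars: %d, octets: %d, double-encoded: %d, single: %d\n",
    length $chars, length $octets, length $double, length $single;
```

If the dumped file shows the same doubling, passing decoded text (rather than raw octets) to new_from_content would be the first thing to try.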
Does the dump method of HTML::Element support Unicode in general, and
Hebrew in particular?
Am I doing something wrong?
Regards,
Meir Guttman
Ashdod, Israel