Subject: toString() fails on ISO-8859-2 encoded HTML parsed via parse_html_file()
Distribution: XML::LibXML-1.31
Perl: This is perl, version 5.005_03 built for sun4-solaris
Machine hardware: sun4u
OS version: 5.7
Processor type: sparc
Hardware: SUNW,Ultra-80
A. Calling toString() on a document object created via parse_html_file()
produces the following error:
xmlDocDumpFormatMemoryEnc: Failed to identify encoding handler for
character set 'iso-8859-2'
The HTML file is encoded as ISO-8859-2 (the top of the file is shown below).
By contrast, when the same HTML file is parsed with parse_file() instead,
calling toString() produces no error.
(source that triggers the error follows...)
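use XML::LibXML;
use XML::LibXSLT;   # loaded by the original script, not used in this snippet
my $xml_parser = XML::LibXML->new();
# Parse the file with the HTML parser
my $doc = $xml_parser->parse_html_file("Polska_Prezenty.html");
# This call fails with the xmlDocDumpFormatMemoryEnc error
my $stringOutput = $doc->toString;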
(source that works follows...)
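use XML::LibXML;
use XML::LibXSLT;   # loaded by the original script, not used in this snippet
my $xml_parser = XML::LibXML->new();
# Parse the same file with the XML parser instead
my $doc = $xml_parser->parse_file("Polska_Prezenty.html");
# This call completes without error
my $stringOutput = $doc->toString;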
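A possible workaround, offered only as a sketch: if the failure comes from the missing handler for 'iso-8859-2', switching the document to UTF-8 before serializing might avoid the lookup. This has not been verified on the platform above. The helper name serialize_with_fallback is hypothetical, and the sketch assumes the installed XML::LibXML provides setEncoding() on document objects.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::LibXML): try toString() with the
# document's own encoding first; if that dies or returns undef, switch
# the document to UTF-8 and serialize again.
sub serialize_with_fallback {
    my ($doc) = @_;
    my $out = eval { $doc->toString };
    return $out if defined $out;

    # Assumption: the failure is caused by the encoding handler lookup,
    # so requesting UTF-8 output sidesteps it.
    $doc->setEncoding('UTF-8');
    return $doc->toString;
}

# Only runs when the sample file and XML::LibXML are both available.
my $file = "Polska_Prezenty.html";
if (-e $file && eval { require XML::LibXML; 1 }) {
    my $doc = XML::LibXML->new()->parse_html_file($file);
    my $stringOutput = serialize_with_fallback($doc);
    print $stringOutput if defined $stringOutput;
}
```

The helper leaves the parse step untouched, so the same call can be used on documents from parse_file() as well. Note that the fallback output is UTF-8, not ISO-8859-2.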
[Polska_Prezenty.html]
[...first 30 lines]
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Polska</TITLE>
<META content="text/html; charset=iso-8859-2" http-equiv=Content-Type>
<META content="" name=coder><LINK href="../image/pg.css"
rel=stylesheet type=text/css>
<SCRIPT language=JavaScript1.2 src="../image/fw_menu.js"></SCRIPT>
<SCRIPT language=JavaScript1.2 src="../image/pg.js"></SCRIPT>
<SCRIPT language=JavaScript>
MM_reloadPage(true);
</SCRIPT>
<META content="MSHTML 5.00.3019.2500" name=GENERATOR></HEAD>
<BODY bgColor=#ffffff leftMargin=0 text=#000000 topMargin=0 marginheight="0"
marginwidth="0">
<TABLE border=0 cellPadding=0 cellSpacing=0 width=760>
<TBODY>
<TR>
<TD background=../image/leftsdummy.gif height="100%" vAlign=top
width=34><!-- ramka -->
<TABLE border=0 cellPadding=0 cellSpacing=0 width="100%">
<TBODY>
.
.
.
.
.