Subject: incoherent encoding of entities
Distribution: AxKit 1.6.1, with XSP.pm and LibXMLSupport from CVS:
$Id: XSP.pm,v 1.45 2003/07/16 15:02:11 matts Exp $
$Id: LibXMLSupport.pm,v 1.2 2003/03/18 15:19:46 matts Exp $
Perl Version: This is perl, v5.8.0 built for i686-linux
Operating System vendor and version:
Debian GNU/Linux Woody
Linux nsmweb 2.4.18-bf2.4 #1 Son Apr 14 09:53:28 CEST 2002 i686 unknown
Environment:
* XML::LibXML v1.51
* XML::LibXSLT v1.52
* AxKit::XSP::Util v1.6
* libxml2 v2.5.8
* libxslt v1.0.31
I have a document that complies with a DTD I wrote myself. The document contains some entities, namely &aelig; and &agrave;, which are defined in the DTD as:
<!ENTITY aelig "æ" ><!-- small ae diphthong (ligature) -->
<!ENTITY agrave "à" ><!-- small a, grave accent -->
For compatibility with quite old browsers and OSs, I had these directives in the configuration files:
AxTranslateOutput On
AxOutputCharset iso-8859-1
This worked fine with AxKit 1.5. With 1.6.1:
* the browser correctly detects an ISO-8859-1 encoding
* æ shows up as Ã¦
* à shows up correctly
If I force the browser to UTF-8, æ shows up correctly and à is garbled.
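One reading consistent with these symptoms is that the response mixes two encodings: æ is emitted as UTF-8 while à is emitted as ISO-8859-1. The following is only a sketch of that hypothesis, written in Python purely for illustration (it is not part of the AxKit setup, and the byte string is an assumption reconstructed from the symptoms, not captured output):

```python
# Hypothetical reconstruction of the response bytes, inferred from the symptoms.
AELIG = "\u00e6"   # æ
AGRAVE = "\u00e0"  # à

# Assumption: æ leaves the server as UTF-8, à as ISO-8859-1, in the same page.
mixed = AELIG.encode("utf-8") + AGRAVE.encode("iso-8859-1")

# Browser in ISO-8859-1 mode: æ becomes two characters, à is fine.
as_latin1 = mixed.decode("iso-8859-1")

# Browser forced to UTF-8: æ is fine, the lone 0xE0 byte is invalid.
as_utf8 = mixed.decode("utf-8", errors="replace")

print(mixed)             # raw bytes
print(repr(as_latin1))   # what an ISO-8859-1 browser would render
print(repr(as_utf8))     # what a UTF-8 browser would render
```

If this reading is right, no single browser setting can display both characters correctly, which matches what I see.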
If I comment out the two directives above:
* the browser detects a UTF-8 encoding
* again, æ shows up as Ã¦ and à shows up correctly
Forcing the browser to ISO-8859-1 further garbles the output.
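A garbled æ next to a correct à in a UTF-8 page would fit a double encoding of æ: its UTF-8 bytes being taken for ISO-8859-1 characters and then encoded to UTF-8 once more. This is a guess from the symptoms, sketched here in Python for illustration only; I have not confirmed where in the pipeline it would happen:

```python
AELIG = "\u00e6"  # æ

# Assumption: the UTF-8 bytes of æ are misread as ISO-8859-1 text
# and then encoded to UTF-8 a second time.
double = AELIG.encode("utf-8").decode("iso-8859-1").encode("utf-8")

# Browser in UTF-8 mode: two characters instead of one.
seen_as_utf8 = double.decode("utf-8")

# Browser forced to ISO-8859-1: four characters, i.e. further garbling.
seen_as_latin1 = double.decode("iso-8859-1")

print(double)
print(repr(seen_as_utf8))
print(repr(seen_as_latin1))
```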
No problems are reported to the error log (with AxDebugLevel set to 1).
Checking with xmllint whether the errors come from the libxml2 library shows nothing strange:
xmllint --valid --loaddtd filename
reports no problems. With
xmllint --debugent --encode iso-8859-1 --loaddtd filename
I get output for all entities defined in the DTD. The two entities in question are parsed as follows:
aelig : INTERNAL GENERAL,
orig "æ"
content "<C3><A6>"
agrave : INTERNAL GENERAL,
orig "à"
content "<C3><A0>"
The same holds when the "--encode iso-8859-1" option is dropped.
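The content bytes reported by xmllint can be checked by hand. The sketch below (Python, for illustration only) decodes them as UTF-8 and shows the single bytes one would expect in the output if the translation to ISO-8859-1 were applied uniformly:

```python
# Content bytes as printed by xmllint --debugent.
reported = {
    "aelig": bytes([0xC3, 0xA6]),
    "agrave": bytes([0xC3, 0xA0]),
}

# Both decode cleanly as UTF-8, so the parsed entities look correct.
decoded = {name: raw.decode("utf-8") for name, raw in reported.items()}

# Bytes expected in an ISO-8859-1 response for the same characters.
latin1 = {name: ch.encode("iso-8859-1") for name, ch in decoded.items()}

for name in reported:
    print(name, repr(decoded[name]), latin1[name])
```

Both entities are stored as valid UTF-8 for U+00E6 and U+00E0, so the parsing step appears to treat them identically, and the difference would have to arise later.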