
This queue is for tickets about the AxKit CPAN distribution.

Report information
The Basics
Id: 2980
Status: new
Priority: 0/
Queue: AxKit

People
Owner: Nobody in particular
Requestors: bronto [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 1.6.1
Fixed in: (no value)



Subject: incoherent encoding of entities
Distribution: AxKit 1.6.1, with XSP.pm and LibXMLSupport from CVS:
    $Id: XSP.pm,v 1.45 2003/07/16 15:02:11 matts Exp $
    $Id: LibXMLSupport.pm,v 1.2 2003/03/18 15:19:46 matts Exp $

Perl Version: This is perl, v5.8.0 built for i686-linux

Operating System vendor and version: Debian GNU/Linux Woody
    Linux nsmweb 2.4.18-bf2.4 #1 Son Apr 14 09:53:28 CEST 2002 i686 unknown

Environment:
* XML::LibXML v1.51
* XML::LibXSLT v1.52
* AxKit::XSP::Util v1.6
* libxml2 v2.5.8
* libxslt v1.0.31

I have a document that complies with a DTD I wrote myself. The document contains some entities, namely &aelig; and &agrave;, which are defined in the DTD as:

    <!ENTITY aelig "&#230;" ><!-- small ae diphthong (ligature) -->
    <!ENTITY agrave "&#224;" ><!-- small a, grave accent -->

For compatibility with quite old browsers and OSs, I had these directives in the configuration files:

    AxTranslateOutput On
    AxOutputCharset iso-8859-1

This worked OK with AxKit 1.5. On 1.6.1:
* the browser correctly detects an ISO-8859-1 encoding
* &aelig; shows up as æ
* &agrave; shows up correctly

If I force the browser to UTF-8, &aelig; shows up correctly and &agrave; is garbled.

If I comment out the two directives above:
* the browser detects a UTF-8 encoding
* again, &aelig; shows up as æ and &agrave; shows up correctly

Forcing the browser to ISO-8859-1 further garbles the output.

No problems are reported to the error log (with AxDebugLevel set to 1).

Using xmllint to check whether the errors depend on the libxml2 library doesn't detect anything strange:

    xmllint --valid --loaddtd filename

shows no problem. With

    xmllint --debugent --encode iso-8859-1 --loaddtd filename

I get output for all entities defined in the DTD. The two entities in question are parsed as follows:

    aelig : INTERNAL GENERAL, orig "&#230;" content "<C3><A6>"
    agrave : INTERNAL GENERAL, orig "&#224;" content "<C3><A0>"

The same holds when dropping the "--encode iso-8859-1" option.
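The byte values in the report can be related to the garbling the browser shows. The following is a minimal Python sketch, not part of the original report and not a diagnosis of AxKit itself: it only shows how the two characters are encoded in UTF-8 and ISO-8859-1, and what UTF-8 bytes look like when a reader decodes them as ISO-8859-1. The helper names (hex_bytes, utf8_read_as_latin1, is_valid_utf8) are made up for this illustration.

```python
# Illustrative sketch only; it does not touch AxKit, libxml2, or the
# reporter's files. All helper names here are hypothetical.

# Characters behind the two DTD entities: &#230; and &#224;.
ENTITIES = {"aelig": "\u00e6", "agrave": "\u00e0"}


def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of `text` as space-separated uppercase hex."""
    return " ".join(f"{byte:02X}" for byte in text.encode(encoding))


def utf8_read_as_latin1(text: str) -> str:
    """Return what a reader sees when UTF-8 bytes are decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for name, char in ENTITIES.items():
        print(name)
        print("  UTF-8 bytes:         ", hex_bytes(char, "utf-8"))
        print("  ISO-8859-1 bytes:    ", hex_bytes(char, "iso-8859-1"))
        print("  UTF-8 read as Latin-1:", repr(utf8_read_as_latin1(char)))
        print("  Latin-1 bytes valid UTF-8:",
              is_valid_utf8(char.encode("iso-8859-1")))
```

The UTF-8 bytes printed for the two characters match the <C3><A6> and <C3><A0> content that xmllint reports. A single ISO-8859-1 byte such as E0 is not valid UTF-8 on its own, which would fit the observation that a character displaying correctly under ISO-8859-1 becomes garbled when the browser is forced to UTF-8.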
From: kjetilk [...] cpan.org
[BRONTO - Fri Jul 18 09:12:09 2003]:
> Distribution: AxKit 1.6.1, with XSP.pm and LibXMLSupport from CVS:
> $Id: XSP.pm,v 1.45 2003/07/16 15:02:11 matts Exp $

> * the browser correctly detects an ISO-8859-1 encoding
> * &aelig; shows up as æ
> * &agrave; shows up correctly

I noticed that a CVS checkin with version 1.51 has a few UTF-8 fixes. Perhaps you could apply that and see if it works?

Check it out [sic!] at
http://cvs.apache.org/viewcvs.cgi/xml-axkit/lib/Apache/AxKit/Language/XSP.pm#rev1.51

Cheers,
Kjetil