
This queue is for tickets about the WWW-Dict-Leo-Org CPAN distribution.

Report information
The Basics
Id: 132524
Status: new
Priority: 0
Queue: WWW-Dict-Leo-Org

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 2.02
Fixed in: (no value)



Subject: Double encoded utf8 possible
Under some circumstances leo displays umlauts as double-encoded UTF-8. Whether the problem occurs depends on whether the SAX parser XML::SAX::PurePerl is used. There is a very old bug report about this issue (<https://rt.cpan.org/Ticket/Display.html?id=79816>, almost eight years old).

Triggering the problem is a little tricky. Experiments show that XML::SAX uses the last parser listed in XML/SAX/ParserDetails.ini. Usually, installing XML::Simple also installs XML::SAX::Expat (after XML::SAX::PurePerl is already installed), so XML::SAX::Expat is the last SAX parser in the list. But if the user has an old version of XML::SAX installed and then upgrades to a newer version, XML::SAX::PurePerl becomes the last entry in the list again.

The attached Dockerfile reproduces the problem. With Docker installed, one just has to run docker build -t perl-test . && docker run -it perl-test and wait a little. If the installation of the old XML-SAX-1.00 is removed from the Dockerfile, the sample leo call shows correctly encoded umlauts.

What would be the best workaround? There is not much hope that the XML::SAX::PurePerl issue will be fixed. But it is possible to select a specific SAX parser by setting $XML::SAX::ParserPackage. Currently XML::SAX::Expat is guaranteed to exist (it is a dependency of XML::Simple), so it would be a valid value. However, my preference would be to not use XML::Simple or XML::SAX at all, but XML::LibXML instead (e.g. with the XPath interface), which is much better maintained these days.
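
For illustration, here is a minimal sketch of what double-encoded UTF-8 looks like at the byte level, followed by the parser pinning mentioned above. It uses only the core Encode module. The sample word and the helper hex_bytes are made up for this example, and the Latin-1 round trip is an assumption about how such double encoding typically arises, not a statement about what XML::SAX::PurePerl does internally. Where exactly the assignment would belong in WWW::Dict::Leo::Org is left open.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

# A character string containing a German umlaut (U+00E4).
my $word = "B\x{e4}r";

# Correct: encode the character string to UTF-8 bytes exactly once.
my $once = encode('UTF-8', $word);

# Double encoding: the UTF-8 bytes are mistaken for Latin-1 characters
# and then encoded to UTF-8 a second time.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

print 'once:  ', hex_bytes($once),  "\n";   # once:  42 c3 a4 72
print 'twice: ', hex_bytes($twice), "\n";   # twice: 42 c3 83 c2 a4 72

# Workaround sketch: pin the SAX parser before any XML is parsed.
# Assigning the variable does not load XML::SAX by itself; it only
# takes effect when XML::SAX::ParserFactory later picks a parser.
{
    no warnings 'once';
    $XML::SAX::ParserPackage = 'XML::SAX::Expat';
}
```

On a UTF-8 terminal the double-encoded bytes show up as "BÃ¤r" instead of "Bär", which matches the symptom described in this report.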
Subject: Dockerfile
application/octet-stream 560b