
This queue is for tickets about the XML-Xerces CPAN distribution.

Report information
The Basics
Id: 7103
Status: resolved
Priority: 0/
Queue: XML-Xerces

People
Owner: jasons [...] cpan.org
Requestors: ekliao [...] yahoo.com
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 2.3.0-4
Fixed in: (no value)

Attachments
xml-xerces-problem-with-Chinese.zip



Subject: Chinese characters get lost in the XML::Xerces 'characters' callback subroutine
Distro: XML::Xerces 2.3.0-4
Perl: ActivePerl v5.8.4 binary build 810
OS: MS Windows XP

Problem: When the source XML parsed by XML::Xerces contains a text node that contains a Chinese character, that Chinese character somehow turns into an empty string when it is passed to the characters callback subroutine. The parsing does not generate errors. The attached code samples demonstrate this.

The Chinese character in question is in test.xml, inside the text node of the first project_number element:

    utf8 char here:(...)

(where ... is the Chinese character, U+6B63)

Run the test like this:

    perl xerces-sax2-counter.pl test.xml

This produces an output file, xerces-sax2-counter.out.txt. Currently, the first line is:

    []

when it should be:

    [utf8 char here:(...)]

because of this line in the code:

    print O "[$str]\n";

I have added Perl 5.8 features such as use utf8 and binmode(..., ":utf8") to the code, but the Unicode Chinese character still gets lost. I don't know whether the XML::Xerces documentation mentions the correct way of capturing a CJK character. Thanks!
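
One way to narrow this down is to log the code points of whatever string reaches the callback. That shows whether the text is already empty on arrival or is lost later in the output layer. The following is only a minimal sketch in plain Perl. It does not call XML::Xerces, the helper name codepoints is hypothetical, and $str is a hand-written stand-in for the callback argument rather than the content of the attached files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Xerces): render every character of a
# string as U+XXXX, so an empty or mangled string is obvious even in an
# ASCII-only log.
sub codepoints {
    my ($str) = @_;
    return join(' ', map { sprintf('U+%04X', ord($_)) } split(//, $str));
}

# Stand-in for the text the 'characters' callback is expected to receive.
# \x{6B63} is the character named in the report.
my $str = "utf8 char here:(\x{6B63})";

# Emit UTF-8 on output, comparable to the binmode(..., ":utf8") call
# mentioned in the report.
binmode(STDOUT, ':encoding(UTF-8)');

print "[$str]\n";                      # the line the report expects to see
print codepoints("(\x{6B63})"), "\n";  # prints "U+0028 U+6B63 U+0029"
```

If a call such as codepoints($str) inside the real callback returns an empty string, the text was lost before Perl's output layers were involved.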
