
This queue is for tickets about the XML-LibXML CPAN distribution.

Report information
The Basics
Id: 50829
Status: resolved
Priority: 0/
Queue: XML-LibXML

People
Owner: Nobody in particular
Requestors: MARKOV [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Unimportant
Broken in: 1.70
Fixed in: (no value)



Subject: Malformed UTF-8 in tests
Various tests in t/02parse.t produce this warning:

Malformed UTF-8 character (unexpected end of string) in substitution (s///) at lib/XML/LibXML/Error.pm line 217

It appears at test 40 twice, test 41 once, test 88 twice, test 89 once, test 134 twice, test 135 once, test 181 twice, and test 182 once.

Running Perl 5.10.1 on openSUSE 11.0.
I can confirm

Compiled against libxml2 version: 20706
Argument "20706-GITv2.7.6" isn't numeric in numeric ne (!=) at t/01basic.t line 18.
FAILED test 3
Failed 1/3 tests, 66.67% okay
t/02parse.....................Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /root/.cpanplus/5.10.0/build/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.

--
Evan Carroll
perl@evancarroll.com
http://www.evancarroll.com
I'm seeing this while smoking my perl 5.12 build. The patch is easy, as long as it's not pointing to something more nefarious under the hood:
Subject: patch.txt
diff --git a/lib/XML/LibXML/Error.pm b/lib/XML/LibXML/Error.pm
index b60d6cf..ee46b11 100644
--- a/lib/XML/LibXML/Error.pm
+++ b/lib/XML/LibXML/Error.pm
@@ -213,7 +213,7 @@ sub as_string {
   } elsif (defined $self->{context}) {
     my $context = $self->{context};
     $msg.=$context."\n";
-    $context = substr($context,0,$self->{column});
+    $context = substr($context,0,$self->{column}) || '';
     $context=~s/[^\t]/ /g;
     $msg.=$context."^\n";
   }
My bad. This patch actually just hides the problem. A quick run of the tests in 02parse.t, which didn't change from 1.69, seems to indicate that this is a regression. It appears that UTF-8 data is being corrupted.
Verbose test output:
>prove -bvm t/02parse.t
t/02parse.t ..
1..496
# Running under perl version 5.012000 for linux
# Current time local: Tue Apr 6 15:45:42 2010
# Current time GMT: Tue Apr 6 20:45:42 2010
# Using Test.pm version 1.25_02
# 1 NON VALIDATING PARSER
# 1.1 WELL FORMED STRING PARSING
# 1.1.1 DEFAULT VALUES
ok 1
ok 2
...
ok 39
ok 40
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 41
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 42
...
ok 86
ok 87
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 88
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 89
...
ok 134
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 135
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 136
ok 137
...
ok 180
ok 181
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 182
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 183
ok 184
...
ok 495
ok 496
ok
All tests successful.
Files=1, Tests=496, 0 wallclock secs ( 0.08 usr 0.02 sys + 0.13 cusr 0.02 csys = 0.25 CPU)
Result: PASS
Applying patches from 56334, I've been able to more easily determine the source of the warnings.

Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
Malformed UTF-8 character (unexpected end of string) in substitution (s///) at /home/toddr/projects/cpperl/cpan/SOURCES/modules/XML-LibXML-1.70/blib/lib/XML/LibXML/Error.pm line 217.
ok 41 - Error thrown passing '<foob<E4>r/>'
This command causes this warning. I assume because it's a bad encoding?

$parser->parse_string("<foobär/>");

Perhaps parse_string should check encoding before doing anything with the data it's passed?
Subject: Re: [rt.cpan.org #50829] Malformed UTF-8 in tests
Date: Wed, 7 Apr 2010 09:49:03 +0200
To: Todd Rinaldo via RT <bug-XML-LibXML [...] rt.cpan.org>
From: Mark Overmeer <solutions [...] overmeer.net>
* Todd Rinaldo via RT (bug-XML-LibXML@rt.cpan.org) [100406 23:00]:
> <URL: https://rt.cpan.org/Ticket/Display.html?id=50829 >
>
> This command causes this warning. I assume because it's a bad encoding?
>
> $parser->parse_string("<foobär/>");
>
> Perhaps parse_string should check encoding before doing anything
> with the data it's passed?
Well, there are various interpretations of your example: is your file utf-8 or latin1? When it is latin1, I get

panic: sv_pos_u2b_cache cache 7 real 5 for <foob�r/> at /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi/XML/LibXML/Error.pm line 216.

Except when I add "use encoding 'latin1';" to the script. Then it stops crashing.

Unlike Perl, XML::LibXML expects strings to be either in utf8 or in "the same encoding as the input/output document has specified".

The problem you encounter is comparable to that of the tests: the Perl scripts should not rely on Perl's smart behavior of trying to autodetect the character encoding of the files based on OS information. Probably, you should add "use utf8;" to the top of your file.

--
Regards,
MarkOv
------------------------------------------------------------------------
Mark Overmeer MSc                                MARKOV Solutions
Mark@Overmeer.net                          solutions@overmeer.net
http://Mark.Overmeer.net                   http://solutions.overmeer.net
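To illustrate the latin1-versus-utf-8 question raised above, here is a minimal sketch using only the core Encode module. It does not call XML::LibXML and is not taken from the test suite; the helper name is_valid_utf8 and the variable names are hypothetical. It shows that the same nine characters produce different byte strings in the two encodings, and that the latin1 bytes are not well-formed UTF-8, which is consistent with the "Malformed UTF-8" warnings reported in this ticket.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "\x{E4}" is U+00E4 (a with diaeresis) written as an escape, so the
# string does not depend on the encoding this source file is saved in.
my $chars = "<foob\x{E4}r/>";

my $latin1 = encode('ISO-8859-1', $chars);   # U+00E4 becomes the single byte 0xE4
my $utf8   = encode('UTF-8',      $chars);   # U+00E4 becomes the bytes 0xC3 0xA4

# Hypothetical helper: true if the byte string decodes as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval {
        # FB_CROAK dies on malformed input; LEAVE_SRC keeps $bytes unmodified.
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    } ? 1 : 0;
}

my $latin1_is_utf8 = is_valid_utf8($latin1);
my $utf8_is_utf8   = is_valid_utf8($utf8);

printf "latin1: %d bytes, valid UTF-8: %s\n",
    length($latin1), $latin1_is_utf8 ? 'yes' : 'no';
printf "utf-8:  %d bytes, valid UTF-8: %s\n",
    length($utf8), $utf8_is_utf8 ? 'yes' : 'no';
```

A check of this kind before handing a byte string to a parser is one way to find out which of the two interpretations a given script is actually producing.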
I'm resolving this bug as old, because I am unable to reproduce it here (Mandriva Linux Cooker, x86-64, or Mageia Linux Cauldron, x86-32, both with the latest libxml2 and a UTF-8 locale). If you can reproduce it, please reply to this ticket with a reproduction recipe.