
This queue is for tickets about the TAP-Harness-JUnit CPAN distribution.

Report information
The Basics
Id: 49307
Status: resolved
Priority: 0/
Queue: TAP-Harness-JUnit

People
Owner: JLAVALLEE [...] cpan.org
Requestors: SCR [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.26
Fixed in: 0.33



Subject: system-out content is not escaped
As a snippet, consider this, which fails to parse in firefox. I'm not sure what the correct escaping should be for xml (pcdata/cdata?), but that should be used here so this parses regardless of the system-out content.

<system-out>
# test.pl:1980: input: &quot;http://yahoo.com?x=&quot;
ok 711 - test_url_out
</system-out>

When viewed in firefox, we get the message:

XML Parsing Error: not well-formed
Location: http://build7.sbs.corp.sk1.yahoo.com:9999/yhudson/view/y_pack_y/job/y_pack_y-trunk-commit/ws/yahoo/packages/yahoo/perl/yiv/junit-reports/test.pl.xml
Line Number 1992, Column 49:
# test.pl:1980: input: &quot;http://yahoo.com?x=&quot;
------------------------------------------------^
Actually, there is no escaping for most of the low-ASCII control codes. They are simply not allowed in XML at all, even though they are valid UTF-8. As per the spec (http://www.w3.org/TR/REC-xml/#NT-Char), only these characters are allowed:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Attached is a patch (with unit tests) that substitutes, for example, "\f" (form feed) with "<0c>" in the output.
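To make the Char production concrete, here is a minimal Perl sketch that checks whether a string contains only characters the production allows. The helper name is_xml_char_string is hypothetical and is not part of TAP::Harness::JUnit; the sketch also assumes the input is a Perl character string (already decoded), not raw UTF-8 bytes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of TAP::Harness::JUnit): returns 1 when every
# character of $s matches the XML 1.0 Char production, 0 otherwise.
# Assumes $s is a decoded character string.
sub is_xml_char_string {
    my ($s) = @_;
    return $s =~ /\A[\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]*\z/
        ? 1
        : 0;
}

# Try a few samples; non-printable characters are shown as \xNN for display only.
for my $sample ("plain text", "tab\there", "form\ffeed", "nul\x00byte") {
    (my $shown = $sample) =~ s/([^\x20-\x7E])/sprintf('\\x%02x', ord $1)/ge;
    printf "%-16s %s\n", $shown,
        is_xml_char_string($sample) ? 'allowed' : 'not allowed';
}
```

Tab, line feed and carriage return pass the check; every other code point below #x20 fails it, which is why a form feed in system-out makes the report unparseable.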
Subject: TAP-Harness-JUnit-bad-xml-chars.patch
diff -u -Naur TAP-Harness-JUnit-0.32.orig/lib/TAP/Harness/JUnit.pm TAP-Harness-JUnit-0.32/lib/TAP/Harness/JUnit.pm
--- TAP-Harness-JUnit-0.32.orig/lib/TAP/Harness/JUnit.pm 2009-07-13 09:09:31.000000000 -0700
+++ TAP-Harness-JUnit-0.32/lib/TAP/Harness/JUnit.pm 2010-01-14 16:42:14.000000000 -0800
@@ -147,7 +147,7 @@
             next NAME if $newname eq $testcase->{name};
         }
-        return $newname;
+        return xmlsafe($newname);
     }
 }
@@ -215,12 +215,12 @@
             #$comment .= $result->comment."\n";
             # ->comment has leading whitespace stripped
-            $result->raw =~ /^# (.*)/ and $comment .= $1."\n";
+            $result->raw =~ /^# (.*)/ and $comment .= xmlsafe($1)."\n";
         }
         # Errors
         if ($result->type eq 'unknown') {
-            $comment .= $result->raw."\n";
+            $comment .= xmlsafe($result->raw)."\n";
         }
         # Test case
@@ -240,7 +240,7 @@
         if ($result->ok eq 'not ok') {
             $test->{failure} = [{
                 type => blessed ($result),
-                message => $result->raw,
+                message => xmlsafe($result->raw),
                 content => $comment,
             }];
             $xml->{errors}++;
@@ -251,7 +251,7 @@
         }
         # Log
-        $xml->{'system-out'}->[0] .= $result->raw."\n";
+        $xml->{'system-out'}->[0] .= xmlsafe($result->raw)."\n";
     }
     # Detect no plan
@@ -302,7 +302,7 @@
             failure => {
                 type => 'Died',
                 message => $badretval->comment,
-                content => $badretval->raw,
+                content => xmlsafe($badretval->raw),
             },
         };
         $xml->{errors}++;
@@ -360,6 +360,21 @@
     return $aggregator;
 }
+# Because not all utf8 characters are allowed in xml, only these
+# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
+# http://www.w3.org/TR/REC-xml/#NT-Char
+sub xmlsafe {
+    my $s = shift;
+
+    return '' unless defined $s && length($s) > 0;
+
+    $s =~ s/([\x01|\x02|\x03|\x04|\x05|\x06|\x07|\x08|\x0B|\x0C|\x0E|\x0F|\x11|\x12|\x13|\x14|\x15|\x16|\x17|\x18|\x19|\x1A|\x1B|\x1C|\x1D|\x1E|\x1F])/ sprintf("<%0.2x>", ord($1)) /gex;
+
+
+    return $s;
+}
+
+
 =head1 SEE ALSO
 JUnit XML schema was obtained from L<http://jra1mw.cvs.cern.ch:8180/cgi-bin/jra1mw.cgi/org.glite.testing.unit/config/JUnitXSchema.xsd?view=markup>.
diff -u -Naur TAP-Harness-JUnit-0.32.orig/t/tests/nonutf8log.txt TAP-Harness-JUnit-0.32/t/tests/nonutf8log.txt
--- TAP-Harness-JUnit-0.32.orig/t/tests/nonutf8log.txt 2009-07-13 09:09:31.000000000 -0700
+++ TAP-Harness-JUnit-0.32/t/tests/nonutf8log.txt 2010-01-15 09:09:03.000000000 -0800
@@ -1,3 +1,7 @@
-1..1
+1..3
 ok 1 - First
 # ²
+ok 2 - bad char == ==
+ok 3 - japanese s/b ok 記事だにゃ
+# bad char == ==
+# japanese s/b ok 記事だにゃ
diff -u -Naur TAP-Harness-JUnit-0.32.orig/t/tests/nonutf8log.xml TAP-Harness-JUnit-0.32/t/tests/nonutf8log.xml
--- TAP-Harness-JUnit-0.32.orig/t/tests/nonutf8log.xml 2009-07-13 09:09:31.000000000 -0700
+++ TAP-Harness-JUnit-0.32/t/tests/nonutf8log.xml 2010-01-15 09:09:20.000000000 -0800
@@ -1,10 +1,16 @@
 <?xml version='1.0' encoding='utf-8'?>
 <testsuites>
-  <testsuite name="Special characters in log" errors="0" failures="0" tests="1" time="0">
-    <system-out>1..1
+  <testsuite name="Special characters in log" errors="0" failures="0" tests="3" time="0">
+    <system-out>1..3
 ok 1 - First
 # �
+ok 2 - bad char ==&lt;0c&gt;==
+ok 3 - japanese s/b ok 記事だにゃ
+# bad char ==&lt;0c&gt;==
+# japanese s/b ok 記事だにゃ
 </system-out>
     <testcase name="First" classname="Special characters in log" time="0" />
+    <testcase name="bad char ==&lt;0c&gt;==" classname="Special characters in log" time="0" />
+    <testcase name="japanese s/b ok 記事だにゃ" classname="Special characters in log" time="0" />
   </testsuite>
 </testsuites>
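For comparison, the same substitution can be sketched with a negated character class built directly from the Char production. This is an illustrative variant only, not the code shipped in 0.33, and the name xmlsafe_sketch is hypothetical. As far as I can tell it differs from the attached xmlsafe in two ways: the "|" separators inside the patch's bracketed class are literal characters, so that version would also rewrite "|" as "<7c>", and its list omits \x00 and \x10. The sketch assumes decoded character strings as input.

```perl
use strict;
use warnings;

# Hypothetical variant of the patch's xmlsafe(): replace every character that
# is NOT matched by the XML 1.0 Char production with a visible "<hh>" marker.
sub xmlsafe_sketch {
    my ($s) = @_;

    # Mirror the patch: undefined or empty input yields an empty string.
    return '' unless defined $s && length $s;

    # Negated class: anything outside the allowed ranges gets substituted.
    $s =~ s/([^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}])/sprintf('<%02x>', ord $1)/ge;

    return $s;
}

print xmlsafe_sketch("bad char ==\f=="), "\n";   # form feed becomes a marker
print xmlsafe_sketch("pipe | stays"), "\n";      # "|" is a legal XML character
```

The "<" and ">" in the marker still need the usual XML escaping when the document is serialized, which is why the expected test output in the patch shows them as &lt;0c&gt;.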
Patch applied in the 0.33 release.