This queue is for tickets about the Test-Simple CPAN distribution.

Report information
The Basics
Id: 122569
Status: new
Priority: 0/
Queue: Test-Simple

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: unprintable diagnostics
Date: Sun, 23 Jul 2017 10:39:46 +0100
To: bug-Test-Simple [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
There's a group of related problems around the printability of Test::More's diagnostics, which have a common cause and a common solution. The attached patch implements the solution.

The most obvious of these problems is that, where a data string relevant to a diagnostic contains a control character, that control character is copied straight into the diagnostic. For example, given the test case

    $ perl -MTest::More -e 'is "", "\a"; done_testing'

one gets output that on a terminal looks like

    not ok 1
    #   Failed test at -e line 1.
    #          got: ''
    #     expected: ''
    1..1
    # Looks like you failed 1 test of 1.

and the terminal beeps. The visible diagnostic gives the impression that the two strings being compared are identical, making the failure nonsensical. Unfavourable test runs can easily output a great variety of control characters: consider comparisons of byte strings resulting from cryptographic algorithms. Not only beeps but also messed-up terminal settings result.

Related trouble occurs with characters that, while notionally printable, aren't as portable. Consider the test case

    $ perl -MTest::More -e 'is "\x{e9}", "\x{e2}\x{98}\x{83}"; is "\x{e9}", "\x{2603}"; done_testing'

Firstly, in any case the output contains some C1 control characters, but let's ignore that. By default, i.e., if there's no environment setting to tell Perl to encode its output in UTF-8, the output shows differing `got' strings and identical `expected' strings, which is the opposite of the truth. The reason for this is revealed by a "Wide character in print" warning: the second diagnostic contains a literal snowman character, which of course can't be sent to a byte stream, and in a terrible decision from 5.6 the core handles this by implicitly encoding just that diagnostic in UTF-8. The practical upshot is that different diagnostics in a single test script run are encoded inconsistently. This warning always means there's a bug: it is a bug that Test::More attempts to output an arbitrary Unicode character to a stream that it doesn't know can accept non-bytes.

Even in a fully Unicode-capable environment, in the testing context there are problems with displaying Unicode characters literally. Supposing that the terminal expects UTF-8 and can fully render Unicode,

    $ perl -MTest::More -e 'is "A", "\x{391}"; is "\x{e9}", "e\x{301}"; done_testing'

(optionally with environment settings for output encoding) produces two diagnostics that show remarkably similar `got' and `expected' strings. In the first case, Latin A versus Greek alpha, these are different graphemes, but will (intentionally) have identical appearance in some fonts. In the second case, precomposed e-acute versus combining sequence, both character sequences represent the same grapheme, and should therefore appear identical in any correct rendering. In both cases, rendering these printable Unicode character sequences impedes the user in comprehending the differences between them, and hence damages the usefulness of the diagnostic for debugging purposes.

Furthermore, even where Unicode characters don't cause these problems, they impede communication of the diagnostics to anyone else, by email or other means. Sometimes they would get through correctly, but it's common for encoding problems to arise along the way, and so even when they actually do get through correctly the receiver can't rely on them having done so.

The only characters that can be safely used in diagnostics are the printable ASCII characters. The solution to all the above problems is that all other characters in data strings should be described in diagnostics by non-literal means. The attached patch borrows some logic from Carp's stack trace code to represent data strings in Perl syntax, using only printable ASCII.

-zefram
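
For illustration only, the kind of escaping described above might look roughly like the following. This is a minimal sketch and not the attached patch: `printable_literal` is a hypothetical helper name that does not exist in Test::More, and the two substitutions are only loosely modelled on how Carp formats arguments in stack traces.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not the patch and
# not part of Test::More): render a string as a double-quoted Perl
# literal that uses only printable ASCII characters.
sub printable_literal {
    my ($str) = @_;
    return 'undef' unless defined $str;

    # Backslash-escape the characters that are special inside a
    # double-quoted Perl string: ", \, $ and @.
    $str =~ s/(["\\\$\@])/\\$1/g;

    # Replace every character outside the printable ASCII range
    # (space through tilde) with a \x{...} escape of its code point.
    $str =~ s/([^ -~])/sprintf('\\x{%x}', ord $1)/ge;

    return qq{"$str"};
}

# The strings from the test cases in this report:
print printable_literal("\a"),       "\n";   # "\x{7}"
print printable_literal("\x{e9}"),   "\n";   # "\x{e9}"
print printable_literal("\x{2603}"), "\n";   # "\x{2603}"
print printable_literal("\x{391}"),  "\n";   # "\x{391}"
print printable_literal("e\x{301}"), "\n";   # "e\x{301}"
```

With a representation of this kind, the bell character, the snowman, Greek alpha and the combining acute would each appear in a diagnostic as a visibly distinct ASCII escape, and the output would contain nothing that can trigger a "Wide character in print" warning.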

Message body is not shown because sender requested not to inline it.