Subject: No warning for invalid bytes in data assumed to be UTF-8
According to the Perldoc spec, files that lack an =encoding declaration
are assumed to be in UTF-8 when encoding autodiscovery fails:
"By default, Perldoc assumes that documents are Unicode, encoded in one
of the three common schemes (UTF-8, UTF-16, or UTF-32). The particular
scheme a document uses is autodiscovered by examination of the first few
bytes of the file (where possible). If the autodiscovery fails, UTF-8 is
assumed,"
However, if the source file contains bytes that are not valid UTF-8,
they are copied into the output XHTML without any warning. The
resulting XHTML contains invalid data and does not validate.
The spec itself adds: "and parsers may treat any non-UTF-8 bytes later
in the document as fatal errors." Reporting such bytes is therefore not
required, but highly desirable.
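
To illustrate the kind of check being requested, here is a minimal
sketch in Python. It is not the Perldoc parser's code or API. The
function names find_invalid_utf8 and warn_if_invalid are hypothetical,
and it assumes the whole file is meant to be UTF-8 (no =encoding
declaration, no UTF-16/UTF-32 byte order mark). It reports the first
offending byte and the line it occurs on.

```python
import sys


def find_invalid_utf8(data: bytes):
    """Return (line, offset, byte) for the first byte that is not valid
    UTF-8, or None if the data decodes cleanly.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that failed to decode.
        line = data.count(b"\n", 0, err.start) + 1
        return line, err.start, data[err.start]
    return None


def warn_if_invalid(path):
    """Print a warning to stderr if the file contains invalid UTF-8."""
    with open(path, "rb") as fh:
        data = fh.read()
    problem = find_invalid_utf8(data)
    if problem is not None:
        line, offset, value = problem
        print(
            f"{path}: invalid UTF-8 byte 0x{value:02X} "
            f"at line {line} (byte offset {offset})",
            file=sys.stderr,
        )
    return problem
```

With a check of this shape, a Latin-1 encoded "é" (byte 0xE9) in an
undeclared file would be reported instead of passing silently into the
XHTML. Whether to warn and continue or to stop with a fatal error is
left open, since the spec permits either.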
Minimal test document attached.
Subject: pod6_test.pod6