Subject: File::MimeInfo text/plain heuristics broken
The heuristic for distinguishing between text/plain and
application/octet-stream in MimeInfo.pm is broken. It behaves differently
when the file is passed in as a file name than when it is passed in as a
filehandle.
The test case:
- create a file which is not UTF-8 encoded but contains only printable
Latin-<n> characters (i.e. not in [\x00-\x1f\x7f] except \t, \r, and \n):
$ perl -e 'open my $fh, ">/tmp/xxx"; print $fh "\xb0\xb1\xb2\xf2"'
$ perl -MFileHandle -MFile::MimeInfo::Magic -e 'print
mimetype("/tmp/xxx"), "\n"'
text/plain
- so far it is good, however:
$ perl -MFileHandle -MFile::MimeInfo::Magic -e 'open my $fh, "<",
"/tmp/xxx"; print mimetype($fh), "\n"'
application/octet-stream
- this is incorrect (at least the results should be the same; but I
think text/plain is the correct choice).
It worked correctly in 0.12 and is broken in 0.14. The reason is that
0.14 runs "binmode FILE, ':utf8'" when the file is given as a file name,
but not when it is given as a filehandle. Leaving the filehandle alone is
probably the right approach, since the module should not change the I/O
layer settings of someone else's filehandle. However, the result is
incorrect detection of plain text files when they are given as a
filehandle.
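
One possible direction, as a rough sketch only: do the check on raw bytes in both code paths, so the result does not depend on any I/O layer. The helper names guess_default_type and read_head are hypothetical and are not part of File::MimeInfo. The 32-byte read length is also just an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not File::MimeInfo API): classify a buffer of raw
# bytes. Control characters other than \t, \n and \r suggest binary data.
sub guess_default_type {
    my ($bytes) = @_;
    return 'text/plain' if !defined $bytes || $bytes eq '';
    return $bytes =~ /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/
        ? 'application/octet-stream'
        : 'text/plain';
}

# Hypothetical helper: read the first $len bytes from either a file name
# or an already opened filehandle.
sub read_head {
    my ($file, $len) = @_;
    $len ||= 32;    # assumed length, chosen for illustration
    my $buf = '';
    if (ref $file) {
        # Someone else's handle: only rewind and read, leave its layers alone.
        seek $file, 0, 0 or return undef;
        read $file, $buf, $len;
    }
    else {
        # Our own handle: open it raw so no decoding layer is involved.
        open my $fh, '<:raw', $file or return undef;
        read $fh, $buf, $len;
        close $fh;
    }
    return $buf;
}
```

With the /tmp/xxx file from the test case above, guess_default_type(read_head("/tmp/xxx")) and guess_default_type(read_head($fh)) should then agree, since neither path depends on a ':utf8' layer. This sketch deliberately ignores UTF-8 validation, so it would not tell UTF-8 text from other 8-bit text.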