

This queue is for tickets about the File-MimeInfo CPAN distribution.

Report information
The Basics
Id: 20376
Status: resolved
Priority: 0/
Queue: File-MimeInfo

People
Owner: Nobody in particular
Requestors: mcummings [...] gentoo.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: (no value)
Fixed in: (no value)



Subject: Weird problem (and patch)
I was trying to understand why filer (http://blog.perldude.de/archives/category/programming/filer/) was having problems, and it turned out to be caused by how File::MimeInfo parsed the files in my home directory under UTF-8. While those files were being read, the UTF-8 decoding broke and produced the infamous "Malformed UTF-8 character (fatal) at /usr/lib64/perl5/vendor_perl/5.8.8/File/MimeInfo.pm line 120." error.

I am very much not an expert on UTF-8. The enclosed patch is based on a sample from URI::Escape, but it does appear to work: with it, File::MimeInfo is happy, and therefore filer is happy, and therefore I am happy :)

Hope this makes sense; let me know if you need anything else,
~mcummings
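One way to keep malformed input from ever becoming a character string is to read the first few bytes in raw mode and decide about UTF-8 later. The sketch below assumes the problem came from decoding at read time (the ticket does not show how MimeInfo.pm opens the file); first_octets is a hypothetical helper name, not part of File::MimeInfo.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return up to $n octets from the start of $path,
# or undef if the file cannot be opened or read.
# The ':raw' layer disables any decoding at read time, so the result is a
# plain byte string (UTF8 flag off) no matter what the file contains.
sub first_octets {
    my ($path, $n) = @_;
    open my $fh, '<:raw', $path or return undef;
    my $buf;
    my $got = read $fh, $buf, $n;
    close $fh;
    return defined $got ? $buf : undef;
}

# Demo: a file whose first bytes are Latin-1, not UTF-8.
my ($out, $name) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "caf\xE9\x01 and more bytes";
close $out;

my $head = first_octets($name, 10);
printf "%d octets, UTF8 flag %s\n",
    length $head, utf8::is_utf8($head) ? 'on' : 'off';
```

Because nothing is decoded, a regex over $head works on octets and cannot hit a malformed UTF-8 sequence.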
Subject: mimeinfo.patch
--- /usr/lib64/perl5/vendor_perl/5.8.8/File/MimeInfo.pm    2006-07-09 10:57:47.000000000 -0400
+++ /home/mcummings/mimeinfo.pm    2006-07-09 10:59:12.000000000 -0400
@@ -116,8 +116,14 @@ sub default {
     {
         no warnings; # warnings can be thrown when input is neither ascii or utf8
-        $line =~ s/\s//g; # \n and \t are also control chars
-        return 'text/plain' unless $line =~ /[\x00-\x1F\xF7]/;
+        if ($] < 5.008) {
+            $line =~ s/([^\0-\x7F])/do {my $o = ord($1); sprintf("%c%c", 0xc0 | ($o >> 6), 0x80 | ($o & 0x3f)) }/ge;
+        }
+        else
+        {
+            utf8::encode($line)
+        }
+        return 'text/plain' unless $line =~ /[\x00-\x1F\xF7]/;
     }
     print STDERR "> First 10 bytes of the file contain control chars\n" if $DEBUG;
     return 'application/octet-stream';
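The pre-5.8 branch of the patch hand-encodes each non-ASCII character as a two-byte UTF-8 sequence, which is what utf8::encode does on 5.8 and later for characters up to U+07FF. The sketch below wraps the two branches in hypothetical subs (manual_encode, builtin_encode, names not from the patch) and checks that they agree on 8-bit input; the sample string is made up.

```perl
use strict;
use warnings;

# Pre-5.8 branch of the patch, wrapped in a hypothetical sub: encode each
# non-ASCII character as a two-byte UTF-8 sequence. Two bytes cover
# U+0080..U+07FF, which includes every value an 8-bit string can hold.
sub manual_encode {
    my ($s) = @_;
    $s =~ s/([^\0-\x7F])/do { my $o = ord($1); sprintf("%c%c", 0xc0 | ($o >> 6), 0x80 | ($o & 0x3f)) }/ge;
    return $s;
}

# 5.8+ branch of the patch, also wrapped in a hypothetical sub.
sub builtin_encode {
    my ($s) = @_;
    utf8::encode($s);    # converts characters to UTF-8 octets in place
    return $s;
}

my $sample  = "caf\xE9\tA\x01";    # made-up 8-bit input
my $manual  = manual_encode($sample);
my $builtin = builtin_encode($sample);

print join(' ', map { sprintf '%02X', ord } split //, $manual), "\n";
print $manual eq $builtin ? "same octets\n" : "different octets\n";
```

For this input both branches produce 63 61 66 C3 A9 09 41 01. Note that the patch also drops the s/\s//g line, so a tab or newline among the first ten bytes is now counted as a control character.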
From: mcummings [...] gentoo.org
Per an email request from Jaap, attached is a sample file that breaks when scanned by mimetype. Its original name was .rnd in my home directory, but a renamed copy still exhibits the Malformed UTF-8 character error. The same error occurs with "mimetype *" and "mimetype -M *".

Just to clarify, I came across the bug while working with an application called filer, not my own code ;) The call in filer is:

    my $type = File::MimeInfo::mimetype("$filepath/$file");

where $filepath is the directory path and $file is the file currently being examined. If I recall correctly, this section goes through and categorizes all of the objects in a directory so that the gtk2 display drawn afterwards has the appropriate pixmaps. I'm not associated with the filer project, though; I'm just another Linux distro guy ;)
Attachment: sample-breakage (application/octet-stream, 1024 bytes)


From: JONAS [...] cpan.org
Another simple fix is to include "use bytes;" at the top of the module.
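A small illustration of what the bytes pragma changes, assuming a string that Perl stores internally as UTF-8 (forced here with utf8::upgrade). The pragma is lexically scoped, so this sketch enables it inside a do block only; putting it at the top of the module, as suggested, would apply byte semantics to the whole file.

```perl
use strict;
use warnings;

# One character, U+00E9, stored in Perl's internal UTF-8 form (two octets),
# as happens when text comes in through a :utf8 layer or gets upgraded.
my $str = "\xE9";
utf8::upgrade($str);

my $char_len = length $str;    # character semantics: 1

my $byte_len = do {
    use bytes;                 # lexically scoped: affects this block only
    length $str;               # byte semantics: 2 (0xC3 0xA9)
};

print "characters: $char_len, bytes: $byte_len\n";
```

Under use bytes, regexes likewise see the stored octets rather than decoded characters. Recent versions of the bytes documentation discourage the pragma for anything other than debugging.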
From: PARDUS [...] cpan.org
Just uploaded version 0.14 to CPAN. The solution I implemented checks whether the data is UTF-8 and, if it is not, uses the bytes pragma. See the changelog for other updates.
Jaap
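The approach described for 0.14 (check for UTF-8, otherwise fall back to bytes) can be sketched roughly as follows. This is a hypothetical illustration, not the code that shipped: looks_like_text is an invented name, the input is assumed to be raw octets, and the control-character test is simplified from the patch (it drops the \xF7 check).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the code shipped in File::MimeInfo 0.14:
# decide whether a short buffer of raw octets looks like text, without
# ever running a regex over malformed UTF-8.
sub looks_like_text {
    my ($octets) = @_;
    my $chars = $octets;
    if (utf8::decode($chars)) {
        # Well-formed UTF-8 (plain ASCII included): inspect characters.
        $chars =~ s/\s//g;    # \n and \t are also control chars; ignore them
        return $chars =~ /[\x00-\x1F]/ ? 0 : 1;
    }
    {
        # Not valid UTF-8: fall back to byte semantics.
        use bytes;
        (my $stripped = $octets) =~ s/\s//g;
        return $stripped =~ /[\x00-\x1F]/ ? 0 : 1;
    }
}

for my $buf ("hello\n", "caf\xC3\xA9\n", "caf\xE9", "\xE9\x01") {
    my $type = looks_like_text($buf) ? 'text/plain' : 'application/octet-stream';
    print length($buf), " octets -> $type\n";
}
```

utf8::decode returns false for a malformed sequence (including a multi-byte character cut off at the end of the buffer), so Latin-1 data such as "caf\xE9" takes the byte path instead of dying.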
Fixed in 0.14