Subject: | check_binary returns true for text files containing square brackets |
The call to the C<tr> operator in the C<check_binary> function uses what looks like a regexp character class, but C<tr> takes just a list of characters. So the C<[]> square brackets are counted as "binary" characters, and a file containing enough of them is incorrectly marked as "binary". It happened to me on this very short template (TT2):
[% code %]
[% name %]
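The miscount can be reproduced outside the module. The snippet below is only a sketch: it copies the two C<tr> search lists from the C<$allowEightbit> branch of the patch and applies them to the template above, assuming each line ends with a single newline (the exact bytes of my file may differ). The variable names C<$buggy> and C<$fixed> are mine, not the module's.

```perl
use strict;
use warnings;

# The two-line TT2 template from this report.
# Assumption: each line is terminated by one "\n".
my $data = "[% code %]\n[% name %]\n";
my $len  = length($data);

# Original search list: tr has no character classes, so the literal
# '[' and ']' are simply two more characters in the set being counted.
my $buggy = ($data =~ tr/[\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f]//);

# Patched search list: only the control characters are counted.
my $fixed = ($data =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f//);

printf "length=%d buggy=%d fixed=%d ratio=%.3f\n",
    $len, $buggy, $fixed, $buggy / $len;
```

Under that assumption the template is 22 characters long, the original list counts the 4 brackets, and 4/22 is about 0.18, which is above the 0.1 threshold used when C<$allowEightbit> is set. The patched list counts 0.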
--- /Users/dakkar/Library/Perl/File/MMagic.pm 2004-03-15 09:23:04.000000000 +0100
+++ MMagic.pm 2005-07-04 13:10:00.000000000 +0200
@@ -727,11 +727,11 @@
my ($data) = @_;
my $len = length($data);
if ($allowEightbit) {
- my $count = ($data =~ tr/[\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f]//); # exclude TAB, ESC, nl, cr
+ my $count = ($data =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f//); # exclude TAB, ESC, nl, cr
return 1 if ($len <= 0); # no contents
return 1 if (($count/$len) > 0.1); # binary
} else {
- my $count = ($data =~ tr/[\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f\x80-\xff]//); # exclude TAB, ESC, nl, cr
+ my $count = ($data =~ tr/\x00-\x08\x0b-\x0c\x0e-\x1a\x1c-\x1f\x80-\xff//); # exclude TAB, ESC, nl, cr
return 1 if ($len <= 0); # no contents
return 1 if (($count/$len) > 0.3); # binary
}