Skip Menu |

This queue is for tickets about the File-MMagic CPAN distribution.

Report information
The Basics
Id: 5399
Status: new
Priority: 0/
Queue: File-MMagic

People
Owner: Nobody in particular
Requestors: sf [...] flacks.net
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.21
Fixed in: (no value)



Subject: Specials don't work well with multiple/capruring regexes
Hi, I think there are a couple of problems with the way that Specials are currently implimented. 1. multiple patterns are joined into a single alternation pattern 2. the patterns are joined in order of length (not the order set by the user) 3. the alternation pattern groups the patterns in capturing parentheses This means that if I have this pattern: '<(\w+).*?<\/\1>' I must write is as '<(\w+).*?<\/\2>' since \1 refers to the captured group in [3] above. This also means I cannot write the pattern with qr//. Furthermore, if I have two patterns that require capturing parentheses, I must combine them into a single pattern since they might not end up in the same order. Consider these two patterns: (["'])[^\1]+\1 <(\w+).*?<\/\1> I need to know too much about the internals of File::MMagic to make this work: $mmagic->addSpecials('text/foo', '(["'])[^\3]+\3', '<(\w+).*?<\/\2>'); Finally, since the patterns are joined into a single regex, and sorted by length, it will stop at the first match - which might be the longest match, but it might not be the best match. I think a better approach is to let the users specify the most significant pattern by retaining the order of the patterns. Attached is a patch to v1.21 that doesn't join the patterns together and tests them in the order specified by the user.
--- MMagic.pm.orig Fri Feb 20 12:52:03 2004 +++ MMagic.pm Fri Feb 20 12:53:26 2004 @@ -681,11 +681,12 @@ my ($token, %val); foreach my $type (keys %{$self->{SPECIALS}}) { - my $token = '(' . - (join '|', sort {length($a) <=> length($b)} @{$self->{SPECIALS}->{$type}}) - . ')'; - my $tdata = $data; - if ($tdata =~ /$token/mg) { - $val{$type} = pos($tdata); - } + my $type_pos; + my $tdata = $data; + foreach my $token (@{$self->{SPECIALS}->{$type}}) { + next unless $tdata =~ /$token/mg; + $type_pos = pos($tdata) + unless defined $type_pos && $type_pos < pos($tdata); + } + $val{$type} = $type_pos if $type_pos; }