Just an update
From the documentation, it appears that registering my own processor should
replace the existing one for application/pdf, but it doesn't seem to. Perhaps
I'm doing something wrong.
My extract script:
#!/usr/local/bin/perl
use strict;
use warnings;
use File::Extract;
use File::Extract::Filter::Exec;

my $filename = 'mydoc.pdf';
my $output;
my $e = File::Extract->new();
$e->magic->add_file_ext('doc' => 'application/msword');
$e->register_processor('Doc');
$e->register_processor('MyPDF');
my $r = $e->extract($filename);
# Guard against a failed extraction before calling a method on the result
die "No text extracted from $filename\n" unless $r;
print $r->text;
And my processor:
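package MyPDF;
use strict;
use warnings;
use base qw(File::Extract::Base);
use File::Extract::Result;

# MIME type this processor claims to handle
sub mime_type { 'application/pdf' }

sub extract
{
    my $self = shift;
    my $file = shift;

    # List-form pipe open: no shell is involved, so unusual characters in
    # $file are safe. The trailing "-" makes pdftotext write to stdout.
    open(my $fh, '-|', 'pdftotext', $file, '-')
        or die "Cannot run pdftotext on $file: $!";
    my $text = do { local $/; <$fh> };    # slurp all of the output
    close $fh;

    return File::Extract::Result->new(
        text      => eval { $self->recode($text) } || $text,
        filename  => $file,
        mime_type => $self->mime_type
    );
}

1;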
Regards
Dan