
This queue is for tickets about the File-Extract CPAN distribution.

Report information
The Basics
Id: 20866
Status: new
Priority: 0/
Queue: File-Extract

People
Owner: Nobody in particular
Requestors: dan.horne [...] redbone.co.nz
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Wishlist: specify alternative processor in preference to default
Date: Mon, 7 Aug 2006 07:35:15 +1200
To: <bug-File-Extract [...] rt.cpan.org>
From: "Dan Horne" <dan.horne [...] redbone.co.nz>
Hi

I'm using F::E's PDF extract functionality, but some words are running together as one word. I wrote another processor to use Xpdf's pdftotext, but I need to somehow specify that it is my preferred processor.

Thanks

Dan
Subject: RE: [rt.cpan.org #20866] AutoReply: Wishlist: specify alternative processor in preference to default
Date: Mon, 7 Aug 2006 07:48:04 +1200
To: <bug-File-Extract [...] rt.cpan.org>
From: "Dan Horne" <dan.horne [...] redbone.co.nz>
Just an update.

It appears from the documentation that my processor should replace the existing one, but it doesn't seem to. Perhaps I'm doing something wrong.

My extract script:

    #!/usr/local/bin/perl
    my $filename = 'mydoc.pdf';
    use File::Extract;
    use File::Extract::Filter::Exec;
    my $output;
    my $e = File::Extract->new();
    $e->magic->add_file_ext('doc' => 'application/msword');
    $e->register_processor('Doc');
    $e->register_processor('MyPDF');
    my $r = $e->extract($filename);
    print $r->text;

And my processor:

    package MyPDF;
    use strict;
    use base qw(File::Extract::Base);
    use File::Extract::Result;

    sub mime_type { 'application/pdf' }

    sub extract {
        my $self = shift;
        my $file = shift;
        my $text;
        {
            local $/;
            open (FH, "pdftotext $file - |");
            $text = <FH>;
            close FH;
        }
        return File::Extract::Result->new(
            text      => eval { $self->recode($text) } || $text,
            filename  => $file,
            mime_type => $self->mime_type
        );
    }

    1;

Regards

Dan