
This queue is for tickets about the CAM-PDF CPAN distribution.

Report information
The Basics
Id: 58144
Status: new
Priority: 0/
Queue: CAM-PDF

People
Owner: Nobody in particular
Requestors: m.g.birdsall [...] mbirdsall.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.52
Fixed in: (no value)



Subject: Interactive forms - Fully qualified field names, text font size, and interactive form dictionary
I've been doing some work to fill in form fields in a PDF, and ran into a few issues or features that CAM::PDF does not seem to handle correctly. I've created fixes for these problems, and wanted to report both the problems and the fixes.

The form I am working with is a PDF for reporting Minimum Data Set 3.0 (MDS 3.0) data to the Centers for Medicare and Medicaid Services (CMS), part of the United States Department of Health and Human Services.

The issues I have found are:

1. The getFormField method does not properly handle fully qualified field names (name1.name2.name3...). The problem is at line 2813:

    if ($fieldname =~ s/ \A(.*)[.]([.]+)\z /$2/xms)

   which I believe is intended to be

    if ($fieldname =~ s/ \A(.*)[.]([^.]+)\z /$2/xms)

   The current pattern only matches a name that ends in two or more consecutive "." characters; the corrected one captures the final (rightmost) partial field name.

2. The fillFormFields method does not properly handle font names with dashes in them, or text font sizes given as a real number rather than an integer. The problem is at line 4415:

    if ($da =~ m{ \s*/(\w+)\s+(\d+)\s+Tf.*? \z }xms)

   which I modified to

    if ($da =~ m{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms)

   This adds the dash to the acceptable characters in a font name, and the period (decimal point) to the text font size. Dash is likely not the only character that needs to be added to the word characters for font names, but it is very common, with fonts named things like "Courier-Bold" or "Times-Italic".

   The correct solution would allow any characters other than white space or delimiters. That would be sufficient to make it a PDF Name, and should deal with all valid PDFs. A further test could check that it is a valid PostScript name, so that it can't have the appearance of a number following the slash. The code for that should be

    if ($da =~ m{ \s*/ ([$!#&'*+,.:;=?@\\^`|~\w-]+) \s+ ([.\d]+) \s+Tf.*? \z }xms)

   but I have not tested that, apart from the syntax, and have not included it in the patch.

3. The fillFormFields method searches for the font in the DR entry of the field dictionary in order to get the font metrics. According to PDF Reference version 1.6 (implementation note 117): "In PDF 1.2, an additional entry in the field dictionary, DR, was defined but was never implemented. Beginning with PDF 1.5, this entry is obsolete and should be ignored." Instead, as defined in Section 8.6.2 under Variable Text, the information should be taken from the interactive form dictionary's DR entry. So I changed the code at line 4421 to look for the DR entry in the interactive form dictionary rather than in the field dictionary.

4. Similarly, when fillFormFields at line 4576 tries to set the font object in the Resources dictionary, it should get the font object from the interactive form dictionary's DR entry.

Perl Version: This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi
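The two pattern changes in items 1 and 2 can be checked in isolation, without a PDF. The following is a minimal sketch: the helper names split_field_name and parse_da are hypothetical wrappers written for this illustration and are not part of CAM::PDF; only the regular expressions are taken from the report.

```perl
use strict;
use warnings;

# Hypothetical helper: split a fully qualified field name into
# (parent name, rightmost partial name) using the corrected pattern
# from item 1. Returns (undef, name) when the name contains no ".".
sub split_field_name {
    my ($fieldname) = @_;
    my $parentname;
    if ($fieldname =~ s/ \A(.*)[.]([^.]+)\z /$2/xms) {
        $parentname = $1;
    }
    return ($parentname, $fieldname);
}

# Hypothetical helper: pull (font name, font size) out of a default
# appearance (DA) string using the modified pattern from item 2.
# Returns an empty list when no "/<fontname> <size> Tf" is found.
sub parse_da {
    my ($da) = @_;
    if ($da =~ m{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms) {
        return ($1, $2);
    }
    return;
}

my ($parent, $last) = split_field_name('name1.name2.name3');
print "$parent | $last\n";    # prints "name1.name2 | name3"

my ($font, $size) = parse_da('/Courier-Bold 9.5 Tf 0 g');
print "$font | $size\n";      # prints "Courier-Bold | 9.5"
```

With the original patterns, neither input matches: the field name does not end in consecutive dots, and the DA string contains both a dash in the font name and a non-integer size.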
Subject: campdf.diff
--- CAM-PDF-1.52-rev-bug/lib/CAM/PDF.pm 2008-10-03 03:35:52.000000000 -0400
+++ CAM-PDF-1.52/lib/CAM/PDF.pm 2010-06-04 15:52:29.000000000 -0400
@@ -2810,7 +2810,7 @@ sub getFormField
    if ($fieldname =~ m/ [.] /xms)
    {
       my $parentname;
-      if ($fieldname =~ s/ \A(.*)[.]([.]+)\z /$2/xms)
+      if ($fieldname =~ s/ \A(.*)[.]([^.]+)\z /$2/xms)
       {
          $parentname = $1;
       }
@@ -4412,15 +4412,17 @@ sub fillFormFields ## no critic(Subrout
          # Try to pull out the font size, if any. If more than
          # one, pick the last one. Font commands look like:
          # "/<fontname> <size> Tf"
-         if ($da =~ m{ \s*/(\w+)\s+(\d+)\s+Tf.*? \z }xms)
+         if ($da =~ m{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms)
          {
             $fontname = $1;
             $fontsize = $2;
             if ($fontname)
             {
-               if ($propdict->{DR})
+               my $root = $self->getRootDict()->{AcroForm};
+               my $ifdict = $self->getValue($root);
+               if (exists $ifdict->{DR})
                {
-                  my $dr = $self->getValue($propdict->{DR});
+                  my $dr = $self->getValue($ifdict->{DR});
                   $fontmetrics = $self->getFontMetrics($dr, $fontname);
                }
                #print STDERR "Didn't get font\n" if (!$fontmetrics);
@@ -4573,17 +4575,26 @@ sub fillFormFields ## no critic(Subrout
             }
             my $fdict = $self->getValue($rdict->{Font});
 
-            # Search out font resources. This is a total kluge.
-            # TODO: the right way to do this is to look for the DR
-            # attribute in the form element or it's ancestors.
+            # Search out font resources.
+            # As of PDF Reference fifth edition, the right way to do this
+            # is to get the DR attribute from the interactive form dictionary
+            # from the AcroForm entry in the document catalog
             for my $font (@rsrcs)
             {
-               my $fobj = $self->dereference("/$font", 'All');
-               if (!$fobj)
+               my $root = $self->getRootDict()->{AcroForm};
+               my $ifdict = $self->getValue($root);
+               if (!exists $ifdict->{DR})
+               {
+                  die "Could not find resource /$font while preparing form field $key\n";
+               }
+               my $dr = $self->getValue($ifdict->{DR});
+               my $fobjnum = $dr->{Font}->{value}->{$font}->{value};
+
+               if (!$fobjnum)
                {
                   die "Could not find resource /$font while preparing form field $key\n";
                }
-               $fdict->{$font} = CAM::PDF::Node->new('reference', $fobj->{objnum}, $formonum, $formgnum);
+               $fdict->{$font} = CAM::PDF::Node->new('reference', $fobjnum, $formonum, $formgnum);
             }
          }
       }
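
The last hunk reads the font object number with a single chained lookup, $dr->{Font}->{value}->{$font}->{value}, which only works when every intermediate entry is present and stored inline. As a rough sketch of a more defensive version of that step, the code below walks the same path one level at a time over plain Perl hashes. The helper name font_objnum_from_dr is hypothetical, and the { type => ..., value => ... } node layout is an assumption modelled on the ->{value} accesses in the patch, not a statement about CAM::PDF's internals.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of CAM::PDF. Takes an already-resolved
# interactive form (AcroForm) dictionary whose entries are assumed to be
# { type => ..., value => ... } nodes, and returns the object number that
# the DR font entry refers to, or undef if any step along the way is
# missing or has an unexpected type.
sub font_objnum_from_dr {
    my ($acroform, $font) = @_;
    return if ref $acroform ne 'HASH';

    my $dr = $acroform->{DR};
    return if !$dr || ($dr->{type} // q{}) ne 'dictionary';

    my $fonts = $dr->{value}->{Font};
    return if !$fonts || ($fonts->{type} // q{}) ne 'dictionary';

    my $entry = $fonts->{value}->{$font};
    return if !$entry || ($entry->{type} // q{}) ne 'reference';

    return $entry->{value};
}

# Mock data standing in for a parsed AcroForm dictionary (assumed shape).
my %acroform = (
    DR => {
        type  => 'dictionary',
        value => {
            Font => {
                type  => 'dictionary',
                value => { Helv => { type => 'reference', value => 12 } },
            },
        },
    },
);

my $objnum = font_objnum_from_dr(\%acroform, 'Helv');
print defined $objnum ? "Helv -> object $objnum\n" : "Helv not found\n";
```

A caller could then die with the "Could not find resource" message whenever the helper returns undef, which would cover both the missing-DR case and the missing-font case with one check.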