Subject: | Interactive forms - Fully qualified field names, text font size, and interactive form dictionary |
I've been doing some work to fill in form fields in a PDF, and ran into
a few issues or features that CAM::PDF does not seem to handle
correctly. I've created some fixes for these problems, and wanted to
report the problems and the fixes that I've found.
The form I am working with is a PDF for reporting Minimum Data Set 3.0
(MDS 3.0) data to the Centers for Medicare and Medicaid Services (CMS),
part of the United States Department of Health and Human Services.
The issues I have found are:
1. The getFormField method does not handle fully qualified field names
(name1.name2.name3...) correctly. The problem is at line 2813:
if ($fieldname =~ s/ \A(.*)[.]([.]+)\z /$2/xms)
which I believe is intended to be
if ($fieldname =~ s/ \A(.*)[.]([^.]+)\z /$2/xms)
The current pattern only matches a name that ends in two or more
periods; the corrected one captures the final (rightmost) partial field
name.
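To illustrate issue 1, here is a small stand-alone sketch that runs both patterns against a fully qualified name. The field name and the split_with helper are made up for this example and are not part of CAM::PDF.

```perl
use strict;
use warnings;

# Split a fully qualified field name into (parent, leaf) using the given
# pattern. Returns an empty list when the pattern does not match.
# (Hypothetical helper, for illustration only.)
sub split_with {
    my ($pattern, $fieldname) = @_;
    return $fieldname =~ $pattern ? ($1, $2) : ();
}

my $original = qr/ \A(.*)[.]([.]+)\z /xms;     # pattern as it stands at line 2813
my $patched  = qr/ \A(.*)[.]([^.]+)\z /xms;    # pattern with the proposed fix

my $name = 'form1.page1.lastname';             # made-up field name

my @old = split_with($original, $name);
my @new = split_with($patched,  $name);

printf "original: %s\n", @old ? "parent=$old[0] leaf=$old[1]" : 'no match';
printf "patched:  %s\n", @new ? "parent=$new[0] leaf=$new[1]" : 'no match';
```

With the original pattern the name is not split at all, so no parent name is found; with the patched pattern the parent is everything before the last period.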
2. The fillFormFields method does not handle font names containing
dashes, or text font sizes given as a real number rather than an integer.
The problem is at line 4415:
if ($da =~ m{ \s*/(\w+)\s+(\d+)\s+Tf.*? \z }xms)
which I modified to
if ($da =~ m{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms)
which adds the dash to the characters accepted in a font name, and the
period (decimal point) to those accepted in the text font size.
Dash is probably not the only character that needs to be added to the
word characters for font names, but it is a common one, since fonts are
often named things like "Courier-Bold" or "Times-Italic". The correct
solution would allow any character other than white space or
delimiters. That would be sufficient to make it a PDF Name, and should
deal with all valid PDFs. A further test could check that it is a valid
PostScript name, so that what follows the slash cannot look like a
number. A pattern accepting the wider character set would be
if ($da =~
m{ \s*/
([\$!#&'*+,.:;=?@\\^`|~\w-]+)
\s+
([.\d]+)
\s+Tf.*? \z
}xms)
but I have not tested that, apart from the syntax, and have not included
it in the patch.
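To illustrate issue 2, the following stand-alone sketch applies the original and the patched pattern to two default appearance strings. The DA strings and the parse_da helper are made up for this example and are not part of CAM::PDF.

```perl
use strict;
use warnings;

# Extract (font name, font size) from a default appearance string, or
# return an empty list if the pattern finds no usable Tf operator.
# (Hypothetical helper, for illustration only.)
sub parse_da {
    my ($pattern, $da) = @_;
    return $da =~ $pattern ? ($1, $2) : ();
}

my $original = qr{ \s*/(\w+)\s+(\d+)\s+Tf.*? \z }xms;        # line 4415 as shipped
my $patched  = qr{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms;  # proposed change

my $plain = '/Helv 12 Tf 0 g';              # word-only name, integer size
my $fancy = '/Courier-Bold 9.5 Tf 0 g';     # dash in name, real-valued size

my @plain_old = parse_da($original, $plain);
my @fancy_old = parse_da($original, $fancy);
my @fancy_new = parse_da($patched,  $fancy);

for my $case (['plain, original', @plain_old],
              ['fancy, original', @fancy_old],
              ['fancy, patched',  @fancy_new])
{
    my ($label, @got) = @{$case};
    printf "%-16s %s\n", $label, @got ? "font=$got[0] size=$got[1]" : 'no match';
}
```

The original pattern handles the plain string but finds nothing in the second one, so neither the font name nor the size is picked up; the patched pattern extracts both.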
3. The fillFormFields method searches for the font in the DR entry of
the field dictionary in order to get the font metrics. According to PDF
Reference version 1.6 (implementation note 117):
"In PDF 1.2, an additional entry in the field dictionary, DR, was
defined but was never implemented. Beginning with PDF 1.5, this entry is
obsolete and should be ignored."
Instead, as defined in Section 8.6.2 under Variable Text, the
information should be taken from the interactive form dictionary's DR
entry.
So, I changed the code at 4421 to look up the DR entry in the
interactive form dictionary, rather than in the field dictionary.
4. Similarly, when fillFormFields at 4576 sets the font object in the
Resources dictionary, it should get the font object from the
interactive form dictionary's DR entry.
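To make the lookup in issues 3 and 4 concrete, here is a rough sketch that uses plain hashes in place of a parsed document. It assumes every node is a hash with 'type' and 'value' keys, which is the shape the patch relies on. The sample data, the object number and the font_objnum helper are made up, and the sketch skips the dereferencing that getValue would do on a real document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the interactive form dictionary (the AcroForm
# entry of the document catalog). Real data would come from the PDF.
my $ifdict = {
    DR => {
        type  => 'dictionary',
        value => {
            Font => {
                type  => 'dictionary',
                value => {
                    Helv => { type => 'reference', value => 12 },
                },
            },
        },
    },
};

# Return the object number that /$font refers to in the form-level DR
# dictionary, or undef if any step of the path is missing. Each level is
# checked before use so that a failed lookup does not create empty entries.
sub font_objnum {
    my ($dict, $font) = @_;
    return if !exists $dict->{DR};
    my $dr = $dict->{DR}{value};
    return if ref $dr ne 'HASH' || !exists $dr->{Font};
    my $fonts = $dr->{Font}{value};
    return if ref $fonts ne 'HASH' || !exists $fonts->{$font};
    return $fonts->{$font}{value};
}

my $found   = font_objnum($ifdict, 'Helv');    # font present in DR
my $missing = font_objnum($ifdict, 'ZaDb');    # font absent from DR

print 'Helv: ', (defined $found   ? $found   : 'not found'), "\n";
print 'ZaDb: ', (defined $missing ? $missing : 'not found'), "\n";
```

Checking each level separately is a choice made for this sketch; the patch itself reads the whole path in a single expression.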
Perl Version:
This is perl, v5.10.0 built for x86_64-linux-gnu-thread-multi
Subject: | campdf.diff |
--- CAM-PDF-1.52-rev-bug/lib/CAM/PDF.pm 2008-10-03 03:35:52.000000000 -0400
+++ CAM-PDF-1.52/lib/CAM/PDF.pm 2010-06-04 15:52:29.000000000 -0400
@@ -2810,7 +2810,7 @@ sub getFormField
if ($fieldname =~ m/ [.] /xms)
{
my $parentname;
- if ($fieldname =~ s/ \A(.*)[.]([.]+)\z /$2/xms)
+ if ($fieldname =~ s/ \A(.*)[.]([^.]+)\z /$2/xms)
{
$parentname = $1;
}
@@ -4412,15 +4412,17 @@ sub fillFormFields ## no critic(Subrout
# Try to pull out the font size, if any. If more than
# one, pick the last one. Font commands look like:
# "/<fontname> <size> Tf"
- if ($da =~ m{ \s*/(\w+)\s+(\d+)\s+Tf.*? \z }xms)
+ if ($da =~ m{ \s*/([\w-]+)\s+([.\d]+)\s+Tf.*? \z }xms)
{
$fontname = $1;
$fontsize = $2;
if ($fontname)
{
- if ($propdict->{DR})
+ my $root = $self->getRootDict()->{AcroForm};
+ my $ifdict = $self->getValue($root);
+ if (exists $ifdict->{DR})
{
- my $dr = $self->getValue($propdict->{DR});
+ my $dr = $self->getValue($ifdict->{DR});
$fontmetrics = $self->getFontMetrics($dr, $fontname);
}
#print STDERR "Didn't get font\n" if (!$fontmetrics);
@@ -4573,17 +4575,26 @@ sub fillFormFields ## no critic(Subrout
}
my $fdict = $self->getValue($rdict->{Font});
- # Search out font resources. This is a total kluge.
- # TODO: the right way to do this is to look for the DR
- # attribute in the form element or it's ancestors.
+ # Search out font resources.
+ # As of PDF Reference fifth edition, the right way to do this
+ # is to get the DR attribute from the interactive form dictionary
+ # from the AcroForm entry in the document catalog
for my $font (@rsrcs)
{
- my $fobj = $self->dereference("/$font", 'All');
- if (!$fobj)
+ my $root = $self->getRootDict()->{AcroForm};
+ my $ifdict = $self->getValue($root);
+ if (!exists $ifdict->{DR})
+ {
+ die "Could not find resource /$font while preparing form field $key\n";
+ }
+ my $dr = $self->getValue($ifdict->{DR});
+ my $fobjnum = $dr->{Font}->{value}->{$font}->{value};
+
+ if (!$fobjnum)
{
die "Could not find resource /$font while preparing form field $key\n";
}
- $fdict->{$font} = CAM::PDF::Node->new('reference', $fobj->{objnum}, $formonum, $formgnum);
+ $fdict->{$font} = CAM::PDF::Node->new('reference', $fobjnum, $formonum, $formgnum);
}
}
}