Subject: | Given same input, different (byte- and sizewise) PDF files are created |
I am using PDF::API2 to create very simple, text-only PDF files. I noticed that when I use the same text as input, PDF::API2 produces different output: the files have different sizes. I am attaching a program to demonstrate. Run it three times: chances are, you will end up with three PDF files of different sizes.
I used ImageMagick's convert utility to convert these PDFs to GIFs: the GIFs are identical. This is good, but I think the source of randomness should be removed, if only for the sake of one's sanity.
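To make the byte-level difference easy to confirm, here is a minimal sketch of a comparison helper. It is not part of the attached program; the `fingerprint` sub and the `compare.pl` script name are made up for illustration. It uses only the core Digest::MD5 module and prints the size and MD5 digest of each file named on the command line.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: return "SIZE MD5HEX" for one file, read in binary mode.
sub fingerprint {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return sprintf '%d %s', -s $path, $md5;
}

# Print one fingerprint per file and count the distinct ones.
my %seen;
for my $path (@ARGV) {
    my $fp = fingerprint($path);
    print "$path: $fp\n";
    $seen{$fp}++;
}

if (@ARGV) {
    my $verdict = keys(%seen) > 1 ? "outputs differ" : "outputs identical";
    print "$verdict\n";
}
```

Assuming the attached program was run three times to produce run1.pdf, run2.pdf and run3.pdf, it would be invoked as `perl compare.pl run1.pdf run2.pdf run3.pdf`.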
Subject: | random-size-pl.txt |
use strict;
use warnings;
use Getopt::Long;
use PDF::API2;
GetOptions(
    "n-pages=i" => \(my $n_pages = 100),
);
my $text = <<'TEXT';
Messages consist of lines of text. No special provisions
are made for encoding drawings, facsimile, speech, or structured
text. No significant consideration has been given to questions
of data compression or to transmission and storage efficiency,
and the standard tends to be free with the number of bits con-
sumed. For example, field names are specified as free text,
rather than special terse codes.
A general "memo" framework is used. That is, a message con-
sists of some information in a rigid format, followed by the main
part of the message, with a format that is not specified in this
document. The syntax of several fields of the rigidly-formated
("headers") section is defined in this specification; some of
these fields must be included in all messages.
The syntax that distinguishes between header fields is
specified separately from the internal syntax for particular
fields. This separation is intended to allow simple parsers to
operate on the general structure of messages, without concern for
the detailed structure of individual header fields. Appendix B
is provided to facilitate construction of these parsers.
In addition to the fields specified in this document, it is
expected that other fields will gain common use. As necessary,
the specifications for these "extension-fields" will be published
through the same mechanism used to publish this document. Users
may also wish to extend the set of fields that they use
privately. Such "user-defined fields" are permitted.
The framework severely constrains document tone and appear-
ance and is primarily useful for most intra-organization communi-
cations and well-structured inter-organization communication.
It also can be used for some types of inter-process communica-
tion, such as simple file transfer and remote job entry. A more
robust framework might allow for multi-font, multi-color, multi-
dimension encoding of information. A less robust one, as is
present in most single-machine message systems, would more
severely constrain the ability to add fields and the decision to
include specific fields. In contrast with paper-based communica-
tion, it is interesting to note that the RECEIVER of a message
can exercise an extraordinary amount of control over the
message's appearance. The amount of actual control available to
message receivers is contingent upon the capabilities of their
individual message systems.
TEXT
my $pdf = PDF::API2->new;
my $font = $pdf->corefont('Courier');
for (my $n = 0; $n < $n_pages; ++$n) {
    my @lines = split /\n/, $text;

    # Change the text up a little bit (move a line to the first
    # position), so that I can tell that there is more than one
    # page in a GIF when I convert it. (I use ImageMagick's
    # convert utility to convert PDFs to GIF to do pixel-by-pixel
    # comparison).
    my $pick_a_line = splice @lines, $n % @lines, 1;
    my $page_text = join "\n", $pick_a_line, @lines;

    my $page = $pdf->page;
    $page->mediabox(612, 792);

    my $content = $page->text;
    $content->translate(0, 780);
    $content->font($font, 12);
    $content->lead(12);
    $content->section($page_text, 612, 780);
}
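# The output file name is the first non-option argument.
my $output_file = shift @ARGV
    or die "Usage: $0 [--n-pages=N] OUTPUT.pdf\n";
$pdf->saveas($output_file);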