Bug #113084 for PDF-API2: Unreproducible internal font IDs

Wed Mar 16 09:27:23 2016 dam [...] cpan.org - Ticket created

From:	dam [...] cpan.org
Subject:	Unreproducible internal font IDs

We have the following bug reported to the Debian package of PDF-API2 (https://bugs.debian.org/818363): It doesn't seem to be a bug in the packaging, so you may want to take a look. Thanks! ------8<-----------8<-----------8<-----------8<-----------8<----- The TrueType, BdFont, CoreFont and Postscript subclasses of PDF::API2::Resource::Font append '~'.time() to the resource identifiers they generate. This is unnecessary because the identifier already includes the result of the pdfkey() function, which gives increasing numbers through a process lifetime. Adding time() is in itself not enough to guarantee unique IDs, but only introduces unreproducible output and poses a slight performance penalty for the syscall needed to get the current time. The patch below removes time() from the resource IDs, which is sufficient to get reproducible output. ------8<-----------8<-----------8<-----------8<-----------8<----- Thanks for considering, Damyan Ivanov, Debian Perl Group ------8<-----------8<-----------8<-----------8<-----------8<----- --- a/lib/PDF/API2/Resource/CIDFont/TrueType.pm +++ b/lib/PDF/API2/Resource/CIDFont/TrueType.pm @@ -38,7 +38,7 @@ sub new { my ($ff,$data)=PDF::API2::Resource::CIDFont::TrueType::FontFile->new($pdf,$file,@opts); $class = ref $class if ref $class; - my $self=$class->SUPER::new($pdf,$data->{apiname}.pdfkey().'~'.time()); + my $self=$class->SUPER::new($pdf,$data->{apiname}.pdfkey()); $pdf->new_obj($self) if(defined($pdf) && !$self->is_obj($pdf)); $self->{' data'}=$data; @@ -51,7 +51,7 @@ sub new { $de->{'FontDescriptor'} = $des; $de->{'Subtype'} = PDFName($self->iscff ? 'CIDFontType0' : 'CIDFontType2'); - ## $de->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname).'~'.time()); + ## $de->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname)); $de->{'BaseFont'} = PDFName($self->fontname); $de->{'DW'} = PDFNum($self->missingwidth); if($opts{-noembed} != 1) --- a/lib/PDF/API2/Resource/Font/BdFont.pm +++ b/lib/PDF/API2/Resource/Font/BdFont.pm @@ -56,7 +56,7 @@ sub new { my %opts=@opts; $class = ref $class if ref $class; - $self = $class->SUPER::new($pdf, sprintf('%s+Bdf%02i',pdfkey(),++$BmpNum).'~'.time()); + $self = $class->SUPER::new($pdf, sprintf('%s+Bdf%02i',pdfkey(),++$BmpNum)); $pdf->new_obj($self) unless($self->is_obj($pdf)); # adobe bitmap distribution font @@ -202,7 +202,7 @@ sub readBDF { $data->{bbox}{'.notdef'} = [0, 0, 0, 0]; } - $data->{fontname}=pdfkey().pdfkey().'~'.time(); + $data->{fontname}=pdfkey(); $data->{apiname}=$data->{fontname}; $data->{flags} = 34; $data->{fontbbox} = [ split(/\s+/,$data->{FONTBOUNDINGBOX}) ]; --- a/lib/PDF/API2/Resource/Font/CoreFont.pm +++ b/lib/PDF/API2/Resource/Font/CoreFont.pm @@ -164,7 +164,7 @@ sub new #} $class = ref $class if ref $class; - $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey().'~'.time()); + $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey()); $pdf->new_obj($self) unless($self->is_obj($pdf)); $self->{' data'}=$data; $self->{-dokern}=1 if($opts{-dokern}); --- a/lib/PDF/API2/Resource/Font/Postscript.pm +++ b/lib/PDF/API2/Resource/Font/Postscript.pm @@ -28,7 +28,7 @@ sub new { } $class = ref $class if ref $class; - $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey().'~'.time()); + $self = $class->SUPER::new($pdf, $data->{apiname}.pdfkey()); $pdf->new_obj($self) unless($self->is_obj($pdf)); $self->{' data'}=$data; @@ -40,7 +40,7 @@ sub new { $self->{'FontDescriptor'}=$self->descrByData(); if(-f $psfile) { - $self->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname).'~'.time()); + $self->{'BaseFont'} = PDFName(pdfkey().'+'.($self->fontname)); my ($l1,$l2,$l3,$stream)=$self->readPFAPFB($psfile); ------8<-----------8<-----------8<-----------8<-----------8<-----

Wed Mar 16 11:04:07 2016 philperry [...] hvc.rr.com - Correspondence added

Subject:	[rt.cpan.org #113084]
Date:	Wed, 16 Mar 2016 11:04:14 -0400
To:	bug-PDF-API2 [...] rt.cpan.org
From:	Phil M Perry <philperry [...] hvc.rr.com>

See also #105579. It sounds like it might be a very similar issue.

Wed Mar 16 11:04:07 2016 The RT System itself - Status changed from 'new' to 'open'

Wed Mar 16 11:13:36 2016 dam [...] cpan.org - Correspondence added

Subject:	Re: [rt.cpan.org #113084]
Date:	Wed, 16 Mar 2016 15:13:19 +0000
To:	"philperry [...] hvc.rr.com via RT" <bug-PDF-API2 [...] rt.cpan.org>
From:	Damyan Ivanov <dam [...] cpan.org>

-=| philperry@hvc.rr.com via RT, 16.03.2016 11:04:12 -0400 |=- Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=113084 > > > See also #105579. It sounds like it might be a very similar issue.

Yes, this seems to be the same issue. Interestingly, hash order seems not to be a problem -- after removing the time stamps, the PDF output gets completely deterministic in my tests.

Thu May 12 16:23:07 2016 steve [...] deefs.net - Correspondence added

Thanks for the suggestion. The version control history doesn't go all the way back, so I can't tell why the time() call was added to the IDs initially, but since it exists in more-commonly-used parts of the code and is missing in less-commonly-used parts of the code, I'm going to guess that it was added at some point to work around a name collision issue, possibly related to merging multiple PDFs created by PDF::API2. Using time() isn't as fancy as, say, using UUIDs, but it's a lot faster and likely good enough for most real-world applications. There are definitely benefits to having the same code produce the same PDF repeatedly, but there are also potential drawbacks, so I'm going to leave this part of the code as is.

Thu May 12 16:23:13 2016 steve [...] deefs.net - Status changed from 'open' to 'rejected'