Hi Eric,
There are several problems with this patch as written. I've included
criticisms below, which I intend constructively. Clearly you've put
some work into this, but I cannot accept it as is.
Furthermore, it would help if you explained your patch more
thoroughly. In particular, you say that the patch improved
traversal, but you don't say whether it improves performance or
correctness. Is there a problem with the existing traverse routine?
Chris
On Apr 30, 2008, at 4:36 PM, Eric Hall via RT wrote:
Show quoted text>
> Wed Apr 30 17:36:27 2008: Request 35555 was acted upon.
> Transaction: Ticket created by rt.cpan.org@darkart.com
> Queue: CAM-PDF
> Subject: patch to improve node traversal
> Broken in: (no value)
> Severity: (no value)
> Owner: Nobody
> Requestors: rt.cpan.org@darkart.com
> Status: new
> Ticket <URL:
http://rt.cpan.org/Ticket/Display.html?id=35555 >
>
>
> Hello-
> Attached is a patch against 1.13 which improved PDF node
> traversal for us.
>
>
> -eric
>
>
>
> --- lib/CAM/PDF.pm.orig 2007-11-28 21:42:46.000000000 -0800
> +++ lib/CAM/PDF.pm 2008-03-27 20:06:54.000000000 -0700
> @@ -566,6 +566,9 @@
> $CAM::PDF::errstr = "Could not decipher xref row:\n" .
> $self->trimstr($row);
> return;
> }
> + if ((0 == $1) && (0 == $2)) {
> + next;
> + }
> if ($type eq 'n')
> {
> $index->{$objnum} = $indexnum;
This addition is dead code. The case of a zero indexnum is already
handled in the conditional above this.
Show quoted text> @@ -3785,6 +3788,7 @@
> my $otherdoc = shift;
> my $otherkey = shift;
> my $follow = shift;
> + my %traversedNodes = ();
>
> # careful! 'undef' means something different from '0' here!
> if (!defined $follow)
> @@ -3838,10 +3842,10 @@
> my $newkey = $self->appendObject($otherdoc, $oldrefkey, 0);
> $newrefkeys{$oldrefkey} = $newkey;
> }
> - $self->changeRefKeys($objnode, \%newrefkeys);
> + $self->changeRefKeys($objnode, \%newrefkeys, \%traversedNodes);
> for my $newkey (values %newrefkeys)
> {
> - $self->changeRefKeys($self->dereference($newkey), \%
> newrefkeys);
> + $self->changeRefKeys($self->dereference($newkey), \%
> newrefkeys, \%traversedNodes);
> }
> }
> return (%newrefkeys);
Is the point of this to remember what you saw across calls to
changeRefKeys? If yes, than I can see where this would probably be a
performance boost.
Show quoted text> @@ -5040,7 +5044,7 @@
> if ($stream)
> {
> $stream = $self->{crypt}->encrypt($self, $stream, $objnode->
> {objnum}, $objnode->{gennum});
> - $str .= "\nstream\n" . $stream . 'endstream';
> + $str .= "\nstream\n" . $stream . '\nendstream';
> }
> return "obj\n$str\nendobj\n";
> }
This is wrong. '\n' is equivalent to 'n' and I really don't think
you want to append an 'n' to each stream!
Show quoted text> @@ -5072,6 +5076,7 @@
> my $objnode = shift;
> my $func = shift;
> my $funcdata = shift;
> + my $funcResult = undef;
>
> my $traversed = {};
> my @stack = ($objnode);
> @@ -5080,7 +5085,8 @@
> while ($i < @stack)
> {
> my $objnode = $stack[$i++];
> - $self->$func($objnode, $funcdata);
> + $funcResult = undef;
> + $funcResult = $self->$func($objnode, $funcdata);
>
> my $type = $objnode->{type};
> my $val = $objnode->{value};
The above change does nothing. You aren't even using the value you
store in $funcResult.
Show quoted text> @@ -5108,6 +5114,62 @@
> return;
> }
>
> +sub recurseTraverse {
> + my ($self, $deref, $objnode, $traversedRef, $func, $funcdata,
> $objnum) = @_;
> +
> + my $type;
> + my $val;
> + my @nodes = ();
> + my $node = undef;
> + my $newObjNum = undef;
> +
> + if ((!defined($objnode)) || (! ref $objnode )) {
> + return;
> + }
> +
> + if (defined($objnum)) {
> + $newObjNum = $objnum;
> + }
> +
> + $type = $objnode->{type};
> + $val = $objnode->{value};
> +
> + if (exists $objnode->{objnum}) {
> + $newObjNum = $objnode->{objnum};
> + }
> +
> + if ((defined($newObjNum)) &&
> + (exists ($traversedRef->{$newObjNum}))) {
> + return;
> + } else {
> + $self->$func($objnode, $funcdata);
> + }
> +
> + if ($type eq 'dictionary') {
> + push (@nodes, values %{$val});
> + } elsif ($type eq 'array') {
> + push (@nodes, @{$val});
> + } elsif ($type eq 'object') {
> + push (@nodes, $val);
> + } elsif ($type eq 'reference') {
> + if ($deref) {
> + push (@nodes, $self->dereference($val));
> + } else {
> + return;
> + }
> + }
> +
> + for $node (@nodes) {
> + recurseTraverse($self, $deref, $node, $traversedRef, $func,
> $funcdata, $newObjNum);
> + }
> +
> + if (($type eq 'object')) {
> + $traversedRef->{$newObjNum} = 1;
> + }
> +
> + return;
> +}
> +
> # decodeObject and decodeAll differ from each other like this:
> #
> # decodeObject JUST decodes a single stream directly below the
> object
You neglected to document recurseTraverse, so I'm having some trouble
understanding the point of this method. It looks like a slower
version of my traverse() method, which deliberately uses an explicit
stack instead of the program stack for a big performance boost. I
don't understand why you check for under or scalar $objnode. Those
cases should not even be possible, I believe.
Show quoted text> @@ -5538,10 +5600,11 @@
> my $self = shift;
> my $objnode = shift;
> my $newrefkeys = shift;
> + my $traversedRef = shift;
>
> my $follow = shift || 0; # almost always false
>
> - $self->traverse($follow, $objnode, \&_changeRefKeysCB,
> $newrefkeys);
> + $self->recurseTraverse($follow, $objnode, $traversedRef,
> \&_changeRefKeysCB, $newrefkeys, 0);
> return;
> }
>
Show quoted text>
> @@ -5558,9 +5621,10 @@
> if (exists $newrefkeys->{$objnode->{value}})
> {
> $objnode->{value} = $newrefkeys->{$objnode->{value}};
> + return 1;
> }
> }
> - return;
> + return 0;
> }
>
> =item $doc->abbrevInlineImage($object)
The return value of this function is never used. Why is it returning
1 or 0?