Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Storable CPAN distribution.

Report information
The Basics
Id: 4901
Status: new
Worked: 1.9 hours (115 min)
Priority: 0/
Queue: Storable

People
Owner: Nobody in particular
Requestors: natg [...] shore.net
Cc: adamk [...] cpan.org
RSOD [...] cpan.org
simon [...] simon-cozens.org
AdminCc:

Bug Information
Severity: Wishlist
Broken in: 2.09
Fixed in: (no value)



Subject: Letting STORABLE_thaw return object
It would be nice if there were a way for STORABLE_thaw to return a reference of its own choosing, rather than requiring that it fill in the empty object passed to it. I'm developing a persistent object store, Class::AutoDB, that features the ability to incrementally reload objects from the database and reconstruct shared object structures as objects are fetched. Suppose objects A and B both refer to object C. A and B can be fetched independently, and this does not immediately cause C to be fetched. But when C is fetched, the references from A and B must be “swizzled” to point to the same in-memory version of C. To make this work, we want the thaw hook to consult a hash to see if the object being thawed is already in memory, and if so, return a reference to the existing in-memory copy. Data:Dumper’s thaw hook lets us do this and that’s what we’re using now. Storable has many advantages over Data::Dumper and we would prefer to use it instead (or provide both and let users choose), if only the thaw hook gave us this one additional feature. Thanks very much, Nat Goodman
I concur. The whole "we'll give you an object" idea is flawed. For such a low-level coding interface, it's a bad example of trying to help out. The ideal interface would be not to create an empty and instantiated (and in most cases illegal) object for you, but let you do it yourself and return it. The current implementation does not support Singleton classes... it creates an illegal second instance of the Singleton. It doesn't support pooled or keyed objects that a process-wide and should NOT go into the Storable. Since you obviously don't want to change the existing interface, allowing the user to return an alternative object would be ideal, or perhaps to create a second alternate STORABLE_thaw_class for the task (I prefer the former). In most cases where existing implmentations are accidentally returning objects, it should almost certainly be the SAME object as what they were passed.
From: cpan [...] ali.as
OK I've done a full sweep through every single module in CPAN that uses STORABLE_thaw. About a third of them ASSUME that they have to pass $self as the return value, another third just C< return; > and the rest look ok, except for very rare race conditions. I've filed RT bugs against all of THOSE to ensure they move to C< return; > or C< return $self; > to prevent any races.
From: natg [...] shore.net
I appreciate your looking into this. Your response suggests that you're planning to make the change. Yes? If so, do you have an expected completion date? Many thanks, Nat Goodman
From: cpan [...] ali.as
I have just filed a bug 11813 for the specific Singleton case that can't be fixed except with this change. Listed in that bug is a full test case for the problem, which I'll also attach here.
#!/usr/bin/perl -w # Tests the freezing/thawing structures containing Singleton objects, # which should see both structs pointing to the same object. # In order to make this test work, ANY method needs to exist that can # be used in STORABLE_freeze and STORABLE_thaw in order to make the tests # pass. use Storable (); use Test::More tests => 7; # Get the singleton my $object = My::Singleton->new; isa_ok( $object, 'My::Singleton' ); # Confirm (for the record) that the class is actually a Singleton my $object2 = My::Singleton->new; isa_ok( $object2, 'My::Singleton' ); is( "$object", "$object2", 'Class is a singleton' ); ############ # Main Tests my $struct = [ 1, $object, 3 ]; # Freeze the struct my $frozen = Storable::freeze( $struct ); ok( (defined($frozen) and ! ref($frozen) and length($frozen)), 'freeze returns a string' ); # Thaw the struct my $thawed = Storable::thaw( $frozen ); # Now it should look exactly like the original is_deeply( $struct, $thawed, 'Struct superficially looks like the original' ); # ... EXCEPT that the Singleton should be the same instance of the object is( "$struct->[1]", "$thawed->[1]", 'Singleton thaws correctly' ); # We can also test this empirically $struct->[1]->{value} = 'Goodbye cruel world!'; is_deeply( $struct, $thawed, 'Empiric testing corfirms correct behaviour' ); # End Tests ########### package My::Singleton; my $SINGLETON = undef; sub new { $SINGLETON or $SINGLETON = bless { value => 'Hello World!' }, $_[0]; } sub STORABLE_freeze { my $self = shift; # We don't actually need to return anything, but provide a null string # to avoid the null-list-return behaviour. return (''); } sub STORABLE_thaw { my ($obj, $clone, $string) = @_; # Get the Singleton object my $self = My::Singleton->new; ### IS THERE ANY POSSIBLE CODE FOR STORABLE_thaw THAT CAN ### BE USED TO PASS THE TESTS??? ### LET'S TRY TO DOCUMENTED WAY OF CREATING THE OBJECT %$obj = %$self; return; } # This one would work under the proposed changes to Storable sub STORABLE_thaw_new { My::Singleton->new; }
From: cpan [...] ali.as
I'm no C coder myself, but I've written up the following project specification, and I'll be paying Coraline (RSOD) to implement it for me. I'd imagine he'll find the time to do it and post the patch some time in the next few days, assuming you are happy to review/merge/release. --------------------------------- Overview: Storable's STORABLE_thaw method currently "helpfully" provides a blessed method which you are expected to fill when thawing objects with custom Storable hooks. More details in the Storable documentation. This is a major problem with Singleton classes and other things that it is NEVER safe for anyone outside of the Singleton class itself to arbitrarily bless object. The full list of the types of classes are Singleton: my $one = My::Singleton->new; my $two = My::Singleton->new; ok(refaddr($one) == refaddr($two), 'Singleton class working'); Keyed: my $one = My::Keyed->new('foo'); my $two = My::Keyed->new('foo'); my $three = My::Keyed->new('bar'); ok(refaddr($one) == refaddr($two), 'Keyed class works for same key'); ok(refaddr($one) != refaddr($three), 'Keyed class works for different'); Pooled: (As above but with the actual objects such as database connections returned determined by some more complex means such as resource load levels or so on). The expected behaviour would thus be that when the object is frozen it returns no information (Singleton) or the key name (Keyed) and when thawed the correct object is obtained from the class and used in the thawed struct. ---------------------- Work Description: Alter the behaviour of STORABLE_thaw in a backwards compatible way so that some implementation of STORABLE_freeze/thaw is possible to properly implement Storable hooks support for the above cases. The recommended way is described in part at the following URL. http://perlmonks.org/?node_id=437784 It is also mentioned as point 2) at the following Storable bug http://rt.cpan.org/NoAuth/Bug.html?id=6641 And also at the following http://perlmonks.org/NoAuth/Bug.html?id=4901 A complete unit tests for the correct behaviour in Singleton classes (as an example) can be found at http://rt.cpan.org/NoAuth/Bug.html?id=11813 The items to be implemented are 1. Alter Storable to implement in C/XS at the appropriate location the as described in the Equivalent Perl Implementation following. The porting need not be identical, but should implement the same intent. 2. Add singleton.t to the Storable tests. 3. Write and add keyed.t (testing the "Keyed" case described at top) to the Storable tests. (The Pooled case is not required) 4. Ensure all unit tests pass 5. Create a patch against the most current released Storable dist and submit/attach it to rt.cpan.org bug 4901 with any comments for the maintainer. 6. If needed, make any minor structural or style changes as requested by the maintainer and resubmit the patch to the same location. -------------------------- Equivalent Perl Implementation If the current implementation of Storable's use of STORABLE_thaw could be described as (describing only the HASH case) ($class, $string, @references) = get_basic_hooked_thaw; $object = bless { }, $class; if ( $object->can('STORABLE_thaw') ) { $object->STORABLE_thaw($string, @references); } Alter it to the following use Scalar::Util 'blessed'; ($class, $string, @references) = get_basic_hooked_thaw; $object = bless { }, $class; if ( $object->can('STORABLE_thaw') ) { # Call in scalar context and get the return value my $rv = $object->STORABLE_thaw($string, @references); # Did STORABLE_thaw return an alternate object of the same class if ( ref $rv and blessed($rv) and $rv->isa($class) ) { $object = $rv; } } Please note that Scalar::Util is not a dependency for Storable, and thus the C/XS equivalent of "Is the return value an alternate object of the same class" should be used to the satisfaction of yourself and the Storable maintainers. ---------------
From: cpan [...] ali.as
I've got an objection on PerlMonks (just the one) to changing STORABLE_thaw, so I should probably outline the alternatives I had considered. 1. New module or subclass Probably not a good idea, because we can't control which version is called at the top level. If my Singleton works with one and not with the other, the problems still occur, because I can't control which one some user will use. They would also likely be incompatible. Better to wait and possible do Storable2 or Perl6 Storable with a new API? 2. New mechanism Do a different pair of DIFFERENTSTORABLE_freeze/thaw methods... ugh 3. New and alternate thaw method Add a new strictly correct alternate one sub STORABLE_static_thaw { my ($class, $string, @references) = @_; # Instantiate and return my $self = bless {}, $class; .... return $self; } And then in Storable my ($class, $string, @refs) = raw_thaw(); my $object; if ( $class->can('STORABLE_static_thaw') ) { $object = $class->STORABLE_static_thaw( $string, @refs ); die unless $object->isa($class); } else { # Same as we do now $object = bless {}, $class; if ( $class->can('STORABLE_thaw') ) { $object->STORABLE_thaw( $string, @refs ); } } If we want to be REALLY careful, and given that much of the core'ish modules like Storable are going to be rewritten for Perl6 anyway, maybe this last one is a better choice.
From: samv [...] cpan.org
In order to successfully complete this change, you will need to understand why the API was designed like that in the first place. STORABLE_freeze lets you return a list of objects that will be frozen in place of the `real' object. On the way back, it needs to construct the object again. However, what if the object returns a list of items that end up referring back to another point in the data structure? For instance, my $object = bless { foo => "bar" }, "Thing"; my $object2 = bless { bar => "baz" }, "Thing"; $object->{superset} = Set::Object->new($object, $object2); So, in this case the structure will be stored as the logical; $o = { foo => "bar", superset => \[ $o, { bar => "baz" } ] } You can see already that there is a precedence problem. The picture can get much, much worse than this - the end result being that not all objects being used to do the reconstruction may be built yet, but Storable has to construct an object in terms of it. Hence you have a "chicken and egg problem" that means Storable has to construct these placeholders in order to break the loop. It is quite easy to work around this problem; before the freeze you traverse the data structure and replace occurances of the singleton/storage object with placeholders, which are replaced afterwards. See Tangram::Dump in recent Tangram distributions for an example of this. Part of the reason Storable has such great performance is because it performs its task with a single pass of the data structure. If this cannot be maintained, then the distribution needs to be forked rather than "fixed". In short, I think you might be hitting a rather obscure "caveat", not a "flaw", and it might not be possible to make this change without drastically breaking performance. But please, feel free to prove me wrong.
Show quoted text
> It is quite easy to work around this problem; before the freeze you > traverse the data structure and replace occurances of the > singleton/storage object with placeholders, which are replaced > afterwards. See Tangram::Dump in recent Tangram distributions for an > example of this. > > Part of the reason Storable has such great performance is because it > performs its task with a single pass of the data structure. If this > cannot be maintained, then the distribution needs to be forked rather > than "fixed".
Good points. I have a similar situation in import/export in a relation system, and I see how this might have similar issues. Without changing the way in which the freeze half is done, one option for the thaw half which would not require the use of placeholders might be to make it two pass as follows. On the first pass we leave the empty/illegal/default (EID from here on) object in place to keep the references sane, but store the returned object away, keyed against the memory address of the EID object. After the normal first pass we do a second pass (conditional on there actually being anything stored away) which walks the resulting structure (using normal circular and recursion prevention) and replaces EID references with the ones in the store.
Testing for RSOD
Due to permissions bizarreness, for some reason I can't see my own comments. I'm told "Reply" might fix this. This is the circular hooks demo
#!/usr/bin/perl use Storable (); use Test::More tests => 8; my $ddd = bless { }, 'Foo'; my $eee = bless { Bar => $ddd }, 'Bar'; $ddd->{Foo} = $eee; my $array = [ $ddd ]; my $string = Storable::freeze( $array ); my $thawed = Storable::thaw( $string ); # is_deeply infinite loops in ciculars, so do it manually # is_deeply( $array, $thawed, 'Circular hooked objects work' ); is( ref($thawed), 'ARRAY', 'Top level ARRAY' ); is( scalar(@$thawed), 1, 'ARRAY contains one element' ); isa_ok( $thawed->[0], 'Foo' ); is( scalar(keys %{$thawed->[0]}), 1, 'Foo contains one element' ); isa_ok( $thawed->[0]->{Foo}, 'Bar' ); is( scalar(keys %{$thawed->[0]->{Foo}}), 1, 'Bar contains one element' ); isa_ok( $thawed->[0]->{Foo}->{Bar}, 'Foo' ); is( $thawed->[0], $thawed->[0]->{Foo}->{Bar}, 'Circular is... well... circular' ); package Foo; sub STORABLE_freeze { my ($self, $clone) = @_; my $class = ref $self; print "# Freezing $class\n"; return ($class, $self->{$class}); } sub STORABLE_thaw { my ($self, $clone, $string, @refs) = @_; my $class = ref $self; print "# Thawing $class\n"; $self->{$class} = shift @refs; return; } package Bar; BEGIN { @ISA = 'Foo'; } 1;
Reposting again due to permissions weirdness, so people other than the queue admin can see it. ------------------------------------ OK, having thoroughly whiteboarded some more obscure cases and had a read through the thawing code, I think I can resolve all the problems to get to a final implementation. ---------------------------------- 1. Addition of a new method STORABLE_attach, which is to be used instead of STORABLE_thaw for classes where the object is a process or system level resource and the thawed structure needs to attach to an existing shared object rather than create a new object. 2. This alternative freeze/attach will have NO ability to use child references, to prevent circular loops below a detached object and thus the main cause of the problem of "create the reference early, but call the thaw hooks depth-first". This approach is reasonably as we really should not be storing anything inside the singleton'ish object anyway, it is owned and controled by the class itself, and anything inside it isn't really "our" data anyway. 3. Continue to use STORABLE_freeze as normal. However, on return from STORABLE_freeze check to see if the class has a STORABLE_attach method. If the class has a STORABLE_attach method AND the STORABLE_freeze call returned more than just a string (child references), throw an exception. 4. At thaw-time, check for a STORABLE_attach and if it has one, get the empty reference as normal, but rather than indexing and adding it to the context, use it to find it's class. Then get the string part as normal, and call $object = STORABLE_attach( $class, $string ); Store this object instead and add it to the context against the appropriate Object-ID so that additional references to the object will be linked to the same object as per normal. After storing, double check that the frozen object did not have any references. If so, throw an exception. ----------------------------- Short of someone changing the behaviour of a class to INTENTIONALLY cause problems, I don't see any remaining issues with this version. plain text (suitable for cut and paste, overrides all)
Attaching the basic singleton.t test, rewritten against the final STORABLE_attach concept described above.
#!/usr/bin/perl -w # Tests the freezing/thawing structures containing Singleton objects, # which should see both structs pointing to the same object. # In order to make this test work, ANY method needs to exist that can # be used in STORABLE_freeze and STORABLE_thaw in order to make the tests # pass. use Storable (); use Test::More tests => 7; # Get the singleton my $object = My::Singleton->new; isa_ok( $object, 'My::Singleton' ); # Confirm (for the record) that the class is actually a Singleton my $object2 = My::Singleton->new; isa_ok( $object2, 'My::Singleton' ); is( "$object", "$object2", 'Class is a singleton' ); ############ # Main Tests my $struct = [ 1, $object, 3 ]; # Freeze the struct my $frozen = Storable::freeze( $struct ); ok( (defined($frozen) and ! ref($frozen) and length($frozen)), 'freeze returns a string' ); # Thaw the struct my $thawed = Storable::thaw( $frozen ); # Now it should look exactly like the original is_deeply( $struct, $thawed, 'Struct superficially looks like the original' ); # ... EXCEPT that the Singleton should be the same instance of the object is( "$struct->[1]", "$thawed->[1]", 'Singleton thaws correctly' ); # We can also test this empirically $struct->[1]->{value} = 'Goodbye cruel world!'; is_deeply( $struct, $thawed, 'Empiric testing corfirms correct behaviour' ); # End Tests ########### package My::Singleton; my $SINGLETON = undef; sub new { $SINGLETON or $SINGLETON = bless { value => 'Hello World!' }, $_[0]; } sub STORABLE_freeze { my $self = shift; # We don't actually need to return anything, but provide a null string # to avoid the null-list-return behaviour. return (''); } sub STORABLE_attach { my ($class, $clone, $string) = @_; # Get the Singleton object and return it return $class->new; }
Attaching hook_attach_errors.t test script, which tests for the three main exceptions that it is possible for the new functionality to correctly cause.
#!/usr/bin/perl -w use Storable (); use Test::More tests => 35; ##################################################################### # Error 1 # # Classes that implement STORABLE_thaw _cannot_ have references # returned by their STORABLE_freeze method. When they do, Storable # should throw an exception # Good Case - should not die { my $goodfreeze = bless {}, 'My::GoodFreeze'; my $frozen = undef; eval { $frozen = Storable::freeze( $goodfreeze ); }; ok( ! $@, 'Storable does not die when STORABLE_freeze does not return references' ); ok( $frozen, 'Storable freezes to a string successfully' ); package My::GoodFreeze; sub STORABLE_freeze { my ($self, $clone) = @_; # Illegally include a reference in this return return (''); } sub STORABLE_attach { my ($class, $clone, $string) = @_; return bless { }, 'My::GoodFreeze'; } } # Error Case - should die on freeze { my $badfreeze = bless {}, 'My::BadFreeze'; eval { Storable::freeze( $badfreeze ); }; ok( $@, 'Storable dies correctly when STORABLE_freeze returns a referece' ); # Check for a unique substring of the error message ok( $@ =~ /cannot return references/, 'Storable dies with the expected error' ); package My::BadFreeze; sub STORABLE_freeze { my ($self, $clone) = @_; # Illegally include a reference in this return return ('', []); } sub STORABLE_attach { my ($class, $clone, $string) = @_; return bless { }, 'My::BadFreeze'; } } ##################################################################### # Error 2 # # If, for some reason, a STORABLE_attach object is accidentally stored # with references, this should be checked and and error should be throw. # Good Case - should not die { my $goodthaw = bless {}, 'My::GoodThaw'; my $frozen = undef; eval { $frozen = Storable::freeze( $goodthaw ); }; ok( $frozen, 'Storable freezes to a string as expected' ); my $thawed = eval { Storable::thaw( $frozen ); }; isa_ok( $thawed, 'My::GoodThaw' ); is( $thawed->{foo}, 'bar', 'My::GoodThaw thawed correctly as expected' ); package My::GoodThaw; sub STORABLE_freeze { my ($self, $clone) = @_; return (''); } sub STORABLE_attach { my ($class, $clone, $string) = @_; return bless { 'foo' => 'bar' }, 'My::GoodThaw'; } } # Bad Case - should die on thaw { # Create the frozen string normally my $badthaw = bless { }, 'My::BadThaw'; my $frozen = undef; eval { $frozen = Storable::freeze( $badthaw ); }; ok( $frozen, 'BadThaw was frozen with references correctly' ); # Set up the error condition by deleting the normal STORABLE_thaw, # and creating a STORABLE_attach. *My::BadThaw::STORABLE_attach = *My::BadThaw::STORABLE_thaw; *My::BadThaw::STORABLE_attach = *My::BadThaw::STORABLE_thaw; # Suppress a warning delete ${'My::BadThaw::'}{STORABLE_thaw}; # Trigger the error condition my $thawed = undef; eval { $thawed = Storable::thaw( $frozen ); }; ok( $@, 'My::BadThaw object dies when thawing as expected' ); # Check for a snippet from the error message ok( $@ =~ /unexpected references/, 'Dies with the expected error message' ); package My::BadThaw; sub STORABLE_freeze { my ($self, $clone) = @_; return ('', []); } # Start with no STORABLE_attach method so we can get a # frozen object-containing-a-reference into the freeze string. sub STORABLE_thaw { my ($class, $clone, $string) = @_; return bless { 'foo' => 'bar' }, 'My::BadThaw'; } } ##################################################################### # Error 3 # # Die if what is returned by STORABLE_attach is not something of that class # Good Case - should not die { my $goodattach = bless { }, 'My::GoodAttach'; my $frozen = Storable::freeze( $goodattach ); ok( $frozen, 'My::GoodAttach return as expected' ); my $thawed = eval { Storable::thaw( $frozen ); }; isa_ok( $thawed, 'My::GoodAttach' ); is( ref($thawed), 'My::GoodAttach::Subclass', 'The slightly-tricky good "returns a subclass" case returns as expected' ); package My::GoodAttach; sub STORABLE_freeze { my ($self, $cloning) = @_; return (''); } sub STORABLE_attach { my ($class, $cloning, $string) = @_; return bless { }, 'My::GoodAttach::Subclass'; } package My::GoodAttach::Subclass; BEGIN { @ISA = 'My::GoodAttach'; } } # Bad Cases - die on thaw { my $returnvalue = undef; # Create and freeze the object my $badattach = bless { }, 'My::BadAttach'; my $frozen = Storable::freeze( $badattach ); ok( $frozen, 'BadAttach freezes as expected' ); # Try a number of different return values, all of which # should cause Storable to die. my @badthings = ( undef, '', 1, [], {}, \"foo", (bless { }, 'Foo'), ); foreach ( @badthings ) { $returnvalue = $_; my $thawed = undef; eval { $thawed = Storable::thaw( $frozen ); }; ok( $@, 'BadAttach dies on thaw' ); ok( $@ =~ /STORABLE_attach did not return a My::BadAttach object/, 'BadAttach dies on thaw with the expected error message' ); is( $thawed, undef, 'Double checking $thawed was not set' ); } package My::BadAttach; sub STORABLE_freeze { my ($self, $cloning) = @_; return (''); } sub STORABLE_attach { my ($class, $cloning, $string) = @_; return $returnvalue; } }
From: natg [...] shore.net
Fascinating discussion but I think it’s gotten a little off track. [By way of introduction, I’m the ‘guest’ who posed the initial bug report. We have a persistent object store (Class::AutoDB) that uses Data::Dumper. We want to make this work on all reasonable serialization engines, including Storable and YAML.] A thaw hook, in any scheme, needs two pieces of information: the serialized form to be thawed, and a unique identifier for the reference being thawed (to break cycles among other things). In Storable, the unique identifier is the refaddr of the empty object created by the engine. This is an elegant approach that works in many cases. The problem comes from ASSUMING that the empty object will always become the real object. The solution is to let the thaw hook decide whether it can use the empty object or whether it needs to use something else. The engine is not seriously affected by this decision for the simple reason that there is no logical connection between the objects passed from engine to hook and the objects put by the hook into the output data structure. Turning to SAMV’s example: ---------- … the structure will be stored as the logical; $o = { foo => "bar", superset => \[ $o, { bar => "baz" } ] } Show quoted text
______ Assuming $o has a thaw hook, it will be called first for the outer reference. The engine will pass in an empty object with refaddr, say 12345. The hook should record this information (in a hash, typically) to say “I’ve been seen”. If the hook wants to use an object of its own choosing, it associates this object with the refaddr. Then it calls the Storable engine to continue thawing. The hook will be called again on the inner reference. Again, the engine will pass in 12345. This time the hook will know it’s been seen before and (1) break the cycle by not calling the engine, and (2) return the object previously associated with refaddr 12345. The only change that’s needed in the Storable code is to pay attention to the object returned by the thaw hook and put it into the thawed data structure. Nothing else has to change! What have I missed? Best, Nat Goodman
Show quoted text
> Turning to SAMV’s example: > ---------- > … the structure will be stored as the logical; > > $o = { foo => "bar", > superset => \[ $o, { bar => "baz" } ] > } > ______ > Assuming $o has a thaw hook, it will be called first for the outer > reference. > > What have I missed?
The hook is NOT called for the outer reference first. That is what you have missed. Grab the circular_hooks.t test script I attached earlier, which demonstrates this. Run through it in the debugger if you like. The key problem is that in order to call the hook, we need to have any references it might have stored in STORABLE_freeze. THOSE references may have other references in them, that eventually circular back to the same object. So in order to do it in one pass, the following steps have to occur. 1. Create stub (the object currently passed to STORABLE_thaw) for the outer object. 2. Start to parse the refs inside the outer (descending) 3. Create placeholder for inner object. 4. Start to parse the refs inside the inner. 5. Encounter the ref back to the outer, create the ref to the stub for the outer. 6. Call STORABLE_thaw for the inner, passing it the ref to the stub for the outer. 7. Call STORABLE_thaw for the outer, passing it the ref to the inner. ------------------------------------------ So in other words, STORABLE_thaw has to be called depth-first, but in order to handle circulars, but the stub objects need to be created before descending. There are only two ways of avoiding this problem. 1. Store the replacement object in a lookup hash as we go, and then do a second pass over the thawed struct and replace all refs to the stub with refs to the replacement. 2. Guarentee that you will never have circular references below the object. Doing 1. would be touchy to say the least, and might significantly impact on the performance of Storable itself, especially if lots of classes started to use it. My spidey senses are also tingling that there are some additional corner cases that would be very problematic. Doing 2. is nicely self-contained and solves as big a proportion of the problem space as possible, without impacting on the existing users. So I'm going with 2.
From: natg [...] shore.net
Show quoted text
> > What have I missed?
> > The hook is NOT called for the outer reference first. That is > what you have missed.
Sorry. I should have checked your test more carefully. Show quoted text
> 2. Guarentee that you will never have circular references > below the object. > ... > So I'm going with 2.
NOOOOO!!!!! This is wrong! Who can guarantee that their object structures are cycle-free? It's better to leave it the way it is!! Please give me a few days to ponder the problem. (I have a big deadline at work on Monday and may not be able to give this enough attention over the weekend.) Best, Nat
From: natg [...] shore.net
I scraped together some time to look at this further. There’s actually a deeper problem with the current design: if the structure is circular, the ‘side references’ – by which I mean the extra references returned by STORABLE_freeze – are not guaranteed to be thawed by the time the thaw hook is called. The problem is obvious: if the freeze hook for object $a returns $a itself as a side reference, it is impossible for the side reference to $a to be thawed before the thaw hook for $a is called, because the thaw hook is the guy who thaws $a! The engine currently hands back a reference to $a, but the referent object is not valid. In an object oriented setting, this is a recipe for some very subtle bugs! I’ve attached a small test program that demonstrates this obvious point. The conclusion is that freeze/thaw hooks and side references are incompatible features. In your sweep of CPAN modules that use STORABLE_thaw, did you find any that use side references? Since no one has reported this bug yet, it probably means that no one is using the combination, or, at least, no one is doing anything fancy with it. If side references aren’t being used, we’re back to the simpler case in which the thaw hook can be called top-down, and it will work to let the thaw hook return an object of its own choosing. Best, Nat
use strict; use Storable (); use Test::More tests => 4; my $a=new Foo('a'); my $string = Storable::freeze( $a ); my $thawed = Storable::thaw( $string ); is(ref $thawed,ref $a,"Thawed class correct"); is($thawed->{name},$a->{name},"Thawed name correct"); exit; package Foo; use Test::More; sub new { my $class=shift; $class=(ref $class)||$class; my $self=bless {name=>$_[0]},$class; } sub STORABLE_freeze { my ($self, $clone) = @_; my $class = ref $self; print "# Freezing $class\n"; return ($self->{name},$self); } sub STORABLE_thaw { my ($self, $clone, $string, @refs) = @_; my $class = ref $self; print "# Thawing $class\n"; my $thawed_ref=$refs[0]; my($thawed_name)=$string; is($thawed_ref,$self,"Thawed ref points to thawed object"); is($thawed_ref->{name},$thawed_name,"Thawed ref contains correct name"); $self->{name}=$thawed_name; return; } 1;
[guest - Sat Mar 12 14:37:07 2005]: Show quoted text
> I scraped together some time to look at this further. > > There’s actually a deeper problem with the current design: if the > structure is circular, the ‘side references’ – by which I mean the > extra references returned by STORABLE_freeze – are not guaranteed to > be thawed by the time the thaw hook is called.
Bingo. In a circular topology, you MUST initialise one before the other, and the current side reference solutions allows Storable to remain single-pass and sane. Show quoted text
> The conclusion is that freeze/thaw hooks and side references are > incompatible features. In your sweep of CPAN modules that use > STORABLE_thaw, did you find any that use side references? Since no > one has reported this bug yet, it probably means that no one is using > the combination, or, at least, no one is doing anything fancy with it. > > If side references aren’t being used, we’re back to the simpler case > in which the thaw hook can be called top-down, and it will work to let > the thaw hook return an object of its own choosing.
It isn't so much the DIRECT use of circular references, as it is the unknown nature of these things. If I have a basic collection/ARRAY type object, all it cares about, and all it NEEDS to care about, is that it can store the things inside it. If those things contain references to other things and so on until 15 references down the chain you finally get back to something near the top, then you are going to have a circular situation. BUT it is likely to be one that is both legal, and can be handled by hooks cleanly. Admittedly doing REALLY close in things where the thaw hooks depend deeply on the child references (not just that they exist at all) is incompatible. But I would suggest that nobody is doing that, otherwise they would have a problem. Most of the CPAN uses fall into that category. They don't appear to use them directly, but they concievably might down the line. The bigger issue is the implementation problem... somehow "banning" hooks in situations where there are circulars would be hard as it might not the classes fault, so it could work sometimes and die others. Remember that we might have large structure with quite a diverse range of data and multiple hooked classes. Implementation would be a nightmare, and since hooks are a "wizards only" feature, there's an understanding that you should now how to handle these situations.
Show quoted text
> > So I'm going with 2.
> > NOOOOO!!!!! This is wrong! Who can guarantee that their object > structures are cycle-free? It's better to leave it the way it is!!
That's why we are leaving it the way it is. By not changing the behaviour of STORABLE_thaw, and adding an additional STORABLE_attach method, we can guarentee not to break any existing code. You guarentee the object structure is cycle-free by not allowing ANY references to be stored in the freeze. If you have complex data that you still want to go into the freeze, the recommended way is to re-enter Storable to create the string. Now, if something underneath the hooked object shares a reference with something above the hooked object, they WON'T say shared once you thaw. But it does mean that you can guarentee not to have circulars.
From: natg [...] shore.net
Thanks for the interesting conversation. As you point out, the problem with references to incomplete objects can occur even in the absence of side references. Your STORABLE_attach solution will work quite well if it is called the first time the object is seen. Here is a possibly cleaner way to package the solution. 1. STORABLE_attach would be called by the engine the first time the object is seen. Its job is to make or choose the thawed representation of the object and return it. This object will not, in general, be completely filled in, but STORABLE_attach is expected to ensure that the object is at least valid. 2. The engine would add the returned object to the structure it uses to map frozen references to thawed ones and proceed as normal. 3. Later, when the engine is finished with all side references, it would call STORABLE_thaw as it does now. STORABLE_thaw would be expected to complete the object initialization if any is required. (Probably none in your application, but lots in mine.) A class can leave either method undefined if does not need that capability.
ooooo, much better. The only issue I see with that case, is that we have to make sure that BOTH methods get passed the frozen string, so we might have to store the string away somewhere until it is needed the second time. I've got test cases attached already for "STORABLE_attach AND NOT STORABLE_thaw" case. Can you provide a test script that covers the correct "STORABLE_attach AND STORABLE_thaw" behaviour, so I can double check we are on the same page here, and to give coraline something to code against? The other good thing here is that if needed, we can implement this in two steps. Having an alternative STORABLE_attach as step 1 does not limit upgrading the implementation to support it being an additional STORABLE_attach as a step 2. Since I'm having to pay for the actual XS coding here, and it has been tough enough to justify an external constractor in any case, I'll get coraline to start and get step 1 out the way now, adding an additional path from 3901 to 4189. Doing step 2 then just means adding a third path to handle the "both methods" case. But the XS should be in place from step 1 to largely recycle.
Show quoted text
> path from 3901 to 4189. Doing step 2 then just means adding a third path
That should read "line 3901 to line 4189, in the current version".
From: natg [...] shore.net
Show quoted text
> The only issue I see with that case, is that we have to make > sure that BOTH methods get passed the frozen string, so we > might have to store the string away somewhere until it is > needed the second time.
Good point. Show quoted text
> Can you provide a test script that covers the correct > "STORABLE_attach AND STORABLE_thaw" behaviour, so I can > double check we are on the same page here, and to give > coraline something to code against?
Will do. Tomorrow (Mon Mar 14). Show quoted text
> Since I'm having to pay for the actual XS coding here, and it > has been tough enough to justify an external constractor in > any case...
How much money are we talking about? I may be able to contribute something. Show quoted text
> I'll get coraline to start and get step 1 out the > way now, adding an additional path from 3901 to 4189. Doing > step 2 then just means adding a third path to handle the > "both methods" case. But the XS should be in place from step > 1 to largely recycle.
Sounds like a great plan. Best, Nat
Here is the promised test program for "STORABLE_attach AND STORABLE_thaw". The output is Show quoted text
> perl attach_thaw.t
1..19 # Freezing a # Freezing b # Freezing c # Thawing c # Thawing b # Thawing a ok 1 - structures have same content -- should pass with or without attach not ok 2 - structures are identical -- should only pass when attach is working # Failed test (attach_thaw.t at line 18) # got: 'AttachThaw=HASH(0x827f40c)' # expected: 'AttachThaw=HASH(0x811b154)' ok 3 - oid2obj correct for object a ok 4 - correct freeze count (1) for object a not ok 5 - correct attach count (1) for object a -- should only pass when attach is working # Failed test (attach_thaw.t at line 24) # got: undef # expected: '1' ok 6 - correct thaw count (1) for object a ok 7 - oid2obj correct for object b ok 8 - correct freeze count (1) for object b not ok 9 - correct attach count (1) for object b -- should only pass when attach is working # Failed test (attach_thaw.t at line 24) # got: undef # expected: '1' ok 10 - correct thaw count (1) for object b ok 11 - oid2obj correct for object c ok 12 - correct freeze count (1) for object c not ok 13 - correct attach count (1) for object c -- should only pass when attach is working # Failed test (attach_thaw.t at line 24) # got: undef # expected: '1' ok 14 - correct thaw count (1) for object c # Thawing c # Thawing b # Thawing a ok 15 - structures have same content -- should pass with or without attach ok 16 - structures are not identical -- should pass with or without attach not ok 17 - oid2obj correct for thawed object a -- should only pass when attach is working # Failed test (attach_thaw.t at line 36) # got: undef # expected: 'AttachThaw=HASH(0x82bf444)' not ok 18 - oid2obj correct for thawed object b -- should only pass when attach is working # Failed test (attach_thaw.t at line 36) # got: undef # expected: 'AttachThaw=HASH(0x818fa14)' not ok 19 - oid2obj correct for thawed object c -- should only pass when attach is working # Failed test (attach_thaw.t at line 36) # got: undef # expected: 'AttachThaw=HASH(0x82bf044)' # Looks like you failed 7 tests of 19.
use strict; use Storable (); use Test::More tests => 19; use Test::Deep; my $a=new AttachThaw('a',1); my $b=new AttachThaw('b',2); my $c=new AttachThaw('c',3); # Link objects into a cycle $a->{next}=$b; $b->{next}=$c; $c->{next}=$a; my $string=Storable::freeze($a); # freeze entire structure my $thawed=Storable::thaw($string); # get it back cmp_deeply($thawed,$a,"structures have same content -- should pass with or without attach"); is($thawed,$a,"structures are identical -- should only pass when attach is working"); for my $obj ($a,$b,$c) { my($name,$oid)=@$obj{qw(name oid)}; is($AttachThaw::oid2obj{$oid},$obj,"oid2obj correct for object $name"); is($AttachThaw::oid2freeze{$oid},1,"correct freeze count (1) for object $name"); is($AttachThaw::oid2attach{$oid},1,"correct attach count (1) for object $name -- should only pass when attach is working"); is($AttachThaw::oid2thaw{$oid},1,"correct thaw count (1) for object $name"); } %AttachThaw::oid2obj=undef; # clear memory of objects my $thawed=Storable::thaw($string); # and get structure again cmp_deeply($thawed,$a,"structures have same content -- should pass with or without attach"); isnt($thawed,$a,"structures are not identical -- should pass with or without attach"); my $obj=$thawed; for my $i (1..3) { my($name,$oid)=@$obj{qw(name oid)}; is($AttachThaw::oid2obj{$oid},$obj,"oid2obj correct for thawed object $name -- should only pass when attach is working"); $obj=$obj->{next}; } exit; package AttachThaw; # Note: 'oid' stands for 'object id' our %oid2obj; # Used by attach to recreate shared structure our %oid2freeze; # Number of freezes our %oid2attach; # Number of attaches our %oid2thaw; # Number of thaws sub new { my $class=shift; $class=(ref $class)||$class; my($name,$oid)=@_; my $self=bless {name=>$name,oid=>$oid},$class; $oid2obj{$oid}=$self; # Remember object for attach return $self; } sub STORABLE_freeze { my ($self,$clone)=@_; my($name,$next,$oid)=@$self{qw(name next oid)}; print "# Freezing $name\n"; $oid2freeze{$oid}++; return (join($;,$name,$oid),$next); # Obviously not completely general... } sub STORABLE_attach { my ($class,$string) = @_; my($name,$oid)=split($;,$string); print "# Attaching $name\n"; $oid2attach{$oid}++; my $object=$oid2obj{$oid}; # Get existing object, if any unless ($object) { # Object does not yet exist $object=bless {},$class; # so make new empty one # Note that content filled in by thaw $oid2obj{$oid}=$object; # Remember object for next time } return $object; } sub STORABLE_thaw { my ($self, $clone, $string, @refs) = @_; my($name,$oid)=split($;,$string); print "# Thawing $name\n"; $oid2thaw{$oid}++; my $next=$refs[0]; # @$self{qw(name next oid)}=($name,$next,$oid); return; } 1;
See attached draft patch for step 1 against Storable 2.13. Unfortunately, some of the tests were returning false positives, so I've modified them slightly to test more explicitly. So parts of attach_singleton.t will fail tests.

Message body is not shown because it is too large.

Find attached second version of step 1 patch. I'll be checking this shortly to ensure it works as expected. Once I'm comfortable you should be able to merge and release. If anyone else wants to take a look at is as well, feel free to do so.

Message body is not shown because it is too large.

Show quoted text
> I'll be checking this shortly to ensure it works as expected. Once I'm > comfortable you should be able to merge and release.
Patch looks good to me. I'm happy for merge and 2.14 release for this patch. I think the only think it lacks is the actual 2.14 version increment and Changes file entry?
Moving on to step 2, Simon Cozens just emailed the following. Show quoted text
> Hi, > Something seems wrong with the test you sent for step 2; > I have attach and thaw both being called as expected, but > I don't see how the test is expected to work. If, as I > understand it, each object should have attach and then > thaw called on it in turn, you're going to end up with > the object "C"'s next point not correctly pointing to > anything because object "A" is created from scratch. > Could you have a look over it and see if you think > the test is sound? > Also, we still didn't agree a price for this one. > > Simon
Nat?
From: natg [...] shore.net
Hi Simon I don’t think I understand your question. Is it an issue with the first part of the test (lines 15-27) or the second part (lines 28-38)? Here’s how I think the attach/thaw capability is supposed to work 1. attach is called the first time the object is seen during the de- serialization. Its job is to define the mapping from Storable’s internal reference ids (sorry, I don’t what these are called) to actual Perl references that exist in the client program's memory. In the test program, the Perl references are found in %oid2obj or created from scratch if the object does not yet exist. 2. thaw is called after all side references have been de-serialized. Its job is to fill in the content of the object based on the serialization string and the side references. The client program can define or omit either method. If no attach method is provided, Storable works exactly as it does now (of course – backwards compatibility!!). If there’s an attach but no thaw, the client has to do all the work in attach; this makes sense if there are no side references. If I missed the point, please ask again… I’d also be happy to discuss by phone. Thanks, Nat [ADAMK - Thu Mar 31 21:55:26 2005]: Show quoted text
> Moving on to step 2, Simon Cozens just emailed the following. >
> > Hi, > > Something seems wrong with the test you sent for step 2; > > I have attach and thaw both being called as expected, but > > I don't see how the test is expected to work. If, as I > > understand it, each object should have attach and then > > thaw called on it in turn, you're going to end up with > > the object "C"'s next point not correctly pointing to > > anything because object "A" is created from scratch. > > Could you have a look over it and see if you think > > the test is sound? > > Also, we still didn't agree a price for this one. > > > > Simon
> > Nat?