Bug #58435 for List-Uniq: modifying array being iterated

Tue Jun 15 21:36:12 2010 user42 [...] zip.com.au - Ticket created

Subject:	modifying array being iterated
Date:	Wed, 16 Jun 2010 11:35:33 +1000
To:	bug-List-Uniq [...] rt.cpan.org
From:	Kevin Ryde <user42 [...] zip.com.au>

perlsyn "Foreach Loops" claims foreach can get "very confused" if you modify an array using splice while iterating, the way List::Uniq does. Nosing around the sources suggests it's actually fine to modify in an as-yet uniterated portion, but if the docs are taken at face value it might be worth changing "foreach(@_) ..." to instead use the $i index position or maybe a map{}.

Wed Jun 16 08:41:28 2010 JFITZ [...] cpan.org - Taken

Wed Jun 16 08:42:41 2010 JFITZ [...] cpan.org - Correspondence added

On Tue Jun 15 21:36:12 2010, user42@zip.com.au wrote: Show quoted text

> perlsyn "Foreach Loops" claims foreach can get "very confused" if you > modify an array using splice while iterating, the way List::Uniq does. > > Nosing around the sources suggests it's actually fine to modify in an > as-yet uniterated portion, but if the docs are taken at face value it > might be worth changing "foreach(@_) ..." to instead use the $i index > position or maybe a map{}.

Indeed, this is what was broken in 0.12. When we spliced in an empty arrayref, the size of the array shrinks but the iterator retains its old value. We then skip over the item that followed the empty arrayref in the original list. I've benchmarked an index and map alternative, and compared their speed to the 0.12 and 0.13 implementations: Rate l-u-0-13-recur index l-u-0-12-recur map l-u-0-13- orig l-u-0-12-orig l-u-0-13-recur 47.6/s -- -4% -16% -35% -51% -62% index 49.3/s 4% -- -13% -33% -49% -61% l-u-0-12-recur 56.4/s 19% 14% -- -23% -41% -55% map 73.2/s 54% 48% 30% -- -24% -42% l-u-0-13-orig 96.4/s 102% 95% 71% 32% -- -24% l-u-0-12-orig 126/s 165% 156% 124% 72% 31% -- The difference between the -orig and -recur variants is that the -recur ones handle lists with arrayrefs more than one level deep: ( 'foo', [ [ 'bar' ] ], [ [ [ 'baz' ] ] ] ) The map implementation lends itself to recursively unwrapping, and once that was in it didn't make sense to benchmark the other implementations without the same logic. So, the change from 0.12 to 0.13 was a 24% performance hit. The map implementation, when compared fairly against 0.13 is more performant but still slower than the published 0.13 release. I'm also concerned that adding the recursion will break existing code, even though I can't imagine a use case in which you want to uniq something containing the stringified representations of listrefs. I've got a 0.14 release ready to go, but I welcome thoughts on the above concerns before pushing it to PAUSE.

Wed Jun 16 08:42:42 2010 The RT System itself - Status changed from 'new' to 'open'

Thu Jun 17 19:07:29 2010 user42 [...] zip.com.au - Correspondence added

Subject:	Re: [rt.cpan.org #58435] modifying array being iterated
Date:	Fri, 18 Jun 2010 09:07:07 +1000
To:	bug-List-Uniq [...] rt.cpan.org
From:	Kevin Ryde <user42 [...] zip.com.au>

"James FitzGibbon via RT" <bug-List-Uniq@rt.cpan.org> writes: Show quoted text

> > ( 'foo', [ [ 'bar' ] ], [ [ [ 'baz' ] ] ] )

I noticed the current code is "sometimes recursive" :-) use List::Uniq 'uniq'; print uniq('foo',[['bar'],['quux']]); => fooARRAY(0x9572818)quux but could tell if that was meant to be supported or not :). Show quoted text

> I'm also concerned that adding the recursion will break existing code, > even though I can't imagine a use case in which you want to uniq > something containing the stringified representations of listrefs.

If uniqueness is by stringizing then it doesn't make much sense to have arrayrefs. And if flattening is done at the toplevel now then you'd be inclined to do it recursively on the same basis. The only thing against flattening might be objects with overloaded stringizing which could be uniqued without wanting to recurse into them. But could have a non-flattening version or option later if anyone wanted that.

Fri Jun 18 11:28:43 2010 JFITZ [...] cpan.org - Correspondence added

On Thu Jun 17 19:07:29 2010, user42@zip.com.au wrote: Show quoted text

> I noticed the current code is "sometimes recursive" :-) > > use List::Uniq 'uniq'; > print uniq('foo',[['bar'],['quux']]); > > => fooARRAY(0x9572818)quux > > but could tell if that was meant to be supported or not :). >

> > I'm also concerned that adding the recursion will break existing

code, Show quoted text

> > even though I can't imagine a use case in which you want to uniq > > something containing the stringified representations of listrefs.

> > If uniqueness is by stringizing then it doesn't make much sense to

have Show quoted text

> arrayrefs. And if flattening is done at the toplevel now then you'd

be Show quoted text

> inclined to do it recursively on the same basis. > > The only thing against flattening might be objects with overloaded > stringizing which could be uniqued without wanting to recurse into

them. Show quoted text

> But could have a non-flattening version or option later if anyone

wanted Show quoted text

> that.

The original design was to flatten to one level, but obviously that broke on empty array refs. In re-familiarizing myself with the code, I couldn't see any reason why the flattening shouldn't be recursive. I've now made the new recursive flattening the default, with the ability to turn it off using the options hash. 0.20 is now on its way to CPAN via PAUSE.

Fri Jun 18 11:28:44 2010 JFITZ [...] cpan.org - Status changed from 'open' to 'resolved'