Skip Menu |

This queue is for tickets about the Set-Object CPAN distribution.

Report information
The Basics
Id: 90070
Status: open
Priority: 0/
Queue: Set-Object

People
Owner: RURBAN [...] cpan.org
Requestors: ftobin [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 1.34



Subject: convenience method @$ resorts -- extremely inefficient should be discouraged or warned about
The usage of @$set to get the members of a set is extremely slow compared to $set->members(). It appears each time @$ is called, promote() causes a sort to occur. I'm not sure of the purpose for this, but it causes extreme inefficiency, but it ends up being hundreds to thousands of times slower than just $set->members(), depending on the number of elements (even with just a couple hundred elements). If the current sorting behavior for @$ needs to be maintained, I recommend being very clear in the documentation about the efficiency of it compared to $set->members(). As it stands, the documentation, near the top, says that @$ is an alias for $set->members(), which is clearly not true. #!/usr/bin/perl use strict; use warnings; use Benchmark; use Set::Object; my @list = (1..100); my $set = Set::Object->new(@list); Benchmark::cmpthese(1000, { '@$set' => sub { my $sum = 0; foreach (@$set) { $sum += $_ }}, '$set->members()' => sub { my $sum = 0; foreach ($set->members()) { $sum += $_ }}}); Rate @$set $set->members() @$set 149/s -- -100% $set->members() 33333/s 22267% --
This issue had been silently patched somewhere between 1.28 and current release. Please consider reflecting that explicitly in changelog, as well as encouraging users to upgrade, because it was actually much worse than this. The described behaviour was sorting the set everytime object was accessed, so taking previous example and repeating a few times: foreach (@$set) { $sum += $_; $sum += $_; $sum += $_; }}, would sort in each(!) of these lines where $_ is used... In our case, on Set::Object@1.28 using $set->members() instead of @{$set} has proven to be over four thousand times faster. This small change reduced online time from 1135s to 0.246s. Once again, please consider advising your users to upgrade, as there was little reason to upgrade from supposedly stable version which seemed to work just great, until we hit this edge case where the bug reared its ugly head. Also this issue should be closed, as the problem is non-existent on current (1.35) version of Set::Object.
bah, wrong fixed in, the one it (probably) is fixed is 1.33 (which is missing from the selector)
On Fri Nov 10 07:57:37 2017, STARLIGHT wrote: Show quoted text
> bah, wrong fixed in, the one it (probably) is fixed is 1.33 (which is > missing from the selector)
yes, 1.33 fixed the pod to mention this improvement. the actual fix was with 1.30. no problem. -- Reini Urban
Show quoted text
> yes, 1.33 fixed the pod to mention this improvement.
I am very sorry for reopening this but I was too quick to actually consider this fixed. I upgraded to the newest version (1.35) to check if it fixed the issue (and I'm not talking about fixing pod). And I assumed it fixed things because I had a bug in my own testing code prepared especially for pasting in this report. Talk about overzealous... Anyhow. The main point is the dreaded behaviour of sneakily sorting *everytime* an object *returned* from @{$set} is *ACCESSED* remains in the 1.35 still. So its not about the added cost of sort, its the repeated cost of calling promote that hurts. And the time *doubles* everytime such item from @{$set} is used. I've prepared an NYTProf analysis of a simple @{$set} case (set_object_bug.pl) and attached it to the report, but the gist of it can be seen here: for (1..100) { my $x = 0; foreach my $item ($lil_set->members()) { # spent 7.24ms making 100 calls to Set::Object::members, avg 72µs/call $x += $item; $x += $item; $x += $item; $x += $item; $x += $item; } foreach my $item (@{$lil_set}) { # spent 2.76ms making 100 calls to Set::Object::__ANON__[Set/Object.pm:750], avg 28µs/call # spent 312µs making 100 calls to Set::Object::TieArray::FETCHSIZE, avg 3µs/call $x += $item; # spent 54.1s making 100000 calls to Set::Object::TieArray::FETCH, avg 541µs/call $x += $item; # spent 54.0s making 100000 calls to Set::Object::TieArray::FETCH, avg 540µs/call $x += $item; # spent 54.0s making 100000 calls to Set::Object::TieArray::FETCH, avg 540µs/call $x += $item; # spent 54.1s making 100000 calls to Set::Object::TieArray::FETCH, avg 541µs/call $x += $item; # spent 54.1s making 100000 calls to Set::Object::TieArray::FETCH, avg 541µs/call # spent 380ms making 100000 calls to Set::Object::TieArray::FETCHSIZE, avg 4µs/call } Please note the number of calls it is causing. The severity is IMHO high here - it will cause "shadow" performance leaks when using the convenience form. Everytime the item is accessed (FETCH), it will call promote() which in turn is sorting the set, leading to the cases like we recently had, where the total time spent was reduced from 1136s to 0.236s (!) after changing from @{$some_set} to $some_set->members(). So once again, the problem is not the fact that @{$set} is sorting, the problem is it is sorting everytime its items are accessed in foreach or similar. Last but not least, let's see what actually comes out of @{$set} as opposed to $set->members: $ cat peek.pl #!/bin/env perl use strict; use warnings; use 5.010; use Set::Object 1.35 qw/set/; use Devel::Peek; my $set = set(qw/a/); say "\$set->members form:"; say Dump($_) foreach $set->members(); say "\n\n\n\@{\$set} form:"; print Dump($_) foreach @{$set}; __END__ $set->members form: SV = PV(0x228eb78) at 0x2290b98 REFCNT = 2 FLAGS = (POK,pPOK) PV = 0x22a9830 "a"\0 CUR = 1 LEN = 8 @{$set} form: SV = PVLV(0x22ef0c0) at 0x23f2910 REFCNT = 2 FLAGS = (GMG,SMG,RMG) IV = 0 NV = 0 PV = 0 MAGIC = 0x22f5040 MG_VIRTUAL = &PL_vtbl_packelem MG_TYPE = PERL_MAGIC_tiedelem(p) MG_FLAGS = 0x02 REFCOUNTED MG_OBJ = 0x243abf0 SV = RV(0x243ac00) at 0x243abf0 REFCNT = 2 FLAGS = (ROK) RV = 0x22cdda8 SV = PVAV(0x2291f40) at 0x22cdda8 REFCNT = 1 FLAGS = (OBJECT) STASH = 0x234f8c0 "Set::Object::TieArray" ARRAY = 0x22d3720 FILL = 1 MAX = 1 ARYLEN = 0x0 FLAGS = (REAL) Elt No. 0 SV = RV(0x22c8ac8) at 0x22c8ab8 REFCNT = 1 FLAGS = () Elt No. 1 SV = RV(0x22cde30) at 0x22cde20 REFCNT = 1 FLAGS = (ROK,WEAKREF) RV = 0x2290fa0 SV = PVMG(0x22d5890) at 0x2290fa0 REFCNT = 1 FLAGS = (OBJECT,RMG,IOK,OVERLOAD,pIOK) IV = 36500688 NV = 0 PV = 0 MAGIC = 0x243b680 MG_VIRTUAL = &PL_vtbl_backref MG_TYPE = PERL_MAGIC_backref(<) MG_FLAGS = 0x02 REFCOUNTED MG_OBJ = 0x23f27d8 SV = PVAV(0x23f02f0) at 0x23f27d8 REFCNT = 2 FLAGS = () ARRAY = 0x22bb0e0 FILL = 0 MAX = 3 ARYLEN = 0x0 FLAGS = () STASH = 0x22cdc58 "Set::Object" TYPE = t TARGOFF = 0 TARGLEN = 0 TARG = 0x23f2910 We're using PERL 5.10.1. Maybe the problem is not so severe on newer ones, I don't have a testing environment prepared to test that, alas.
Subject: set_object_bug.tbz
Download set_object_bug.tbz
application/octet-stream 250.5k

Message body not shown because it is not plain text.

Show quoted text
> Please note the number of calls it is causing.
One more clarification here, because in current form it may be misleading. When I spoke about number of calls I meant calls to sort, not the actual number of calls. Here's the profiled code attached for clarity (it is also included in tbz containing nytprof).
Subject: set_object_bug.pl
#!/bin/env perl use strict; use warnings; use 5.010; use List::Util qw/shuffle/; use Set::Object 1.35 qw/set/; my @list = shuffle(1..1000); my $lil_set = set(@list); for (1..100) { my $x = 0; foreach my $item ($lil_set->members()) { $x += $item; $x += $item; $x += $item; $x += $item; $x += $item; } foreach my $item (@{$lil_set}) { $x += $item; $x += $item; $x += $item; $x += $item; $x += $item; } } __END__
Subject: Re: [rt.cpan.org #90070] convenience method @$ resorts -- extremely inefficient should be discouraged or warned about
Date: Mon, 13 Nov 2017 13:24:04 +0100
To: bug-Set-Object [...] rt.cpan.org
From: Reini Urban <reini.urban [...] gmail.com>
Can you please provide a doc patch? Am 13.11.2017 13:21 schrieb "Bartłomiej Fulanty via RT" < bug-Set-Object@rt.cpan.org>: Queue: Set-Object Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=90070 > Show quoted text
> Please note the number of calls it is causing.
One more clarification here, because in current form it may be misleading. When I spoke about number of calls I meant calls to sort, not the actual number of calls. Here's the profiled code attached for clarity (it is also included in tbz containing nytprof).
On Mon Nov 13 07:24:21 2017, reini.urban@gmail.com wrote: Show quoted text
> Can you please provide a doc patch? > > Am 13.11.2017 13:21 schrieb "Bartłomiej Fulanty via RT" < > bug-Set-Object@rt.cpan.org>: > > Queue: Set-Object > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=90070 > >
> > Please note the number of calls it is causing.
> > One more clarification here, because in current form it may be misleading. > When I spoke about number of calls I meant calls to sort, not the actual > number of calls. Here's the profiled code attached for clarity (it is also > included in tbz containing nytprof).
I've forked the code on GitHub and will try to provide a pull request. I also looked through module code and want to test an idea for a simple behaviour patch, but nothing solid at the moment.