Skip Menu |

This queue is for tickets about the List-MoreUtils-XS CPAN distribution.

Report information
The Basics
Id: 123868
Status: resolved
Priority: 0/
Queue: List-MoreUtils-XS

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: 0.427_002



Subject: $a/$b/$_ refcounting bugs
Date: Tue, 12 Dec 2017 21:12:39 +0000
To: bug-List-MoreUtils-XS [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
List-MoreUtils-XS has started failing its tests at core commit 16ada235c332e017667585e1a5a00ce43a31c529. This is due to preexisting bugs in L-MU-XS getting noticed due to a core bugfix. The underlying bugs in L-MU-XS are in the treatment of GvSV(PL_{first,second,def}gv), that is, in the aliasing of $a, $b, and $_ that is done by several of its functions. The GvSV slot of a glob is refcounted, but L-MU-XS isn't diddling reference counts. This is actually the same bug that the core pp_sort had that the breaking commit was fixing. L-MU-XS has probably cribbed the logic from the buggy pp_sort. When assigning to GvSV, you need to increment the refcount of the new value first, and then decrement the refcount of the old value afterwards. When you want to localise GvSV around a function, the way pp_sort does, SAVESPTR() is the wrong way to do it. You need to SAVEGENERICSV() and then increment the reference count. See core commit 16ada235c332e017667585e1a5a00ce43a31c529 for an example of this type of bugfix. I think the actual failure mode that's causing test failure arises from this refcounting error, which can be seen on any perl version: $ perl -MDevel::Peek -MList::MoreUtils::XS=reduce_0 -lwe 'print reduce_0 { $a + $b } 2,3,4; Dump $a' 9 SV = UNKNOWN(0xff) (0x2478928) at 0x2455110 REFCNT = 0 FLAGS = () The issue there is that $a was locally set to the partial sum, repeatedly, but each partial sum scalar was mortal. When the reduction was complete, $a was left aliased to a scalar that was now dead. (The bugfixed pp_sort, performing refcounting on the $a that was in place when it was invoked, then barfed on this scalar already being dead.) Properly refcounting GvSV would fix this, but there's another issue too. I think reduce_0() et al should not be side-effecting $a and $b. I think reduction should localise $a and $b, just as some of your other functions do. This breakage is being tracked on the core side as [perl #132578]. -zefram
On Tue Dec 12 16:13:15 2017, zefram@fysh.org wrote: Show quoted text
> List-MoreUtils-XS has started failing its tests at core commit > 16ada235c332e017667585e1a5a00ce43a31c529. This is due to preexisting > bugs in L-MU-XS getting noticed due to a core bugfix. > > The underlying bugs in L-MU-XS are in the treatment of > GvSV(PL_{first,second,def}gv), that is, in the aliasing of $a, $b, and > $_ that is done by several of its functions. The GvSV slot of a glob > is > refcounted, but L-MU-XS isn't diddling reference counts. This is > actually > the same bug that the core pp_sort had that the breaking commit was > fixing. L-MU-XS has probably cribbed the logic from the buggy > pp_sort. > > When assigning to GvSV, you need to increment the refcount of the > new value first, and then decrement the refcount of the old value > afterwards. When you want to localise GvSV around a function, the > way pp_sort does, SAVESPTR() is the wrong way to do it. You need to > SAVEGENERICSV() and then increment the reference count. See core > commit > 16ada235c332e017667585e1a5a00ce43a31c529 for an example of this type > of bugfix. > > I think the actual failure mode that's causing test failure arises > from > this refcounting error, which can be seen on any perl version: > > $ perl -MDevel::Peek -MList::MoreUtils::XS=reduce_0 -lwe 'print > reduce_0 { $a + $b } 2,3,4; Dump $a' > 9 > SV = UNKNOWN(0xff) (0x2478928) at 0x2455110 > REFCNT = 0 > FLAGS = () > > The issue there is that $a was locally set to the partial sum, > repeatedly, > but each partial sum scalar was mortal. When the reduction was > complete, > $a was left aliased to a scalar that was now dead. (The bugfixed > pp_sort, > performing refcounting on the $a that was in place when it was > invoked, > then barfed on this scalar already being dead.) Properly refcounting > GvSV would fix this, but there's another issue too. I think > reduce_0() > et al should not be side-effecting $a and $b. I think reduction > should > localise $a and $b, just as some of your other functions do. > > This breakage is being tracked on the core side as [perl #132578]. > > -zefram
Without the intension to disagree - cause I see the missing refcnt() and believe this is a long undiscovered bug, the only segfault I can reproduce is in: (lldb) target create "/Users/sno/perl5/perlbrew/perls/perl-bisect/bin/perl" Current executable set to '/Users/sno/perl5/perlbrew/perls/perl-bisect/bin/perl' (x86_64). (lldb) settings set -- target.run-args "-Mblib" "t/xs/occurrences.t" (lldb) process launch Process 64124 launched: '/Users/sno/perl5/perlbrew/perls/perl-bisect/bin/perl' (x86_64) ok 1 - Each word is counted ok 2 - Text to long, each word is there at least twice ok 3 - 11 comma ok 4 - 7 dots ok 5 - 9 words 'et' ok 6 - Words are as many as requested at www.loremipsum.de ok 7 - probes untouched ok 8 - occurrences of integer probes Process 64124 stopped * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xff00000008) frame #0: 0x00000001000bd176 perl`S_sv_unmagicext_flags + 102 perl`S_sv_unmagicext_flags: -> 0x1000bd176 <+102>: movq 0x8(%rbx), %rax 0x1000bd17a <+106>: movsbl 0x12(%rbx), %esi 0x1000bd17e <+110>: cmpq %r12, %rax 0x1000bd181 <+113>: sete %dl Target 0: (perl) stopped. (lldb) bt * thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0xff00000008) * frame #0: 0x00000001000bd176 perl`S_sv_unmagicext_flags + 102 frame #1: 0x00000001000be067 perl`Perl_sv_clear + 263 frame #2: 0x00000001000bee1e perl`Perl_sv_free2 + 94 frame #3: 0x00000001000df425 perl`Perl_leave_scope + 357 frame #4: 0x000000010014c86d perl`Perl_pp_sort + 3181 frame #5: 0x00000001002f8826 LeakTrace.bundle`leaktrace_runops at LeakTrace.xs:184 [opt] frame #6: 0x0000000100028f7c perl`perl_run + 876 frame #7: 0x0000000100000d15 perl`main + 149 frame #8: 0x00007fffded32235 libdyld.dylib`start + 1 Which is caused by following perl code: { my @probes = ((1) x 3, undef, (2) x 4, undef, (3) x 2, undef, (4) x 7, undef, (5) x 2, undef, (6) x 4); my $fp = freeze(\@probes); my @o = map { ref $_ ? [sort { (defined $a <=> defined $b) or $a <=> $b } @$_] : $_ } occurrences @probes; } And - to be honest - I can't see any failing refcnt in occurrences (...) Best regards, Jens
As requested - I adopted the referred core fix, but I'd like to get much more information and feedback to develop a proper fix. For the records, I did not take the code from pp_sort - I take it from pairwise which taked it from ListUtil.xs ages ago. The reason for that kind of code stealing is the lack of proper documentation wrt. localizing $a/$b/$_ from XS - all I found was some poor multicall documentation which simply says: look at List::Util - that's the reference. So I copied from code behaving like that: $ perl -MDevel::Peek -MList::Util=reduce -lwe 'print reduce { $a + $b } 2,3,4; Dump $a' 9 SV = NULL(0x0) at 0x7fae83035198 REFCNT = 1 FLAGS = () $ perl -MDevel::Peek -MList::Util=reduce -lwe 'print reduce { $a + $b } 2,3,4; Dump $b' 9 SV = NULL(0x0) at 0x7ff98106bb58 REFCNT = 1 FLAGS = () With the adopted code from pp_sort.c, List::MoreUtils::XS behaves like that: $ make && perl -MDevel::Peek -MList::MoreUtils::XS=reduce_0 -lwe 'print reduce_0 { $a + $b } 2,3,4; Dump $a' 9 SV = UNKNOWN(0xff) (0x7fad2282f7a0) at 0x7fad22803478 REFCNT = 0 FLAGS = () $ make && perl -MDevel::Peek -MList::MoreUtils::XS=reduce_0 -lwe 'print reduce_0 { $a + $b } 2,3,4; Dump $b' 9 SV = IV(0x7f9bdf831c20) at 0x7f9bdf831c30 REFCNT = 1 FLAGS = (IOK,READONLY,PROTECT,pIOK) IV = 4 So I think, the fix is maybe wrong. And I'd like to understand why $a/$b and $_ (PL_devgv) behave such different. Please have a look into https://github.com/perl5-utils/List-MoreUtils-XS/commit/b28fd8864f51bd3d9a55319afb990fa4cdf87288 and give me some feedback. Thanks in advance. Best regards, Jens
On Wed Dec 13 09:42:27 2017, REHSACK wrote: Show quoted text
> So I think, the fix is maybe wrong. And I'd like to understand why > $a/$b and $_ (PL_devgv) behave such different.
They don’t behave differently. List::Util has two bugs open related to this: https://rt.cpan.org/Ticket/Display.html?id=96343 https://rt.cpan.org/Ticket/Display.html?id=118334
On Wed Dec 13 12:17:14 2017, SPROUT wrote: Show quoted text
> On Wed Dec 13 09:42:27 2017, REHSACK wrote:
> > So I think, the fix is maybe wrong. And I'd like to understand why > > $a/$b and $_ (PL_devgv) behave such different.
> > They don’t behave differently. List::Util has two bugs open related to this: > > https://rt.cpan.org/Ticket/Display.html?id=96343 > https://rt.cpan.org/Ticket/Display.html?id=118334
In that case, I bump the review / documentation request :) I need to understand the issue a bit better and a proper documentation how to localize a var like "$a" and protect it against leaks on exceptions without wasting resources by encompassing everything with an G_EVAL.
Subject: Re: [rt.cpan.org #123868] $a/$b/$_ refcounting bugs
Date: Wed, 13 Dec 2017 19:23:49 +0000
To: Jens Rehsack via RT <bug-List-MoreUtils-XS [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
Jens Rehsack via RT wrote: Show quoted text
> the only segfault I can >reproduce is in:
Use a debugging build of perl. There are useful assertions that will be hit before actual segvs. Show quoted text
>And - to be honest - I can't see any failing refcnt in occurrences (...)
The refcount fail is in reduce_0(), called earlier in the test script. -zefram
On Wed Dec 13 14:24:07 2017, zefram@fysh.org wrote: Show quoted text
> Jens Rehsack via RT wrote:
> > the only segfault I can > >reproduce is in:
> > Use a debugging build of perl. There are useful assertions that will > be hit before actual segvs.
$ perl -V Summary of my perl5 (revision 5 version 27 subversion 7) configuration: Commit id: 5eabe374afae7b84aa4ba5e13e5424865bc74fc1 Platform: osname=darwin osvers=16.7.0 archname=darwin-2level uname='darwin ernie.local 16.7.0 darwin kernel version 16.7.0: mon nov 13 21:56:25 pst 2017; root:xnu-3789.72.11~1release_x86_64 x86_64 ' config_args='-de -Dcc=clang -Doptimize=-O -g -DDEBUGGING -Dusedevel -Dprefix=/Users/sno/perl5/perlbrew/perls/perl-bisect' ... Characteristics of this binary (from libperl): Compile-time options: DEBUGGING HAS_TIMES PERLIO_LAYERS PERL_COPY_ON_WRITE PERL_DONT_CREATE_GVSV PERL_MALLOC_WRAP PERL_OP_PARENT PERL_PRESERVE_IVUV PERL_USE_DEVEL PERL_USE_SAFE_PUTENV USE_64_BIT_ALL USE_64_BIT_INT USE_LARGE_FILES USE_LOCALE USE_LOCALE_COLLATE USE_LOCALE_CTYPE USE_LOCALE_NUMERIC USE_LOCALE_TIME USE_PERLIO USE_PERL_ATOF Built under darwin Compiled at Dec 13 2017 12:30:17 %ENV: PERLBREW_BASHRC_VERSION="0.75" PERLBREW_HOME="/Users/sno/.perlbrew" PERLBREW_MANPATH="/Users/sno/perl5/perlbrew/perls/perl-bisect/man" PERLBREW_PATH="/Users/sno/perl5/perlbrew/bin:/Users/sno/perl5/perlbrew/perls/perl-bisect/bin" PERLBREW_PERL="perl-bisect" PERLBREW_ROOT="/Users/sno/perl5/perlbrew" PERLBREW_VERSION="0.75" @INC: /Users/sno/perl5/perlbrew/perls/perl-bisect/lib/site_perl/5.27.7/darwin-2level /Users/sno/perl5/perlbrew/perls/perl-bisect/lib/site_perl/5.27.7 /Users/sno/perl5/perlbrew/perls/perl-bisect/lib/5.27.7/darwin-2level /Users/sno/perl5/perlbrew/perls/perl-bisect/lib/5.27.7 I do :) Show quoted text
> >And - to be honest - I can't see any failing refcnt in occurrences (...)
> > The refcount fail is in reduce_0(), called earlier in the test script.
I expected that reduce_*.t will fail, too. Or report a leak. Or something else. They won't. This report is found because of a side effect. The reported refcount has been fixed. But what's with all the others waiting and sleeping? Cheers, Jens
Subject: Re: [rt.cpan.org #123868] $a/$b/$_ refcounting bugs
Date: Mon, 1 Jan 2018 15:25:04 +0000
To: Bugs in List-MoreUtils-XS via RT <bug-List-MoreUtils-XS [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
List-MoreUtils-XS-0.428 has some changes in response to this ticket, but hasn't fixed all the GvSV refcounting bugs. This bug in pairwise() has been found to break Algorithm-QuineMcCluskey on core versions that have the pp_sort bugfix: $ perl -MDevel::Peek=Dump -MList::MoreUtils::XS=pairwise -lwe 'pairwise { } @{["0"]}, @{["0"]}; Dump $a; Dump $b' SV = UNKNOWN(0xff) (0xfecab0) at 0xfc9128 REFCNT = 0 FLAGS = () SV = UNKNOWN(0xff) (0xfecb88) at 0xfecac8 REFCNT = 0 FLAGS = () The breakage of Algorithm-QuineMcCluskey is being tracked on the core side as [perl #132671]. -zefram
On Mon Jan 01 10:25:45 2018, zefram@fysh.org wrote: Show quoted text
> List-MoreUtils-XS-0.428 has some changes in response to this ticket, > but hasn't fixed all the GvSV refcounting bugs.
I asked you to review the changes or document the API, respectively. This doesn't happen - so the symptom is fixed. Show quoted text
> This bug in > pairwise() > has been found to break Algorithm-QuineMcCluskey on core versions that > have the pp_sort bugfix: > > $ perl -MDevel::Peek=Dump -MList::MoreUtils::XS=pairwise -lwe > 'pairwise { } @{["0"]}, @{["0"]}; Dump $a; Dump $b' > SV = UNKNOWN(0xff) (0xfecab0) at 0xfc9128 > REFCNT = 0 > FLAGS = () > SV = UNKNOWN(0xff) (0xfecb88) at 0xfecac8 > REFCNT = 0 > FLAGS = () > > The breakage of Algorithm-QuineMcCluskey is being tracked on the core > side as [perl #132671].
New symptom, new ticket. Or you decide to provide requested API documentation.
On 2018-01-01 11:41:05, REHSACK wrote: Show quoted text
> On Mon Jan 01 10:25:45 2018, zefram@fysh.org wrote:
> > List-MoreUtils-XS-0.428 has some changes in response to this ticket, > > but hasn't fixed all the GvSV refcounting bugs.
> > I asked you to review the changes or document the API, respectively. > This doesn't happen - so the symptom is fixed. >
> > This bug in > > pairwise() > > has been found to break Algorithm-QuineMcCluskey on core versions that > > have the pp_sort bugfix: > > > > $ perl -MDevel::Peek=Dump -MList::MoreUtils::XS=pairwise -lwe > > 'pairwise { } @{["0"]}, @{["0"]}; Dump $a; Dump $b' > > SV = UNKNOWN(0xff) (0xfecab0) at 0xfc9128 > > REFCNT = 0 > > FLAGS = () > > SV = UNKNOWN(0xff) (0xfecb88) at 0xfecac8 > > REFCNT = 0 > > FLAGS = () > > > > The breakage of Algorithm-QuineMcCluskey is being tracked on the core > > side as [perl #132671].
> > New symptom, new ticket. Or you decide to provide requested API > documentation.
It seems that the new ticket is https://rt.cpan.org/Ticket/Display.html?id=123989