Skip Menu |

This queue is for tickets about the List-MoreUtils-XS CPAN distribution.

Report information
The Basics
Id: 123989
Status: resolved
Priority: 0/
Queue: List-MoreUtils-XS

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: more $a/$b/$_ refcounting bugs
Date: Mon, 1 Jan 2018 17:33:53 +0000
To: bug-List-MoreUtils-XS [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
List-MoreUtils-XS still has refcounting bugs relating to the GvSV slots of $a, $b, and $_. For example, pairwise() screws up its localisation of $a and $b thus: $ perl -MDevel::Peek=Dump -MList::MoreUtils::XS=pairwise -lwe 'pairwise { } @{["0"]}, @{["0"]}; Dump $a; Dump $b' SV = UNKNOWN(0xff) (0xfecab0) at 0xfc9128 REFCNT = 0 FLAGS = () SV = UNKNOWN(0xff) (0xfecb88) at 0xfecac8 REFCNT = 0 FLAGS = () and both pairwise() and qsort() screw up refcounts for $a and $b thus: $ perl -MList::MoreUtils::XS=pairwise -e 'pairwise { *a = \5 } @{[2]}, @{[3]}' Attempt to free unreferenced scalar: SV 0x189fae8, Perl interpreter: 0x1879010. $ perl -MList::MoreUtils::XS=qsort -e 'qsort { *a = \5; 0 } @{[2,3]}' Attempt to free unreferenced scalar: SV 0x1d28ab8, Perl interpreter: 0x1d02010. As I described in the former ticket [rt.cpan.org #123868], the API is that the GvSV slot of a glob is refcounted. So when assigning to GvSV, you need to increment the refcount of the new value first, and then decrement the refcount of the old value afterwards. When you want to localise GvSV around a function, the way pp_sort does, the SAVESPTR() that you're currently using is the wrong way to do it. You need to SAVEGENERICSV() and then increment the reference count. See core commit 16ada235c332e017667585e1a5a00ce43a31c529 for an example of this type of bugfix. You've said something about wanting API documentation. I'm not clear what you're after that's not covered by the above paragraph. -zefram
On Mon Jan 01 12:34:05 2018, zefram@fysh.org wrote: Show quoted text
> List-MoreUtils-XS still has refcounting bugs relating to the GvSV > slots > of $a, $b, and $_. For example, pairwise() screws up its localisation > of $a and $b thus: > > $ perl -MDevel::Peek=Dump -MList::MoreUtils::XS=pairwise -lwe > 'pairwise { } @{["0"]}, @{["0"]}; Dump $a; Dump $b' > SV = UNKNOWN(0xff) (0xfecab0) at 0xfc9128 > REFCNT = 0 > FLAGS = () > SV = UNKNOWN(0xff) (0xfecb88) at 0xfecac8 > REFCNT = 0 > FLAGS = () > > and both pairwise() and qsort() screw up refcounts for $a and $b thus: > > $ perl -MList::MoreUtils::XS=pairwise -e 'pairwise { *a = \5 } @{[2]}, > @{[3]}' > Attempt to free unreferenced scalar: SV 0x189fae8, Perl interpreter: > 0x1879010. > $ perl -MList::MoreUtils::XS=qsort -e 'qsort { *a = \5; 0 } @{[2,3]}' > Attempt to free unreferenced scalar: SV 0x1d28ab8, Perl interpreter: > 0x1d02010. > > As I described in the former ticket [rt.cpan.org #123868], the API is > that the GvSV slot of a glob is refcounted. So when assigning to > GvSV, > you need to increment the refcount of the new value first, and then > decrement the refcount of the old value afterwards. When you want to > localise GvSV around a function, the way pp_sort does, the SAVESPTR() > that you're currently using is the wrong way to do it. You need to > SAVEGENERICSV() and then increment the reference count. See core > commit > 16ada235c332e017667585e1a5a00ce43a31c529 for an example of this type > of bugfix. > > You've said something about wanting API documentation. I'm not clear > what you're after that's not covered by the above paragraph.
As asked in https://rt.cpan.org/Ticket/Display.html?id=123868#txn-1762063 Show quoted text
> For the records, I did not take the code from pp_sort - I take it from pairwise which > taked it from ListUtil.xs ages ago. The reason for that kind of code stealing is the > lack of proper documentation wrt. localizing $a/$b/$_ from XS - all I found was > some poor multicall documentation which simply says: look at List::Util - that's > the reference.
When you take a look at the fix especially for reported issue in reduce_* at https://github.com/perl5-utils/List-MoreUtils-XS/blob/master/XS.xs#L396-L417, you'll detect that: 1) PL_defgv is "saved" by using SAVESPTR(GvSV()) once, but 2) PL_firstg/PL_secondgv is "saved" multiple times using SAVEGENERICSV() twice (around save_gp plus GvINTRO_off followed by a SvREFCNT_inc) I'd like to understand from a documentation - not by studying the perl core: a) What's the difference between SAVESPTR(GvSV()) and SAVEGENERICSV()? b) Why does PL_firstg/PL_secondgv need to be saved twice while PL_defgv doesn't? c) What does SAVESPTR() do? d) What does SAVEGENERICSV() do? e) Why does PL_firstg/PL_secondgv need a new muteable GV in opposite to PL_defgv? f) Why is the gv_fetchpvs/SAVESPTR/SAVEGENERICSV/... missing in perlapi? I digged through perlxs,perlxstut, perlapi, perlcall and "Extending and Embedding Perl" and all I found was an explanation about globs being stored in a table - but the book is from 2003 and maybe not completely up to date ;) Anyway - for a complete code study and bug fix, I need to understand (1) and (2) as well as (a)-(e), (f) is a bonus. Best regards, Jens
Subject: Re: [rt.cpan.org #123989] more $a/$b/$_ refcounting bugs
Date: Tue, 2 Jan 2018 14:05:18 +0000
To: Jens Rehsack via RT <bug-List-MoreUtils-XS [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
Jens Rehsack via RT wrote: Show quoted text
>I'd like to understand from a documentation - not by studying the perl core:
If you go strictly by the API documentation, then GvSV() isn't documented as an lvalue. By assigning to it at all you're going beyond the public API, and if you're doing that you have to be willing to use the core's standards of documentation. But by all means open core tickets about these issues: about whether lvalue GvSV() should be in the public API, and about underdocumented API elements. Show quoted text
>a) What's the difference between SAVESPTR(GvSV()) and SAVEGENERICSV()?
There are two orthogonal differences here. The difference between SAVESPTR() and SAVEGENERICSV() is in refcounting. SAVESPTR() is for uncounted refs and SAVEGENERICSV() for counted refs. Details below on (c) and (d). The difference between SAVE*(GvSV()) and SAVE*() is that the localisation is being applied to a different thing. PL_firstgv is a variable that holds a reference to a glob; GvSV(PL_firstgv) is the slot within that glob that holds a reference to a scalar. Show quoted text
>b) Why does PL_firstg/PL_secondgv need to be saved twice while PL_defgv doesn't?
I think by "saved twice" you're referring to the saving of both PL_firstgv and GvSV(PL_firstgv). These are saving two distinct things. It is necessary to save, that is, to localise, whichever global things one is going to write to. It's necessary to save PL_firstgv because it then gets assigned to, and it's necessary to save GvSV(PL_firstgv) because that gets assigned to as well. It's also necessary to save GvSV(PL_defgv) because it'll be assigned to, but it is not necessary to save PL_defgv because it's not assigned to. See also (e) below. (Everything said for PL_firstgv also goes for PL_secondgv.) Show quoted text
>c) What does SAVESPTR() do?
SAVESPTR() saves a pointer value without touching refcounts, and so is suitable for use on a variable that holds an uncounted reference, or in some specific situations where everything that happens to a counted reference is known. It saves the current value of the argument variable, and upon unwinding the variable will be rewritten to contain the saved value. Show quoted text
>d) What does SAVEGENERICSV() do?
SAVEGENERICSV() is for use on a variable that holds a counted reference, and unfortunately its calling convention is a bit wonky. SAVEGENERICSV() not only saves the old scalar pointer value but also creates and holds a new counted reference to that scalar. The caller of SAVEGENERICSV() must then immediately write the correct scalar pointer into the variable (if it is not already correct) and increment its refcount. Upon unwinding, whatever scalar is then referenced by the variable has its refcount decremented, the old scalar pointer is written into the variable, and its refcount is also decremented. Show quoted text
>e) Why does PL_firstg/PL_secondgv need a new muteable GV in opposite to PL_defgv?
It's not necessary to write to PL_defgv because it is reliably initialised to refer to *_. There's only one *_ that matters, because code like "$_" refers to the one in the main package regardless of the locally selected package. PL_defgv is kept permanently pointing at the main *_ for everything to use. On the other hand, it's necessary to write to PL_firstgv because it would not otherwise be reliably set to refer to *a. It's only set locally for a specific operation. Unlike *_, each package has its own *a, and code like "$a" refers to the one in the locally selected package. Things such as pp_sort and pairwise() have to look up the specific *a that the caller will be using. However, you could avoid having to save PL_firstgv by not using PL_firstgv at all. It's not necessary for you to put your *a reference into a global variable. You could put it in a variable that's internal to your code, though you would then have to handle passing your internal variable around yourself. (Everything said for PL_firstgv/*a also goes for PL_secondgv/*b.) Show quoted text
>f) Why is the gv_fetchpvs/SAVESPTR/SAVEGENERICSV/... missing in perlapi?
gv_fetchpvs() has just not been documented, which looks like an oversight. The SAVE*() macros are all undocumented, and it looks like they're not intended to be public API. Their function forms, save_sptr(), save_generic_svref(), et al, are listed as public API but are equally undocumented. It's not clear whether they're really intended as public API. Regrettably, a lot of the older parts of the API are undocumented in that manner. We're better at documenting new API functions, and we do gradually fix the old ones when the occasion arises. Please do open core tickets about missing API documentation that affects you, to provide the occasion. -zefram
@ZEFRAM: Sorry for replying late, $work stuff ... Reading #96343 back and forth and was really confused. I figured out that if(!GvSV(PL_defgv)) { gv_SVadd(PL_defgv); } sv_setsv_mg(GvSV(PL_defgv), args[i]); MULTICALL; seems at least doesn't complain anymore and the leak check seems to be satisfied either (beside the "undef $*" one - that complains that the SV of $_ is leaked now. Any opinion?
Sane parts are resolved, broken wishes are reverted. Fix with next release.