Skip Menu |

Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Sereal-Encoder CPAN distribution.

Report information
The Basics
Id: 101876
Status: open
Priority: 0/
Queue: Sereal-Encoder

People
Owner: Nobody in particular
Requestors: zefram [...] fysh.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 10:33:01 +0000
To: bug-Sereal-Encoder [...] rt.cpan.org
From: Zefram <zefram [...] fysh.org>
$ perl -MSereal::Encoder=encode_sereal -MSereal::Decoder=decode_sereal -lwe 'print $]; print $Sereal::Encoder::VERSION; my $a="0 but true"; print decode_sereal(encode_sereal($a)); my $b = $a+0; print $a; print decode_sereal(encode_sereal($a));' 5.018002 3.005 0 but true 0 but true 0 I believe the first encoding is representing $a as a string but the second encoding is representing it as a pure integer, based on the IOK flag. In the case of this string, along with infinitely many others such as "00", "01", and "1 ", the integer representation is lossy. It's particularly significant for strings such as "0 but true" and "00" which qualify as true but come out as false when mangled by the lossy encoding. But even when the truth value doesn't change, it is not at all acceptable to lose the string value. The underlying mistake is that you've treated the IOK flag as implying that the scalar is fully characterised by its IV. In general that is not the case. For scalars that are both IOK and POK, to see whether integer representation suffices you need to perform the IV->PV coercion yourself, and see whether the PV generated from the IV matches the scalar's actual PV. Similar remarks apply to NOK and NV. For extra fun, the exact meaning of the [PIN]OK flags varies between Perl versions. -zefram
CC: perlbug [...] perl.org
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 11:50:57 +0100
To: bug-Sereal-Encoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 2 February 2015 at 11:33, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text
> Mon Feb 02 05:33:20 2015: Request 101876 was acted upon. > Transaction: Ticket created by zefram@fysh.org > Queue: Sereal-Encoder > Subject: losing string value of semi-numeric string > Broken in: (no value) > Severity: (no value) > Owner: Nobody > Requestors: zefram@fysh.org > Status: new > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101876 > > > > $ perl -MSereal::Encoder=encode_sereal -MSereal::Decoder=decode_sereal -lwe 'print $]; print $Sereal::Encoder::VERSION; my $a="0 but true"; print decode_sereal(encode_sereal($a)); my $b = $a+0; print $a; print decode_sereal(encode_sereal($a));' > 5.018002 > 3.005 > 0 but true > 0 but true > 0 > > I believe the first encoding is representing $a as a string but the > second encoding is representing it as a pure integer, based on the IOK > flag. In the case of this string, along with infinitely many others > such as "00", "01", and "1 ", the integer representation is lossy. > It's particularly significant for strings such as "0 but true" and "00" > which qualify as true but come out as false when mangled by the lossy > encoding. But even when the truth value doesn't change, it is not at > all acceptable to lose the string value. > > The underlying mistake is that you've treated the IOK flag as implying > that the scalar is fully characterised by its IV. In general that is > not the case. For scalars that are both IOK and POK, to see whether > integer representation suffices you need to perform the IV->PV coercion > yourself, and see whether the PV generated from the IV matches the > scalar's actual PV. Similar remarks apply to NOK and NV. For extra fun, > the exact meaning of the [PIN]OK flags varies between Perl versions.
No. I disagree. This is a bug in perl itself. $ perl -MDevel::Peek -le'my $x="0 but true"; my $y=0+$x; Dump($x)' SV = PVIV(0x7cdd88) at 0x7d9a48 REFCNT = 1 FLAGS = (PADMY,IOK,POK,pIOK,pPOK) IV = 0 PV = 0x7d2b90 "0 but true"\0 CUR = 10 LEN = 16 The IOK flag should NOT be set here, it should be pIOK only. IOK means that the integer representation is either a) canonical, or b) a faithful representation of the PV. pIOK is supposed to mean that the cached value of the string can be used, but that it is not a faithful representation of the string it was derived from. (If IOK and pIOK do not mean these things then it is a total waste to have both set of flags, which seems an unreasonable interpretation.) Compare to this: $ perl -MDevel::Peek -le'my $x="0blahblah"; my $y=0+$x; Dump($x)' SV = PVNV(0x1bcaf10) at 0x1beaa58 REFCNT = 1 FLAGS = (PADMY,POK,pIOK,pNOK,pPOK) IV = 0 NV = 0 PV = 0x1be3ba0 "0blahblah"\0 CUR = 9 IMO this is clearly a bug in the special case logic for "0 but true". It should NOT set the IOK flag, it should set only the pIOK flag. I will naturally try to fix this in Sereal, but I consider this a bug in Perl and I am sending this to perlbug because of it. cheers, Yves perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 12:19:00 +0000
To: demerphq via RT <bug-Sereal-Encoder [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
demerphq via RT wrote: Show quoted text
>IMO this is clearly a bug in the special case logic for "0 but true".
The bug is certainly not in the "0 but true" special case. The bug is not specific to "0 but true", but is also shared by "00", "1 ", and the like. $ perl -MDevel::Peek -lwe 'my $a="1 "; my $b = $a+0; Dump $a' SV = PVIV(0xb0d1b0) at 0xb09b48 REFCNT = 1 FLAGS = (PADMY,IOK,POK,pIOK,pPOK) IV = 1 PV = 0xb02020 "1 "\0 CUR = 2 LEN = 16 Show quoted text
>IOK means that the integer representation is either a) canonical, or >b) a faithful representation of the PV.
There is certainly a case to be made that the flag *should* mean that, but de facto it doesn't mean that and never has. It's never even been close to that. De facto, in current Perl the IOK flag means that the scalar has an IV immediately available and *is an acceptable (non-warning) operand for numeric operations*. It currently looks rather pointless to preserve this aspect of a scalar, because non-numeric operands actually only warn on their first numeric operation, the one that generates the IV and sets the pIOK flag. (On subsequent operations the pIOK flag effectively muffles the warning despite the lack of IOK.) But that's where we are: IOK tells you very little about the relationship between the IV and PV slots (even excluding dualvars). Show quoted text
>(If IOK and pIOK do not mean these things then it is a total waste to >have both set of flags, which seems an unreasonable interpretation.)
What IOK actually represents does seem wasteful, considering how little use is made of it. Not a *total* waste, but a poor use of a precious flag bit. But being wasteful doesn't mean it isn't what actually happens. -zefram
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 13:30:49 +0100
To: bug-Sereal-Encoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 2 February 2015 at 13:19, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text
> Queue: Sereal-Encoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101876 > > > demerphq via RT wrote:
>>IMO this is clearly a bug in the special case logic for "0 but true".
> > The bug is certainly not in the "0 but true" special case. The bug is not > specific to "0 but true", but is also shared by "00", "1 ", and the like.
As far as I can tell it is a bug in the "0 but true" branch as it does not set the IS_NUMBER_TRAILING flag. And it seems to also be a bug in sv_2iuv_common() which seems to not check the IS_NUMBER_TRAILING flag that is set on these examples. Actually I cant find *any* code that checks the IS_NUMBER_TRAILING flag. Show quoted text
> > $ perl -MDevel::Peek -lwe 'my $a="1 "; my $b = $a+0; Dump $a' > SV = PVIV(0xb0d1b0) at 0xb09b48 > REFCNT = 1 > FLAGS = (PADMY,IOK,POK,pIOK,pPOK) > IV = 1 > PV = 0xb02020 "1 "\0 > CUR = 2 > LEN = 16 >
>>IOK means that the integer representation is either a) canonical, or >>b) a faithful representation of the PV.
> > There is certainly a case to be made that the flag *should* mean that, > but de facto it doesn't mean that and never has. It's never even been > close to that. De facto, in current Perl the IOK flag means that the > scalar has an IV immediately available and *is an acceptable (non-warning) > operand for numeric operations*. It currently looks rather pointless to > preserve this aspect of a scalar, because non-numeric operands actually > only warn on their first numeric operation, the one that generates the > IV and sets the pIOK flag. (On subsequent operations the pIOK flag > effectively muffles the warning despite the lack of IOK.) But that's > where we are: IOK tells you very little about the relationship between > the IV and PV slots (even excluding dualvars). >
>>(If IOK and pIOK do not mean these things then it is a total waste to >>have both set of flags, which seems an unreasonable interpretation.)
> > What IOK actually represents does seem wasteful, considering how little > use is made of it. Not a *total* waste, but a poor use of a precious > flag bit. But being wasteful doesn't mean it isn't what actually happens.
I am not sure that "what happens" is relevant here, I think that what is relevant here is what *should* happen. It seems to me we have bug on top of bug here that builds up the status quo, and IMO it is better to fix the status quo than it is to live with it. This whole business is *insanely* complex and as you say the flags are being poorly used. Lets fix this once and for all. Yves
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 13:01:40 +0000
To: demerphq via RT <bug-Sereal-Encoder [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
demerphq via RT wrote: Show quoted text
>As far as I can tell it is a bug in the "0 but true" branch as it does >not set the IS_NUMBER_TRAILING flag.
No, that's not a bug. The special handling is precisely that the trailing "but true" is not regarded as "trailing trash" (which would render the string non-numeric). Note that trailing spaces are likewise not regarded as trash, nor are leading spaces or leading zeroes. If you want grok_number_flags() to detect non-canonical integer representations, it would need a new output flag to represent that state and several new bits of logic to set it. Show quoted text
>And it seems to also be a bug in sv_2iuv_common() which seems to not >check the IS_NUMBER_TRAILING flag that is set on these examples.
sv_2iuv_common() doesn't pass in the PERL_SCAN_TRAILING flag that would request the IS_NUMBER_TRAILING flag to be used. Without it, the "trailing trash" case returns numtype==0 rather than a numtype with the IS_NUMBER_TRAILING flag set. It is the lack of the IS_NUMBER_IN_UV flag, rather than the presence of IS_NUMBER_TRAILING, that controls sv_2iuv_common(). Show quoted text
>Actually I cant find *any* code that checks the IS_NUMBER_TRAILING flag.
I concur. There's only one use of PERL_SCAN_TRAILING, connected with magic incrementation, and it's being used to avoid getting numtype==0 rather than to get the IS_NUMBER_TRAILING flag. Show quoted text
>I am not sure that "what happens" is relevant here, I think that what >is relevant here is what *should* happen.
What happens is very relevant when processing strings on current Perl versions and attempting to serialise data structures containing them. Changing the behaviour of the IOK flag in future Perl versions wouldn't invalidate the need for serialisation to preserve string values on recent versions. -zefram
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 14:31:24 +0100
To: bug-Sereal-Encoder [...] rt.cpan.org
From: demerphq <demerphq [...] gmail.com>
On 2 February 2015 at 14:01, Zefram via RT <bug-Sereal-Encoder@rt.cpan.org> wrote: Show quoted text
> Queue: Sereal-Encoder > Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=101876 > > > demerphq via RT wrote:
>>As far as I can tell it is a bug in the "0 but true" branch as it does >>not set the IS_NUMBER_TRAILING flag.
> > No, that's not a bug. The special handling is precisely that the > trailing "but true" is not regarded as "trailing trash" (which would > render the string non-numeric). Note that trailing spaces are likewise > not regarded as trash, nor are leading spaces or leading zeroes. If you > want grok_number_flags() to detect non-canonical integer representations, > it would need a new output flag to represent that state and several new > bits of logic to set it.
It looks like a bug/oversight to me. Show quoted text
>
>>And it seems to also be a bug in sv_2iuv_common() which seems to not >>check the IS_NUMBER_TRAILING flag that is set on these examples.
> > sv_2iuv_common() doesn't pass in the PERL_SCAN_TRAILING flag that > would request the IS_NUMBER_TRAILING flag to be used.
Yes, that is also true, probably because it calls the grok_number() wrapper which has no way to pass in this flag. Show quoted text
> Without it, the > "trailing trash" case returns numtype==0 rather than a numtype with the > IS_NUMBER_TRAILING flag set. It is the lack of the IS_NUMBER_IN_UV > flag, rather than the presence of IS_NUMBER_TRAILING, that controls > sv_2iuv_common().
Yes I know. And my position is that that is the bug. Show quoted text
>>Actually I cant find *any* code that checks the IS_NUMBER_TRAILING flag.
> > I concur. There's only one use of PERL_SCAN_TRAILING, connected with > magic incrementation, and it's being used to avoid getting numtype==0 > rather than to get the IS_NUMBER_TRAILING flag.
Hrm. Show quoted text
>>I am not sure that "what happens" is relevant here, I think that what >>is relevant here is what *should* happen.
> > What happens is very relevant when processing strings on current Perl > versions and attempting to serialise data structures containing them. > Changing the behaviour of the IOK flag in future Perl versions wouldn't > invalidate the need for serialisation to preserve string values on > recent versions.
I think the right plan is to fix Perl, and then fix Sereal to work around previous versions. The current state of this logic is completely unacceptable and not fit for purpose. *an entire bit every sv to prevent "not a number" warnings*? Wtf? Yves -- perl -Mre=debug -e "/just|another|perl|hacker/"
Subject: Re: [rt.cpan.org #101876] losing string value of semi-numeric string
Date: Mon, 2 Feb 2015 17:42:09 +0000
To: demerphq via RT <bug-Sereal-Encoder [...] rt.cpan.org>
From: Zefram <zefram [...] fysh.org>
Attached are two patches. The first adds failing test cases for various kinds of semi-numeric string. The second implements the simplest possible fix: to prefer PV serialisation where multiple OK flags are set. This totally fixes strings, but at the expense of both some space efficiency and, more importantly, behaviour on some numeric values that have been coerced to string. (Those get POK set in a way that's just as misleading as IOK being set on "00".) For example, Data::Float::nextup(0.5) stringifies as "0.5", just as 0.5 does, so using string encoding for it is lossy. Fully general fix really needs to try the implicit coercions, to see which slot fully describes the scalar. -zefram

Message body is not shown because sender requested not to inline it.

Message body is not shown because sender requested not to inline it.