Bug #76824 for Encode: [perl #112608] panic: sv_setpvn called with negative strlen -1

Thu Apr 26 12:46:55 2012 perlbug-followup [...] perl.org - Ticket created

CC:	perl5-porters [...] perl.org, bug-Encode [...] rt.cpan.org
Subject:	[perl #112608] panic: sv_setpvn called with negative strlen -1
Date:	Thu, 26 Apr 2012 09:46:45 -0700
To:	"OtherRecipients of perl Ticket #112608":;
From:	"Father Chrysostomos via RT" <perlbug-followup [...] perl.org>

On Thu Apr 26 08:35:07 2012, sprout wrote: Show quoted text

> On Thu Apr 26 01:20:30 2012, Steve.Hay@verosoftware.com wrote:

> > Father Chrysostomos via RT wrote on 2012-04-26:

> > > I get the same result on a Mac, both

> > threaded and unthreaded.

> > I should have read Leon Timmerman’s message more closely. If I use > :crlf:encoding(...) I get the panic, but not with :crlf after. >

> > > > > > A C backtrace would help. > > >

> > Those messages are, of course, expected given what the program is > > being asked to do. I get them first too, but then followed by a > > crash (panic). > > > > Is it an EOL issue? (I'm running on Windows > > here.) Leon mentioned it only happening when :crlf was combined > > with :encoding. I still see the crash with > > > > open my $rh, > > '<:encoding(UTF-8):crlf', 'utf8.txt' or die $!; > > open my $wh, > > '>:encoding(ISO-8859-1):crlf', 'iso88591.txt' or die $!; > > > > but not > > with > > > > open my $rh, '<:raw:encoding(UTF-8)', 'utf8.txt' or die $!; > > open my $wh, '>:raw:encoding(ISO-8859-1)', 'iso88591.txt' or die > > $!; > > > > Also, when I download the utf8.txt attachment I find that it > > has got mangled somehow from what I uploaded: the line endings are > > now \r\r\n rather than the \r\n original which I uploaded. > > Furthermore, the program doesn't crash for me if I leave it like > > that -- I have to convert it back to \r\n before I see the crash > > again.

> > I think your browser is trying to convert \n to \r\n when you download > it. I downloaded the file and it has \r\n line breaks in it. >

> > > > I will try to get a backtrace.

> > I see your backtrace has Encode’s XS code in it. I will try to dig

deeper. Show quoted text

>

This is probably a bug in Encode. I’ve reduced it to a standalone script (with no input file), attached as foo.text. (I know, it’s not much smaller, but it’s easier to handle.) This code in cpan/Encode/Encode.xs:encode_method: ENCODE_SET_SRC: if (check && !(check & ENCODE_LEAVE_SRC)){ sdone = SvCUR(src) - (slen+sdone); if (sdone) { sv_setpvn(src, (char*)s+slen, sdone); } calls sv_setpvn with 4294967295 for the length argument. Before this chunk of code is reached, sdone has a value of 1025, but src is only 1024 octets long. slen is 0. So the sdone assignment sets it to -1, which becomes 4294967295 because sdone is unsigned. So somehow Encode is losing track of how far it is through the string. -- Father Chrysostomos --- via perlbug: queue: perl5 status: open https://rt.perl.org:443/rt3/Ticket/Display.html?id=112608

@_ = ( "\x{feff}\x{39f}\x{3af} \x{3a3}\x{3c5}\x{3bd}\x{3ad}\x{3bd}\x{3bf}\x{3c7}\x{3bf}\x{3b9}\n", "\x{39f}\x{3b9} \x{393}\x{3b5}\x{3bd}\x{3bd}\x{3b1}\x{3af}\x{3bf}\x{3b9} \x{3c4}\x{3b7}\x{3c2} \x{3a3}\x{3b1}\x{3bc}\x{3bf}\x{3b8}\x{3c1}\x{3ac}\x{3ba}\x{3b7}\x{3c2}\n", "\x{39f}\x{3b9} \x{393}\x{3b5}\x{3c1}\x{3bc}\x{3b1}\x{3bd}\x{3bf}\x{3af} \x{3be}\x{3b1}\x{3bd}\x{3ac}\x{3c1}\x{3c7}\x{3bf}\x{3bd}\x{3c4}\x{3b1}\x{3b9}...\n", "\x{39f}\x{3b9} \x{395}\x{3c1}\x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c2} \x{3a4}\x{3bf}\x{3c5} \x{391}\x{3b9}\x{3b3}\x{3b1}\x{3af}\x{3bf}\x{3c5}\n", "\x{39f}\x{3b9} \x{39a}\x{3c5}\x{3bd}\x{3b7}\x{3b3}\x{3bf}\x{3af}\n", "\x{39f}\x{3b9} \x{3a0}\x{3b1}\x{3bd}\x{3ba}\x{3c2} \x{3a4}\x{3b1} \x{39a}\x{3ac}\x{3bd}\x{3bf}\x{3c5}\x{3bd} \x{38c}\x{3bb}\x{3b1}\n", "\x{39f}\x{3b9} \x{3a6}\x{3b1}\x{3bd}\x{3c4}\x{3b1}\x{3c1}\x{3af}\x{3bd}\x{3b5}\x{3c2}\n", "\x{39f}\x{3b9}\x{3ba}\x{3bf}\x{3b3}\x{3ad}\x{3bd}\x{3b5}\x{3b9}\x{3b1} \x{3a0}\x{3b1}\x{3bd}\x{3c4}\x{3c1}\x{3b5}\x{3c5}\x{3cc}\x{3bc}\x{3b1}\x{3c3}\x{3c4}\x{3b5}\n", "\x{39f}\x{3bb}\x{3b1} \x{3b5}\x{3af}\x{3bd}\x{3b1}\x{3b9} \x{3b4}\x{3c1}\x{3cc}\x{3bc}\x{3bf}\x{3c2}\n", "\x{39f}\x{3bc}\x{3b7}\x{3c1}\x{3bf}\x{3c2}\n", "\x{39f}\x{3be}\x{3c5}\x{3b3}\x{3cc}\x{3bd}\x{3bf}\n", "\x{39f}\x{3c1}\x{3b1}\x{3c4}\x{3cc}\x{3c4}\x{3b7}\x{3c2} \x{3bc}\x{3b7}\x{3b4}\x{3ad}\x{3bd}\n", "\x{3c0}\n", "\x{3c0}\x{3ac}\x{3bd}\x{3c9}, \x{3ba}\x{3ac}\x{3c4}\x{3c9} \x{3ba}\x{3b1}\x{3b9} \x{3c0}\x{3bb}\x{3b1}\x{3b3}\x{3af}\x{3c9}\x{3c2}\n", "\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc}\n", "\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc} - \x{3a3}\x{3c4}\x{3b7}\x{3bd} \x{395}\x{3c0}\x{3bf}\x{3c7}\x{3ae} \x{3c4}\x{3c9}\x{3bd} \x{397}\x{3c1}\x{3ce}\x{3c9}\x{3bd}\n", "\x{3a4}\x{3bf} \x{3ba}\x{3bb}\x{3ac}\x{3bc}\x{3b1} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}'\x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n", "\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3bc}\x{3b5} \x{3c4}\x{3b1} \x{3bc}\x{3b1}\x{3cd}\x{3c1}\x{3b1}\n", "\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3c4}\x{3bf}\x{3c5} \x{3bb}\x{3bf}\x{3cd}\x{3bd}\x{3b1} \x{3c0}\x{3b1}\x{3c1}\x{3ba}\n", "\x{3a4}\x{3bf} \x{39e}\x{3cd}\x{3bb}\x{3bf} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}\x{3cc} \x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n", "\x{3a4}\x{3bf} \x{3c0}\x{3b9}\x{3bf} \x{3bb}\x{3b1}\x{3bc}\x{3c0}\x{3c1}\x{3cc} \x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c1}\x{3b9}\n", "\x{3a4}\x{3bf} \x{3a1}\x{3b5}\x{3bc}\x{3b1}\x{3bb}\x{3b9} \x{3a4}\x{3b7}\x{3c2} \x{391}\x{3b8}\x{3b7}\x{3bd}\x{3b1}\x{3c2}\n", "\x{3a4}\x{3bf} \x{3a4}\x{3b1}\x{3bd}\x{3b3}\x{3ba}\x{3cc} \x{3c4}\x{3c9}\x{3bd} \x{3a7}\x{3c1}\x{3b9}\x{3c3}\x{3c4}\x{3bf}\x{3c5}\x{3b3}\x{3ad}\x{3bd}\x{3bd}\x{3c9}\x{3bd}\n", "\x{3a4}\x{3bf} \x{3c4}\x{3b5}\x{3bb}\x{3b5}\x{3c5}\x{3c4}\x{3b1}\x{3af}\x{3bf} \x{3c8}\x{3ad}\x{3bc}\x{3bc}\x{3b1}\n", "\x{3a4}\x{3bf} \x{3c6}\x{3b9}\x{3bb}\x{3af} \x{3c4}\x{3b7}\x{3c2}... \x{396}\x{3c9}\x{3ae}\x{3c2}\n", "\x{3a4}\x{3bf} \x{3c7}\x{3ce}\x{3bc}\x{3b1} \x{3b2}\x{3ac}\x{3c6}\x{3c4}\x{3b7}\x{3ba}\x{3b5} \x{3ba}\x{3cc}\x{3ba}\x{3ba}\x{3b9}\x{3bd}\x{3bf}\n", "\x{3a4}\x{3bf}\x{3c0}\x{3af}\x{3bf} \x{3c3}\x{3c4}\x{3b7}\x{3bd} \x{3bf}\x{3bc}\x{3af}\x{3c7}\x{3bb}\x{3b7}\n", "\x{3a4}\x{3c1}\x{3b9}\x{3bb}\x{3bf}\x{3b3}\x{3af}\x{3b1} 1: \x{3a4}\x{3bf} \x{39b}\x{3b9}\x{3b2}\x{3ac}\x{3b4}\x{3b9} \x{3c0}\x{3bf}\x{3c5} \x{3b4}\x{3b1}\x{3ba}\x{3c1}\x{3cd}\x{3b6}\x{3b5}\x{3b9}\n" ); open my $wh, '>:crlf:encoding(ISO-8859-1)', \$out or die $!; print $wh $_ for @_; close $wh;

Sun Feb 17 21:09:29 2013 DANKOGAI [...] cpan.org - Status changed from 'new' to 'open'

Sat Oct 08 08:21:15 2016 pali [...] cpan.org - Cc PALI added

Sat Oct 08 08:33:55 2016 pali [...] cpan.org - Correspondence added

RT-Send-CC:

khw [...] cpan.org

Problem is still there also with blead perl. No crash or sv_setpvn panic anymore but valgrind show this error message: ==17627== Conditional jump or move depends on uninitialised value(s) ==17627== at 0x51E0AA: Perl_utf8n_to_uvchr (utf8.c:858) ==17627== by 0x663CA14: encode_method (Encode.xs:193) ==17627== by 0x663CCF9: XS_Encode__XS_encode (Encode.xs:785) Problem is again in this code from Encode.xs: STRLEN clen; UV ch = utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY); I suspect that (SvCUR(src)-slen) is really incorrect and something like (tlen-sdone-slen) should be passed. IIRC s is pointer to first C char which is not yet processed in dst, slen is number of characters processed by last do_encode() call (in case of problems it can be just one or zero) and SvCUR(src) is length of original input string. (tlen-sdone) should be number of remaining characters in src, not processed in dst. With change (SvCUR(src)-slen) to (tlen-sdone-slen) valgrind does not show error message anymore... CCing khw, can you recheck this?

Tue Oct 18 15:44:49 2016 khw [...] cpan.org - Correspondence added

On Sat Oct 08 06:33:55 2016, PALI wrote: Show quoted text

> Problem is still there also with blead perl. No crash or sv_setpvn > panic anymore but valgrind show this error message: > > ==17627== Conditional jump or move depends on uninitialised value(s) > ==17627== at 0x51E0AA: Perl_utf8n_to_uvchr (utf8.c:858) > ==17627== by 0x663CA14: encode_method (Encode.xs:193) > ==17627== by 0x663CCF9: XS_Encode__XS_encode (Encode.xs:785) > > Problem is again in this code from Encode.xs: > > STRLEN clen; > UV ch = > utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), > &clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY); > > I suspect that (SvCUR(src)-slen) is really incorrect and something > like (tlen-sdone-slen) should be passed. > > IIRC s is pointer to first C char which is not yet processed in dst, > slen is number of characters processed by last do_encode() call (in > case of problems it can be just one or zero) and SvCUR(src) is length > of original input string. (tlen-sdone) should be number of remaining > characters in src, not processed in dst. > > With change (SvCUR(src)-slen) to (tlen-sdone-slen) valgrind does not > show error message anymore... > > CCing khw, can you recheck this?

The current code is wrong. The 2nd parameter to utf8n_to_uvuni() is the upper limit of how far it is permissible to look beyond the first byte of the string pointed to by the first parameter. The typical way that the core code uses to handle this type of thing is to save s as s0 upon entrance to the function, and then it's easy to get it right. In this case, one could set s0 after s is adjusted for *offset. s0 = s; before the loop. tlen has been calculated to be the number of bytes available in s0, it should be used instead of SvCUR. Because at the time of the function call, s is slen bytes behind the sequence you want to convert, adjustments have to be made. You could write. utf8n_to_uvuni(s+slen, tlen - (s + slen - s0), ... I think this is the most foolproof and maintainable method. Note that slen = tlen - sdone so pali's solution (tlen-sdone-slen) can be rewritten as tlen - sdone - (tlen -sdone) == 0 which is wrong. Another option would be to calculate and use sleft

Tue Oct 18 15:58:40 2016 pali [...] cpan.org - Correspondence added

On Uto Okt 18 15:44:49 2016, khw wrote: Show quoted text

> Note that > > slen = tlen - sdone

I do not think this is truth. slen is always modified by do_encode() before utf8n_to_uvuni() call. And do_encode() set it to number of processed bytes.in that one do_encode() call. Not to tlen - sdone. Show quoted text

> so pali's solution > > (tlen-sdone-slen) > > can be rewritten as > > tlen - sdone - (tlen -sdone) == 0

Tue Oct 18 16:31:08 2016 pali [...] cpan.org - Correspondence added

Now I run all unit tests in Encode plus crash test from first post and compared (tlen-sdone-slen) and (tlen - (s + slen - s0)) values. Values are before every utf8n_to_uvuni() call exactly same.

Fri Oct 28 01:23:01 2016 DANKOGAI [...] cpan.org - Status changed from 'open' to 'resolved'