CC: | perl5-porters [...] perl.org, bug-Encode [...] rt.cpan.org |
Subject: | [perl #112608] panic: sv_setpvn called with negative strlen -1 |
Date: | Thu, 26 Apr 2012 09:46:45 -0700 |
To: | "OtherRecipients of perl Ticket #112608":; |
From: | "Father Chrysostomos via RT" <perlbug-followup [...] perl.org> |
On Thu Apr 26 08:35:07 2012, sprout wrote:
Show quoted text
> On Thu Apr 26 01:20:30 2012, Steve.Hay@verosoftware.com wrote:
>
> I think your browser is trying to convert \n to \r\n when you download
> it. I downloaded the file and it has \r\n line breaks in it.
>
deeper.
Show quoted text> > Father Chrysostomos via RT wrote on 2012-04-26:
>
> I should have read Leon Timmerman’s message more closely. If I use
> :crlf:encoding(...) I get the panic, but not with :crlf after.
>
> > > I get the same result on a Mac, both
> > threaded and unthreaded.
> > >
> > > A C backtrace would help.
> > >
> > Those messages are, of course, expected given what the program is
> > being asked to do. I get them first too, but then followed by a
> > crash (panic).
> >
> > Is it an EOL issue? (I'm running on Windows
> > here.) Leon mentioned it only happening when :crlf was combined
> > with :encoding. I still see the crash with
> >
> > open my $rh,
> > '<:encoding(UTF-8):crlf', 'utf8.txt' or die $!;
> > open my $wh,
> > '>:encoding(ISO-8859-1):crlf', 'iso88591.txt' or die $!;
> >
> > but not
> > with
> >
> > open my $rh, '<:raw:encoding(UTF-8)', 'utf8.txt' or die $!;
> > open my $wh, '>:raw:encoding(ISO-8859-1)', 'iso88591.txt' or die
> > $!;
> >
> > Also, when I download the utf8.txt attachment I find that it
> > has got mangled somehow from what I uploaded: the line endings are
> > now \r\r\n rather than the \r\n original which I uploaded.
> > Furthermore, the program doesn't crash for me if I leave it like
> > that -- I have to convert it back to \r\n before I see the crash
> > again.
> >
> > I will try to get a backtrace.
>
> I see your backtrace has Encode’s XS code in it. I will try to dig
>
This is probably a bug in Encode. I’ve reduced it to a standalone
script (with no input file), attached as foo.text. (I know, it’s not
much smaller, but it’s easier to handle.)
This code in cpan/Encode/Encode.xs:encode_method:
ENCODE_SET_SRC:
if (check && !(check & ENCODE_LEAVE_SRC)){
sdone = SvCUR(src) - (slen+sdone);
if (sdone) {
sv_setpvn(src, (char*)s+slen, sdone);
}
calls sv_setpvn with 4294967295 for the length argument.
Before this chunk of code is reached, sdone has a value of 1025, but src
is only 1024 octets long. slen is 0. So the sdone assignment sets it
to -1, which becomes 4294967295 because sdone is unsigned.
So somehow Encode is losing track of how far it is through the string.
--
Father Chrysostomos
---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=112608
@_ = (
"\x{feff}\x{39f}\x{3af} \x{3a3}\x{3c5}\x{3bd}\x{3ad}\x{3bd}\x{3bf}\x{3c7}\x{3bf}\x{3b9}\n",
"\x{39f}\x{3b9} \x{393}\x{3b5}\x{3bd}\x{3bd}\x{3b1}\x{3af}\x{3bf}\x{3b9} \x{3c4}\x{3b7}\x{3c2} \x{3a3}\x{3b1}\x{3bc}\x{3bf}\x{3b8}\x{3c1}\x{3ac}\x{3ba}\x{3b7}\x{3c2}\n",
"\x{39f}\x{3b9} \x{393}\x{3b5}\x{3c1}\x{3bc}\x{3b1}\x{3bd}\x{3bf}\x{3af} \x{3be}\x{3b1}\x{3bd}\x{3ac}\x{3c1}\x{3c7}\x{3bf}\x{3bd}\x{3c4}\x{3b1}\x{3b9}...\n",
"\x{39f}\x{3b9} \x{395}\x{3c1}\x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c2} \x{3a4}\x{3bf}\x{3c5} \x{391}\x{3b9}\x{3b3}\x{3b1}\x{3af}\x{3bf}\x{3c5}\n",
"\x{39f}\x{3b9} \x{39a}\x{3c5}\x{3bd}\x{3b7}\x{3b3}\x{3bf}\x{3af}\n",
"\x{39f}\x{3b9} \x{3a0}\x{3b1}\x{3bd}\x{3ba}\x{3c2} \x{3a4}\x{3b1} \x{39a}\x{3ac}\x{3bd}\x{3bf}\x{3c5}\x{3bd} \x{38c}\x{3bb}\x{3b1}\n",
"\x{39f}\x{3b9} \x{3a6}\x{3b1}\x{3bd}\x{3c4}\x{3b1}\x{3c1}\x{3af}\x{3bd}\x{3b5}\x{3c2}\n",
"\x{39f}\x{3b9}\x{3ba}\x{3bf}\x{3b3}\x{3ad}\x{3bd}\x{3b5}\x{3b9}\x{3b1} \x{3a0}\x{3b1}\x{3bd}\x{3c4}\x{3c1}\x{3b5}\x{3c5}\x{3cc}\x{3bc}\x{3b1}\x{3c3}\x{3c4}\x{3b5}\n",
"\x{39f}\x{3bb}\x{3b1} \x{3b5}\x{3af}\x{3bd}\x{3b1}\x{3b9} \x{3b4}\x{3c1}\x{3cc}\x{3bc}\x{3bf}\x{3c2}\n",
"\x{39f}\x{3bc}\x{3b7}\x{3c1}\x{3bf}\x{3c2}\n",
"\x{39f}\x{3be}\x{3c5}\x{3b3}\x{3cc}\x{3bd}\x{3bf}\n",
"\x{39f}\x{3c1}\x{3b1}\x{3c4}\x{3cc}\x{3c4}\x{3b7}\x{3c2} \x{3bc}\x{3b7}\x{3b4}\x{3ad}\x{3bd}\n",
"\x{3c0}\n",
"\x{3c0}\x{3ac}\x{3bd}\x{3c9}, \x{3ba}\x{3ac}\x{3c4}\x{3c9} \x{3ba}\x{3b1}\x{3b9} \x{3c0}\x{3bb}\x{3b1}\x{3b3}\x{3af}\x{3c9}\x{3c2}\n",
"\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc}\n",
"\x{3a4}\x{3bf} \x{39a}\x{3b1}\x{3ba}\x{3cc} - \x{3a3}\x{3c4}\x{3b7}\x{3bd} \x{395}\x{3c0}\x{3bf}\x{3c7}\x{3ae} \x{3c4}\x{3c9}\x{3bd} \x{397}\x{3c1}\x{3ce}\x{3c9}\x{3bd}\n",
"\x{3a4}\x{3bf} \x{3ba}\x{3bb}\x{3ac}\x{3bc}\x{3b1} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}'\x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n",
"\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3bc}\x{3b5} \x{3c4}\x{3b1} \x{3bc}\x{3b1}\x{3cd}\x{3c1}\x{3b1}\n",
"\x{3a4}\x{3bf} \x{3ba}\x{3bf}\x{3c1}\x{3af}\x{3c4}\x{3c3}\x{3b9} \x{3c4}\x{3bf}\x{3c5} \x{3bb}\x{3bf}\x{3cd}\x{3bd}\x{3b1} \x{3c0}\x{3b1}\x{3c1}\x{3ba}\n",
"\x{3a4}\x{3bf} \x{39e}\x{3cd}\x{3bb}\x{3bf} \x{3b2}\x{3b3}\x{3ae}\x{3ba}\x{3b5} \x{3b1}\x{3c0}\x{3cc} \x{3c4}\x{3bf}\x{3bd} \x{3c0}\x{3b1}\x{3c1}\x{3ac}\x{3b4}\x{3b5}\x{3b9}\x{3c3}\x{3bf}\n",
"\x{3a4}\x{3bf} \x{3c0}\x{3b9}\x{3bf} \x{3bb}\x{3b1}\x{3bc}\x{3c0}\x{3c1}\x{3cc} \x{3b1}\x{3c3}\x{3c4}\x{3ad}\x{3c1}\x{3b9}\n",
"\x{3a4}\x{3bf} \x{3a1}\x{3b5}\x{3bc}\x{3b1}\x{3bb}\x{3b9} \x{3a4}\x{3b7}\x{3c2} \x{391}\x{3b8}\x{3b7}\x{3bd}\x{3b1}\x{3c2}\n",
"\x{3a4}\x{3bf} \x{3a4}\x{3b1}\x{3bd}\x{3b3}\x{3ba}\x{3cc} \x{3c4}\x{3c9}\x{3bd} \x{3a7}\x{3c1}\x{3b9}\x{3c3}\x{3c4}\x{3bf}\x{3c5}\x{3b3}\x{3ad}\x{3bd}\x{3bd}\x{3c9}\x{3bd}\n",
"\x{3a4}\x{3bf} \x{3c4}\x{3b5}\x{3bb}\x{3b5}\x{3c5}\x{3c4}\x{3b1}\x{3af}\x{3bf} \x{3c8}\x{3ad}\x{3bc}\x{3bc}\x{3b1}\n",
"\x{3a4}\x{3bf} \x{3c6}\x{3b9}\x{3bb}\x{3af} \x{3c4}\x{3b7}\x{3c2}... \x{396}\x{3c9}\x{3ae}\x{3c2}\n",
"\x{3a4}\x{3bf} \x{3c7}\x{3ce}\x{3bc}\x{3b1} \x{3b2}\x{3ac}\x{3c6}\x{3c4}\x{3b7}\x{3ba}\x{3b5} \x{3ba}\x{3cc}\x{3ba}\x{3ba}\x{3b9}\x{3bd}\x{3bf}\n",
"\x{3a4}\x{3bf}\x{3c0}\x{3af}\x{3bf} \x{3c3}\x{3c4}\x{3b7}\x{3bd} \x{3bf}\x{3bc}\x{3af}\x{3c7}\x{3bb}\x{3b7}\n",
"\x{3a4}\x{3c1}\x{3b9}\x{3bb}\x{3bf}\x{3b3}\x{3af}\x{3b1} 1: \x{3a4}\x{3bf} \x{39b}\x{3b9}\x{3b2}\x{3ac}\x{3b4}\x{3b9} \x{3c0}\x{3bf}\x{3c5} \x{3b4}\x{3b1}\x{3ba}\x{3c1}\x{3cd}\x{3b6}\x{3b5}\x{3b9}\n"
);
open my $wh, '>:crlf:encoding(ISO-8859-1)', \$out or die $!;
print $wh $_ for @_;
close $wh;