This queue is for tickets about the Net-Twitter CPAN distribution.

Report information
The Basics
Id: 55939
Status: rejected
Priority: 0/
Queue: Net-Twitter

People
Owner: MMIMS [...] cpan.org
Requestors: joe.paxton [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 3.12000
Fixed in: (no value)



Subject: Malformed UTF-8 character (fatal)
I'm receiving the following error when using Net::Twitter 3.12:

    Malformed UTF-8 character (fatal) at ./myScript.pl line 340.

Here is line 340:

    $myString =~ s/\R/ /g;

$myString contains a string from a public timeline Twitter update downloaded using $nt->public_timeline / OAuth.

I'm continuously downloading tweets from the public timeline (using $nt->until_rate() to adjust for rate limits), and this only happens about every two or three days -- i.e., only every 500,000 tweets or so.

I've been able to perform other search-and-replace operations on the string (e.g., $myString =~ s/;/ /g;) without problems. So there seems to be an interaction between the match done on the \R meta-character and the string I'm getting back from Twitter via Net::Twitter.

I'm not sure how to capture the string that is causing the problem, given the fatal error. But if it would help to have the actual string, and if someone could tell me how to do that, I'd be happy to do so and report back.
Subject: Re: [rt.cpan.org #55939] Malformed UTF-8 character (fatal)
Date: Thu, 25 Mar 2010 08:09:02 -0700
To: Joe Paxton via RT <bug-Net-Twitter [...] rt.cpan.org>
From: Marc Mims <marc [...] questright.com>
You should be able to catch the error in an eval block. (I prefer Try::Tiny for that):

    use Try::Tiny;

    my $status = ...;
    try {
        $myString =~ s/\R/ /g;
    }
    catch {
        warn "Error on status $status->{id}: $_\n";
    };

Something like that should report the status in which the error occurs and we can investigate.

This is probably not a Net::Twitter bug (unless Net::Twitter is returning something different than Twitter is sending it), but I'll keep it open until we know.

-Marc
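For readers who would rather not add the Try::Tiny dependency, the plain eval block mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: `flatten_newlines` is a hypothetical helper name, and the status is assumed to be a hash reference with `id` and `text` fields.

```perl
use strict;
use warnings;
use 5.010;    # \R (generic linebreak) requires Perl 5.10 or later

# Hypothetical helper, not part of Net::Twitter. Returns the text with
# linebreaks replaced by spaces, or undef if the substitution died
# (for example, on a malformed UTF-8 string).
sub flatten_newlines {
    my ($status) = @_;
    my $text = $status->{text};    # assumed field name

    my $ok = eval {
        $text =~ s/\R/ /g;
        1;                         # make eval return true on success
    };
    unless ($ok) {
        my $err = $@ || 'unknown error';
        warn "Error on status $status->{id}: $err";
        return;
    }
    return $text;
}

my $clean = flatten_newlines({ id => 1, text => "line one\r\nline two" });
print "$clean\n";
```

Having the eval block end in a true value and testing the eval's return value, rather than testing `$@` directly, avoids some of the pitfalls that Try::Tiny was written to hide.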
From: joe.paxton [...] gmail.com
Hi Marc,

I was able to catch the error in the way you suggested (thanks for the tip). Here is the error (actually two errors, but possibly with the same root cause):

    Error on status 11396609263: Malformed UTF-8 character (fatal) at ./myScript.pl line 344.
    "\x{ffff}" does not map to utf8 at ./myScript.pl line 370.

Line 344 is as follows (mentioned previously):

    $myString =~ s/\R/ /g;

And line 370 is just a print command for the same string:

    print FH "$myString";

Just in case it's relevant, FH is opened for writing at the beginning of the script in the following way:

    open(DATA, '>>:encoding(UTF-8)', $dataFile) or die $!;

Here's a link to the relevant Tweet via the Twitter website:

    http://twitter.com/SunMiHoRa/statuses/11396609263

Save for the fact that I can't read Chinese, I don't see anything strange there. But perhaps that's just because the "\x{ffff}" is non-printable?

In any case, it turns out that the script handles the error more the way I want it to when I use the try...catch block (i.e., it doesn't crap out). Thus, I think I'm actually in good shape now.

But I would be interested to know if you determine the source of the issue I am experiencing--i.e., whether the source is Twitter itself allowing in strange characters, Net::Twitter adding or modifying characters, or some combination of the two. I would suspect the former, were it not for the fact that the post shows up without issue on the web. But I also have trouble imagining Net::Twitter would be introducing any of these strange characters. So I'm a bit confused, but ultimately not too worried.

-Joe
I'm marking the bug closed since this isn't a Net::Twitter bug. Net::Twitter is faithfully returning the status exactly as it was received from Twitter. \x{ffff} is an invalid character, but that is, apparently, what the author of the status text posted.

Using an eval block in the calling code is the proper way to handle it. You may want to ignore statuses with invalid characters, remove the invalid characters, or handle the error in some other way.

Thanks for posting!
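As a rough illustration of the "ignore" and "remove" options, the sketch below detects and strips Unicode noncharacters such as U+FFFF. It assumes the text is already a decoded character string; `is_noncharacter`, `has_noncharacter`, and `strip_noncharacters` are hypothetical helper names, not Net::Twitter functions. The set it checks is the 66 noncharacters defined by Unicode: U+FDD0..U+FDEF plus the last two code points of every plane.

```perl
use strict;
use warnings;

# True if the code point is one of the 66 Unicode noncharacters.
sub is_noncharacter {
    my ($cp) = @_;
    return 1 if $cp >= 0xFDD0 && $cp <= 0xFDEF;
    # U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ... U+10FFFF
    return 1 if $cp <= 0x10FFFF && ($cp & 0xFFFE) == 0xFFFE;
    return 0;
}

# "Ignore" option: lets the caller skip a status whose text contains
# a noncharacter.
sub has_noncharacter {
    my ($text) = @_;
    for my $ch (split //, $text) {
        return 1 if is_noncharacter(ord($ch));
    }
    return 0;
}

# "Remove" option: returns a copy of the text without noncharacters.
sub strip_noncharacters {
    my ($text) = @_;
    return join '', grep { !is_noncharacter(ord($_)) } split //, $text;
}

my $raw   = "abc" . chr(0xFFFF) . "def";
my $clean = strip_noncharacters($raw);
print length($raw), " -> ", length($clean), "\n";    # prints "7 -> 6"
```

Whether to skip or clean a status depends on the application. On older Perl versions the cleanup itself may raise the same fatal error, so wrapping these calls in an eval block as well is a reasonable precaution.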