
This queue is for tickets about the WWW-Myspace CPAN distribution.

Report information
The Basics
Id: 30351
Status: resolved
Priority: 0/
Queue: WWW-Myspace

People
Owner: Nobody in particular
Requestors: steven [...] pyro.eu.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Invalid UTF-8 characters may break some functions, and possibly unrelated 'hang' in get_inbox
Date: Tue, 30 Oct 2007 00:26:20 +0000
To: bug-www-myspace [...] rt.cpan.org
From: Steven Chamberlain <steven [...] pyro.eu.org>
Hi,

I've had some of these errors coming up intermittently over the last few weeks, but now they seem to appear almost all of the time:

Malformed UTF-8 character (unexpected continuation byte 0xa5, with no preceding start byte) in substitution (s///) at /usr/local/share/perl/5.8.8/WWW/Myspace.pm line 3568.

I believe this happens when users have used HTML entities such as &hearts; in their display names. On some of Myspace's pages, these entities are served to the browser as actual (ASCII?) characters, which I believe are invalid UTF-8, when they should be encoded as HTML entities.

Until recently, I only noticed this problem when one of the messages in the inbox had &hearts; in the message subject. It now also happens in the recently added 'Who's Online?' panel that appears to the right of the Myspace Mail (messages) inbox. As a result, if anyone with certain HTML entities in their display name happens to be online at the time, any attempt to read inbox messages with WWW::Myspace fails with the above error message.

Running in the 'C' locale is a possible workaround:

LANG=C ./myscript.pl

I'm not sure whether that workaround works, because I am now being affected by a possibly unrelated problem with get_inbox. During a call to get_inbox, after WWW::Myspace has downloaded the inbox page from Myspace, it hangs with 100% CPU usage. I have tested this both with an empty inbox and with an inbox containing a (read) message.

I'm afraid I know very little about Perl, but I figured out how to trace line-by-line using 'perl -d:Trace' and found that it stops at Myspace.pm:3868:
>> /usr/local/share/perl/5.8.8/WWW/Myspace.pm:3567: last if ( $options{'stop_at'} && ( $options{'stop_at'} == $3 ) );
>> /usr/local/share/perl/5.8.8/WWW/Myspace.pm:3568: push @messages,

I hope the information I've provided so far is helpful; please let me know if there is anything else I can do to debug this.

Regards,
--
Steven Chamberlain
steven@pyro.eu.org
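The error message can be illustrated at the byte level. The sketch below is in Python rather than Perl, purely to show the UTF-8 encoding rules; it is not WWW::Myspace code and not the workaround patch from ticket 30762. It assumes, as a hypothesis consistent with Steven's guess, that the stray 0xa5 is the tail of a ♥ (U+2665, encoded in UTF-8 as E2 99 A5) whose leading bytes were lost somewhere before the regex ran. The helper names are hypothetical.

```python
# Illustrative sketch only (Python, not Perl): why a lone 0xA5 byte is reported
# as an "unexpected continuation byte ... with no preceding start byte".
# Hypothesis: 0xA5 is the last byte of U+2665 (what &hearts; renders as).

HEART = "\u2665"  # U+2665 BLACK HEART SUIT


def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a string."""
    return ch.encode("utf-8")


def is_continuation(byte: int) -> bool:
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF)."""
    return byte & 0xC0 == 0x80


def decode_strict(data: bytes):
    """Decode strictly; return None instead of raising on malformed input."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_lenient(data: bytes) -> str:
    """Replace malformed sequences with U+FFFD instead of failing."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    full = utf8_bytes(HEART)
    print(full.hex(" "))               # e2 99 a5
    print(is_continuation(0xA5))       # True: 0xA5 can never start a character
    # Hypothetical broken input: the bytes E2 and 99 are lost, A5 remains.
    orphan = b"Bob " + full[2:] + b"!"
    print(decode_strict(orphan))       # None: strict decoding rejects it
    print(decode_lenient(orphan) == "Bob \ufffd!")  # True
```

A strict decoder refuses the orphaned byte outright, while a lenient one substitutes U+FFFD and carries on; which behaviour is preferable for scraped pages is exactly the trade-off between fixing the input at its source and applying a workaround.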
Subject: Re: [rt.cpan.org #30351] Invalid UTF-8 characters may break some functions, and possibly unrelated 'hang' in get_inbox
Date: Mon, 29 Oct 2007 19:01:08 -0700
To: bug-WWW-Myspace [...] rt.cpan.org
From: Grant Grueninger <grantg [...] spamarrest.com>
Hi Steven, I'm having trouble reproducing this - on which OS are you running?
Subject: Re: [rt.cpan.org #30351] Invalid UTF-8 characters may break some functions, and possibly unrelated 'hang' in get_inbox
Date: Tue, 30 Oct 2007 04:16:56 +0000
To: bug-WWW-Myspace [...] rt.cpan.org
From: Steven Chamberlain <steven [...] pyro.eu.org>
Sorry, I forgot to mention: this is using WWW::Myspace version 0.72, on Debian 'etch', i686.

I also made a typo in my previous email: the line causing the hang was 3568 (in one place I wrongly wrote it as 3868). I just noticed that this is the same line that was also generating the UTF-8 errors, so the two problems could be related after all.

My best guess is that the s/// substitution hangs because of something in $page. It could be invalid UTF-8 characters on the page, or the regular expression might match text on the page that it shouldn't be matching.

Unfortunately, I'm having some trouble reproducing the problem now. I think it depends on which of my 'friends' are currently online (some of them have HTML entities like &hearts; in their display names, and I believe those are the cause of this).

I've tried messaging my own account with a message subject containing &hearts;, but that no longer seems to cause a problem for get_inbox.

I'm still trying to debug this myself, but so far I'm struggling to reproduce the problem consistently. I'll be able to continue working on this tomorrow; please let me know if there is anything specific I could try.

Thank you,
--
Steven Chamberlain
steven@pyro.eu.org
Hi Steven,

Try upgrading to the latest versions of WWW::Mechanize and HTML::Parser. I want to see if that fixes the UTF-8 character handling; if so, I'll update the requirements for WWW::Myspace.

If that doesn't work, there's a workaround patch here: http://rt.cpan.org/Ticket/Display.html?id=30762

I'm debating whether or not to put it into the next version of WWW::Myspace (I'd rather fix the bug at its source than implement a clunky workaround).

Grant

On Tue Oct 30 00:17:38 2007, steven@pyro.eu.org wrote:
> Sorry, I forgot to mention, this is using WWW::Myspace version 0.72.
> This is on Debian 'etch', i686.
>
> I also made a typo in my previous email, the line causing the hang was
> 3568 (in one place I wrote this wrongly as 3868). This line, I just
> noticed, is the same line that was generating UTF-8 errors also. It
> seems the two problems could be related after all.
>
> My best guess is that the s// substitution hangs because of something in
> $page. It could be invalid UTF-8 characters on the page, or the regular
> expression might match text on the page which it shouldn't be matching.
>
> Unfortunately I'm having some trouble reproducing the problem now. I
> think it depends which of my 'friends' are currently online (some of
> them have HTML entities like &hearts; in the display name and I believe
> those are the cause of this).
>
> I've tried messaging my own account with a message subject containing
> &hearts; but that no longer seems to cause a problem for get_inbox.
>
> I'm still trying to debug this myself but so far I'm struggling to
> reproduce the problem consistently. I'll be able to continue working on
> this tomorrow -- please let me know if there is anything specific I
> could try.
>
> Thank you,
Closing this ticket - please refer to ticket 30762 for continued discussion and patches. http://rt.cpan.org/Ticket/Display.html?id=30762