
This queue is for tickets about the WWW-Myspace CPAN distribution.

Report information
The Basics
Id: 30762
Status: resolved
Priority: 0/
Queue: WWW-Myspace

People
Owner: Nobody in particular
Requestors: eewill40z [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Get_inbox hang, ref bug 30351
Date: Thu, 15 Nov 2007 15:41:37 -0600
To: <bug-WWW-Myspace [...] rt.cpan.org>
From: William Zorn <eewill40z [...] gmail.com>
I'm not sure what the fix is, but this module hangs because the 'last if...' statement around line 3581 (in sub _get_messages_from_page) will never evaluate as true. The hash key 'stop_at' does not appear to exist in the %options hash. The same goes for the 'end_page' key of the %options hash in sub get_inbox.
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Thu, 15 Nov 2007 14:28:10 -0800
To: bug-WWW-Myspace [...] rt.cpan.org
From: Grant Grueninger <grantg [...] spamarrest.com>
Please upgrade to the current version of WWW::Myspace.

The loop will exit when all the messages have been parsed: the $page variable is modified on each pass through the loop. If the RE being matched against $page doesn't match (e.g. due to a myspace change), the loop will exit anyway. Thus, this most likely isn't the cause of the problem.

There is a known issue, however, that could cause an endless loop in get_inbox if the paging isn't working properly (i.e. if it tries to go to the next page but gets the same one instead). The method works for me, however (on MacOS 10.5).
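The loop behaviour described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the sub name `extract_messages`, the simplified pattern, and the sample HTML are all assumptions made for the example. It shows why a destructive `s///` in the loop condition ends on its own, and why a missing 'stop_at' key only disables the early exit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Myspace): collect message IDs and
# subjects by repeatedly deleting the first match from a copy of the page.
sub extract_messages {
    my ( $page, %options ) = @_;
    my @messages;

    # Each successful s/// removes everything up to and including the match,
    # so $page shrinks on every pass. When nothing matches, the loop ends.
    while ( $page =~ s/.*?messageID=(\d+)&[^>]*>([^<]+)<//s ) {

        # If 'stop_at' was never passed, this condition is simply false
        # and the loop keeps going until the pattern stops matching.
        last if ( $options{'stop_at'} && ( $options{'stop_at'} == $1 ) );

        push @messages, { message_id => $1, subject => $2 };
    }
    return @messages;
}

my $html = join "\n",
    '<a href="read?messageID=11&x=1">Hello</a>',
    '<a href="read?messageID=22&x=1">Re: Hello</a>';

my @all  = extract_messages($html);
my @some = extract_messages( $html, stop_at => 22 );
print scalar(@all), " without stop_at, ", scalar(@some), " with stop_at\n";
```

A loop of this shape can still appear to hang if a single match attempt takes a very long time, which is a separate matter from the loop condition never becoming false.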
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 08:03:42 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
First off, I am using the most recent version, and second, I made a mistake. The failure is not on the 'last if...' line. The code hangs on the while loop comparison. It actually runs and finds every occurrence of the string to replace until there are none left. It hangs on the last comparison. Instead of not finding the string and coming back to exit the loop, it just hangs and eats up process time. I have tried it on Mandriva and RedHat (Linux) with the same result.

My only guess is that this problem is due to differences in how the operating systems handle characters in the file. I think I've debugged this as far as I can... I will keep looking at it, but I could sure use some help.

Thanks!

-William
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 09:51:33 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
Okay... this is not as eloquent as your solution, but it works for me.... Here's the new _get_messages_from_page. I'm assuming it will work on yours too. Would love for you to try it out and let me know.

-William

    # Updated by WZorn to fix hanging problem on Mandriva and RedHat Linux.
    sub _get_messages_from_page {

        my ( $dummy, %options ) = @_;
        my $page = $self->current_page->decoded_content;
        my @messages = ();
        my $state = 0;  # State Values
                        # 0 - Beginning state, looking for beginning of message block
                        # 1 - In message block, looking for data
                        # Will return to state=0 when we get the last data (messageID and subject)
        my $sender;
        my $status;
        my $msg_id;
        my $subject;

        open(my $fh, "<", \$page);
        while ( <$fh> ) {
            chomp;
            last if ( $options{'stop_at'} && ( $options{'stop_at'} == $3 ) );
            if(/<td class="messageListCell" align="center"> /){
                # Found beginning of Message block
                $state = 1;
            } elsif (/viewprofile&friendid=([0-9]+)/ && $state == 1){
                $sender = $1;
            } elsif (/(Unread|Read|Sent|Replied)/ && $state == 1){
                $status = $1;
            } elsif (/messageID=([^&]+)&.*?>([^<]+)</ && $state == 1){
                $msg_id = $1;
                $subject = $2;
                $state = 0;  # return to state=0 because we need to start looking for the beginning of the next message block

                push @messages, { sender => $sender, status => $status, message_id => $msg_id, subject => $subject };
                if ($DEBUG) { print $sender,"|",$status,"|",$msg_id,"|",$subject,"\n"; }
            }
        }
        return @messages;
    }
From: GRANTG [...] cpan.org
Hi William,

Thanks for the patch. Does get_comments work for you? The same sort of processing is done there. Unfortunately, this patch is subject to break if myspace makes a minute change in the way they format lines (which they do on a regular basis). I'd like to figure out why the regex is hanging on certain systems.

Thanks,

Grant
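The concern about line formatting can be illustrated with a small sketch. The pattern, the sub names `ids_line_by_line` and `ids_whole_page`, and the sample markup are assumptions made for this example and are not taken from WWW::Myspace. The same pattern is applied once per line and once to the whole page, using markup where the link text has moved to its own line.

```perl
use strict;
use warnings;

# Illustrative pattern: a message link followed by its subject text.
my $pattern = qr/messageID=(\d+)&[^>]*>\s*([^<]+)</;

# Line-oriented scan: each line is matched in isolation.
sub ids_line_by_line {
    my ($page) = @_;
    my @ids;
    open( my $fh, '<', \$page ) or die "cannot open in-memory handle: $!";
    while ( my $line = <$fh> ) {
        push @ids, $1 if $line =~ $pattern;
    }
    close $fh;
    return @ids;
}

# Whole-page scan: the pattern may span line breaks.
sub ids_whole_page {
    my ($page) = @_;
    my @ids;
    while ( $page =~ /$pattern/g ) {
        push @ids, $1;
    }
    return @ids;
}

my $compact = qq{<a href="read?messageID=33&x=1">Subject</a>\n};
my $wrapped = qq{<a href="read?messageID=33&x=1">\nSubject</a>\n};

for my $sample ( $compact, $wrapped ) {
    my @by_line = ids_line_by_line($sample);
    my @whole   = ids_whole_page($sample);
    print scalar(@by_line), " by line, ", scalar(@whole), " whole page\n";
}
```

With the compact markup both scans find the message. Once the subject wraps onto the next line, only the whole-page scan does, which is the kind of breakage a line-oriented parser is exposed to.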
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 10:31:40 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
I'm moving on to that one next.

I think it is breaking because of some strange character that shows up in the string. I've seen something like this happen when you try to use a DOS file on UNIX with perl. I also get an "Extremely long character" message when I'm sending the output to a file via ">" on the command line. I can't tell you what it's complaining about, but my guess is that this might be the problem. I just don't know how to fix it. Maybe, if you have some UNIX development contacts, you could get them to look at it. I'm basically an amateur, but I love to mess around with perl.

-William
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 14:27:56 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
Of course I'd get it wrong... USE THIS ONE.... IT IS VERIFIED ON MY SYSTEM... CHECK ON YOURS.

    sub get_comments {
        shift;
        my ( $friend_id ) = @_;
        my @comments = ();
        my $url="http://comment.myspace.com/index.cfm?fuseaction=user.viewComments&friendID=".
            $friend_id;
        #my $eventtarget='ctl00$Main$PagedComments$pagingNavigation1$NextLinkButton';
        my $eventtarget='ctl00$cpMain$PagedComments$pagerTop';  # WZ 11162007 - Changed from the above comment
        my $eventvalidation;
        my $viewstate;
        my $page="";
        my $commentcount;

        $self->_die_unless_logged_in( 'get_comments' );

        # only get a maximum of 50 comment pages
        # this should translate to 2500 comments
        # and also serves as a safety measure in case
        # the method breaks again
        ( $DEBUG ) && print "Getting $url\n";

        $page = $self->get_page( $url );

        #raise an error if its private
        #if($self->is_private(page => $page)) {
        #$self->error("cannot get comments from private profile");
        #return undef;
        #}

        # find out how many comments in total
        if ($page->decoded_content =~ /.*Listing [\d-]+ of (\d+).*/smo){
            $commentcount=$1;
        }
        else {
            $self->error("Could not find how many comments are on profile");
            return undef;
        }

        for (my $i=1;$i<=50;$i++) {
            $page=$self->{current_page};
            push @comments, $self->_get_comments_from_page( $page->decoded_content );

            #make sure we did not get an error
            return undef if ($self->error);

            last unless ( $self->_next_button( $page->decoded_content ) );

            # WZ 11162007 - Comment out, __EVENTVALIDATION does not exist in the page.
            #get value of form field eventvalidation
            # if ($page->decoded_content =~ /id=\"__EVENTVALIDATION\" value=\"(.*?)\"/o){
            #     $eventvalidation=$1;
            # }
            # else {
            #     $self->error("get_comments could not determine eventvalidation in form");
            #     return undef;
            # }
            #
            #get value of form field viewstate
            if ($page->decoded_content =~ /id=\"__VIEWSTATE\" value=\"(.*?)\"/o){
                $viewstate=$1;
            }
            else{
                $self->error("get_comments could not determine viewstate in form");
                return undef;
            }

            #create a form using these values
            my $htmlform=qq{<form name="aspnetForm" method="post" action="/index.cfm?fuseaction=user.viewComments&amp;friendID=$friend_id" id="aspnetForm">}.
                qq{<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="$eventtarget" />}.
                qq{<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="$i" />}.  #WZ 11162007 - Changed to include $i as the value for __EVENTARGUMENT
                qq{<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="$viewstate" />}.
                #qq{<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="$eventvalidation" />}.
                qq{</form>};
            my $form=HTML::Form->parse($htmlform,"http://comment.myspace.com/index.cfm");

            ( $DEBUG ) && print "try to submit form to access comments page #",$i,"\n";
            #submit it and hope for the best
            $self->submit_form({form => $form,no_click=> 1,follow=>0});

            #submit the form to get to next page
            #$self->submit_form({
            #    follow => 0,
            #    form_name => "aspnetForm",
            #    no_click => 1,
            #    fields_ref => { __EVENTTARGET => $eventtarget, __EVENTARGUMENT => '' }
            #    #re1 => 'something unique.?about this[ \t\n]+page',
            #});
            # sleep ( int( rand( 2 ) ) + 1 );
        }

        # unless(scalar (@comments) == $commentcount){
        #     $self->error("Could not collect all comments. Have " . @comments .", should have $commentcount");
        #     return undef;
        # }

        return \@comments;
    }
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 14:09:13 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
get_comments did not work for me, but for a different reason. Does this mean that myspace changed some stuff? The eventtarget was different and __EVENTVALIDATION no longer existed in the page. I did not have the same problem with the _get_comments... sub as with the _get_messages... sub, so that was okay.

    sub get_comments {
        shift;
        my ( $friend_id ) = @_;
        my @comments = ();
        my $url="http://comment.myspace.com/index.cfm?fuseaction=user.viewComments&friendID=".
            $friend_id;
        #my $eventtarget='ctl00$Main$PagedComments$pagingNavigation1$NextLinkButton';
        my $eventtarget='ctl00$cpMain$PagedComments$pagerTop';  # WZ 11162007 - Changed from the above comment
        my $eventvalidation;
        my $viewstate;
        my $page="";
        my $next_i=0;  #WZ 11162007 - Added and initialized to zero (next page number)
        my $commentcount;

        $self->_die_unless_logged_in( 'get_comments' );

        # only get a maximum of 50 comment pages
        # this should translate to 2500 comments
        # and also serves as a safety measure in case
        # the method breaks again
        ( $DEBUG ) && print "Getting $url\n";

        $page = $self->get_page( $url );

        #raise an error if its private
        #if($self->is_private(page => $page)) {
        #$self->error("cannot get comments from private profile");
        #return undef;
        #}

        # find out how many comments in total
        if ($page->decoded_content =~ /.*Listing [\d-]+ of (\d+).*/smo){
            $commentcount=$1;
        }
        else {
            $self->error("Could not find how many comments are on profile");
            return undef;
        }

        for (my $i=1;$i<=50;$i++) {
            $page=$self->{current_page};
            $next_i = $i++;  #WZ 11162007 - increment i to get next page number
            push @comments, $self->_get_comments_from_page( $page->decoded_content );

            #make sure we did not get an error
            return undef if ($self->error);

            last unless ( $self->_next_button( $page->decoded_content ) );

            # WZ 11162007 - Comment out, __EVENTVALIDATION does not exist in the page.
            #get value of form field eventvalidation
            # if ($page->decoded_content =~ /id=\"__EVENTVALIDATION\" value=\"(.*?)\"/o){
            #     $eventvalidation=$1;
            # }
            # else {
            #     $self->error("get_comments could not determine eventvalidation in form");
            #     return undef;
            # }
            #
            #get value of form field viewstate
            if ($page->decoded_content =~ /id=\"__VIEWSTATE\" value=\"(.*?)\"/o){
                $viewstate=$1;
            }
            else{
                $self->error("get_comments could not determine viewstate in form");
                return undef;
            }

            #create a form using these values
            my $htmlform=qq{<form name="aspnetForm" method="post" action="/index.cfm?fuseaction=user.viewComments&amp;friendID=$friend_id" id="aspnetForm">}.
                qq{<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="$eventtarget" />}.
                qq{<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="$next_i" />}.  #WZ 11162007 - Changed to include $next_i as the value for __EVENTARGUMENT
                qq{<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="$viewstate" />}.
                #qq{<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="$eventvalidation" />}.
                qq{</form>};
            my $form=HTML::Form->parse($htmlform,"http://comment.myspace.com/index.cfm");

            ( $DEBUG ) && print "try to submit form to access comments page #",$i+1,"\n";
            #submit it and hope for the best
            $self->submit_form({form => $form,no_click=> 1,follow=>0});

            #submit the form to get to next page
            #$self->submit_form({
            #    follow => 0,
            #    form_name => "aspnetForm",
            #    no_click => 1,
            #    fields_ref => { __EVENTTARGET => $eventtarget, __EVENTARGUMENT => '' }
            #    #re1 => 'something unique.?about this[ \t\n]+page',
            #});
            # sleep ( int( rand( 2 ) ) + 1 );
        }

        # unless(scalar (@comments) == $commentcount){
        #     $self->error("Could not collect all comments. Have " . @comments .", should have $commentcount");
        #     return undef;
        # }

        return \@comments;
    }
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 13:23:56 -0800
To: bug-WWW-Myspace [...] rt.cpan.org
From: Grant Grueninger <grantg [...] spamarrest.com>
Hi William,

I mentioned this in the duplicate bug report also: try updating to the most recent WWW::Mechanize and HTML::Parser (if you're not already running them). Steve (bug #30351) thinks the regex parser might be hanging on malformed UTF-8 characters (which is probably what you're seeing with the "extremely long character" error). I want to see if HTML::Parser and/or WWW::Mechanize may have updated something that fixes that.
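One way to test the malformed UTF-8 hypothesis is to check the raw page bytes before any regex runs over them. The sketch below uses the core Encode module. The sub names `is_valid_utf8` and `decode_lossy` are hypothetical, and it assumes the undecoded octets of the page are available (for example from the response's `content` rather than `decoded_content`), which may not match how the module actually fetches pages.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its input when a CHECK is given
    my $ok = eval { Encode::decode( 'UTF-8', $copy, Encode::FB_CROAK ); 1 };
    return $ok ? 1 : 0;
}

# Hypothetical helper: decode, replacing malformed sequences with U+FFFD
# so that later pattern matching never sees a broken character.
sub decode_lossy {
    my ($bytes) = @_;
    return Encode::decode( 'UTF-8', $bytes, Encode::FB_DEFAULT );
}

my $good = "caf\xC3\xA9";    # valid two-byte sequence
my $bad  = "caf\xFF";        # 0xFF never appears in well-formed UTF-8

for my $sample ( $good, $bad ) {
    print is_valid_utf8($sample) ? "valid\n" : "malformed\n";
}

my $text = decode_lossy($bad);
print "replacement character present\n" if $text =~ /\x{FFFD}/;
```

If a page that triggers the hang turns out to be malformed, running the existing regex over the output of something like `decode_lossy` would show whether the bad bytes are really what the regex engine is stuck on.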
> Queue: WWW-Myspace
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
>
> I'm moving on to that one next.
>
> I think it is breaking because of some strange character that shows
> up in the string. I've seen something like this happen when you try
> to use a DOS file on UNIX with perl. I also get an "Extremely long
> character" message when I'm sending the output to a file via the ">"
> on the command line. Can't tell you what it's complaining about,
> but my guess is that might be the problem. I just don't know how to
> fix it. Maybe, if you have some UNIX development contacts you could
> get them to look at it. I'm basically an amateur, but I love to
> mess around with perl.
>
> -William
>
> On 11/16/07, via RT <bug-WWW-Myspace@rt.cpan.org> wrote:
>> <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
>>
>> Hi William,
>>
>> Thanks for the patch. Does get_comments work for you? The same
>> sort of processing is done there. Unfortunately, this patch is
>> subject to break if myspace makes a minute change in the way they
>> format lines (which they do on a regular basis). I'd like to figure
>> out why the regex is hanging on certain systems.
>>
>> Thanks,
>>
>> Grant
>>
>> On Fri Nov 16 10:59:52 2007, eewill40z@gmail.com wrote:
>>> Okay... this is not as eloquent as your solution, but it works for
>>> me.... Here's the new _get_messages_from_page. I'm assuming it will
>>> work on yours too. Would love for you to try it out and let me know.
>>>
>>> -William
>>>
>>> # Updated by WZorn to fix hanging problem on Mandriva and RedHat linux.
>>> sub _get_messages_from_page {
>>>
>>>     my ( $dummy, %options ) = @_;
>>>     my $page = $self->current_page->decoded_content;
>>>     my @messages = ();
>>>     my $state = 0;  # State Values
>>>                     # 0 - Beginning state, looking for beginning of message block
>>>                     # 1 - In message block, looking for data
>>>                     #     Will return to state=0 when we get the last data
>>>                     #     (messageID and subject)
>>>     my $sender;
>>>     my $status;
>>>     my $msg_id;
>>>     my $subject;
>>>
>>>     open(my $fh, "<", \$page);
>>>     while ( <$fh> ) {
>>>         chomp;
>>>         last if ( $options{'stop_at'} && ( $options{'stop_at'} == $3 ) );
>>>         if(/<td class="messageListCell" align="center">/){
>>>             # Found beginning of Message block
>>>             $state = 1;
>>>         } elsif (/viewprofile&friendid=([0-9]+)/ && $state == 1){
>>>             $sender = $1;
>>>         } elsif (/(Unread|Read|Sent|Replied)/ && $state == 1){
>>>             $status = $1;
>>>         } elsif (/messageID=([^&]+)&.*?>([^<]+)</ && $state == 1){
>>>             $msg_id = $1;
>>>             $subject = $2;
>>>             # return to state=0 because we need to start looking
>>>             # for the beginning of the next message block
>>>             $state = 0;
>>>
>>>             push @messages, { sender => $sender, status => $status,
>>>                               message_id => $msg_id, subject => $subject };
>>>             if ($DEBUG) { print $sender,"|",$status,"|",$msg_id,"|",$subject,"\n"; }
>>>         }
>>>     }
>>>     return @messages;
>>> }
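One way to probe the malformed UTF-8 hypothesis relayed above is to decode the raw page bytes leniently before any regex runs, so that invalid sequences become U+FFFD instead of reaching the matcher. The following is only a sketch under that assumption: `sanitize_utf8` is a hypothetical helper, not part of WWW::Myspace, and the sample bytes are made up. Nothing in this ticket confirms that this was the actual cause of the hang.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of WWW::Myspace): decode raw bytes as
# UTF-8, replacing any malformed sequence with U+FFFD.
sub sanitize_utf8 {
    my ($bytes) = @_;
    # FB_DEFAULT tells Encode to substitute the replacement character
    # for input it cannot decode, rather than dying or stopping.
    return decode('UTF-8', $bytes, Encode::FB_DEFAULT());
}

# Made-up sample: valid "caf\xC3\xA9" followed by two invalid bytes.
my $raw   = "Subject: caf\xC3\xA9 \xFF\xFE end";
my $clean = sanitize_utf8($raw);

print "replacement characters found\n" if $clean =~ /\x{FFFD}/;
```

If the hang disappears when the page is passed through a step like this, that would point toward the encoding theory; if not, the cause is probably elsewhere.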
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 15:29:51 -0600
To: bug-WWW-Myspace [...] rt.cpan.org
From: "William Zorn" <eewill40z [...] gmail.com>
Nope... I downloaded the latest ones last night and saw the problem.

Message body is not shown because it is too large.

Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 13:43:21 -0800
To: bug-WWW-Myspace [...] rt.cpan.org
From: Grant Grueninger <grantg [...] spamarrest.com>
On Nov 16, 2007, at 12:36 PM, William Zorn via RT wrote:
> Queue: WWW-Myspace
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
>
> Get comments did not work for me, but for a different reason. Does
> this mean that myspace changed some stuff? The eventtarget was
> different and the __EVENTVALIDATION no longer existed in the page.

It's quite possible they changed that again. I'll have to look into it.
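
Since hidden fields such as __EVENTVALIDATION apparently come and go, one defensive option is to treat each hidden field as optional and only carry over the ones the page actually contains. This is a minimal sketch, assuming the fields appear as `id="..." value="..."` pairs as in the code quoted in this ticket; `hidden_field_value` and the sample HTML are hypothetical and not part of WWW::Myspace.

```perl
use strict;
use warnings;

# Hypothetical helper: return the value of a hidden form field by id,
# or undef when the page does not contain that field.
sub hidden_field_value {
    my ($html, $id) = @_;
    return $html =~ /id="\Q$id\E"\s+value="(.*?)"/ ? $1 : undef;
}

# Made-up page fragment that has __VIEWSTATE but no __EVENTVALIDATION.
my $html = '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />';

my %fields;
for my $id (qw(__VIEWSTATE __EVENTVALIDATION)) {
    my $value = hidden_field_value($html, $id);
    $fields{$id} = $value if defined $value;   # skip fields the page lacks
}

print join(',', sort keys %fields), "\n";
```

A caller could then build the postback form from whatever is in `%fields` instead of raising an error when one particular field is missing.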
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Fri, 16 Nov 2007 13:51:14 -0800
To: bug-WWW-Myspace [...] rt.cpan.org
From: Grant Grueninger <grantg [...] spamarrest.com>
Thanks William,

You're submitting some useful fixes here - if you have SVN installed (or can install an svn client), read the "HOW TO SUBMIT A PATCH" section of the WWW::Myspace docs. That'll help us implement these faster, and will make sure you have the most recent build to test against (for example I've already fixed a minor bug in get_inbox since your patch :).

Grant

On Nov 16, 2007, at 12:28 PM, William Zorn via RT wrote:
> Queue: WWW-Myspace
> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
>
> Of course I'd get it wrong... USE THIS ONE.... IT IS VERIFIED ON MY
> SYSTEM... CHECK ON YOURS.
>
> sub get_comments {
>     shift;
>     my ( $friend_id ) = @_;
>     my @comments = ();
>     my $url="http://comment.myspace.com/index.cfm?fuseaction=user.viewComments&friendID=".
>         $friend_id;
>     #my $eventtarget='ctl00$Main$PagedComments$pagingNavigation1$NextLinkButton';
>     my $eventtarget='ctl00$cpMain$PagedComments$pagerTop'; # WZ 11162007 - Changed from the above comment
>     my $eventvalidation;
>     my $viewstate;
>     my $page="";
>     my $commentcount;
>
>     $self->_die_unless_logged_in( 'get_comments' );
>
>     # only get a maximum of 50 comment pages
>     # this should translate to 2500 comments
>     # and also serves as a safety measure in case
>     # the method breaks again
>
>     ( $DEBUG ) && print "Getting $url\n";
>     $page = $self->get_page( $url );
>
>     #raise an error if its private
>     #if($self->is_private(page => $page)) {
>     #$self->error("cannot get comments from private profile");
>     #return undef;
>     #}
>
>     # find out how many comments in total
>     if ($page->decoded_content =~ /.*Listing [\d-]+ of (\d+).*/smo){
>         $commentcount=$1;
>     } else {
>         $self->error("Could not find how many comments are on profile");
>         return undef;
>     }
>
>     for (my $i=1;$i<=50;$i++) {
>         $page=$self->{current_page};
>
>         push @comments, $self->_get_comments_from_page( $page->decoded_content );
>
>         #make sure we did not get an error
>         return undef if ($self->error);
>
>         last unless ( $self->_next_button( $page->decoded_content ) );
>
>         # WZ 11162007 - Comment out, __EVENTVALIDATION does not exist in the page.
>         #get value of form field eventvalidation
>         # if ($page->decoded_content =~ /id=\"__EVENTVALIDATION\" value=\"(.*?)\"/o){
>         #     $eventvalidation=$1;
>         # }
>         # else {
>         #     $self->error("get_comments could not determine eventvalidation in form");
>         #     return undef;
>         # }
>         # #get value of form field viewstate
>         if ($page->decoded_content =~ /id=\"__VIEWSTATE\" value=\"(.*?)\"/o){
>             $viewstate=$1;
>         }
>         else{
>             $self->error("get_comments could not determine viewstate in form");
>
>             return undef;
>         }
>
>         #create a form using these values
>         my $htmlform=qq{<form name="aspnetForm" method="post" action="/index.cfm?fuseaction=user.viewComments&amp;friendID=$friend_id" id="aspnetForm">}.
>             qq{<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="$eventtarget" />}.
>             qq{<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="$i" />}. #WZ 11162007 - Changed to include $i as the value for __EVENTARGUMENT
>             qq{<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="$viewstate" />}.
>             #qq{<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="$eventvalidation" />}.
>             qq{</form>};
>         my $form=HTML::Form->parse($htmlform,"http://comment.myspace.com/index.cfm");
>
>         ( $DEBUG ) && print "try to submit form to access comments page #",$i,"\n";
>
>         #submit it and hope for the best
>         $self->submit_form({form => $form,no_click=> 1,follow=>0});
>
>         #submit the form to get to next page
>         #$self->submit_form({
>         #    follow => 0,
>         #    form_name => "aspnetForm",
>         #    no_click => 1,
>         #    fields_ref => { __EVENTTARGET => $eventtarget, __EVENTARGUMENT => '' }
>         #    #re1 => 'something unique.?about this[ \t\n]+page',
>         #});
>
>         # sleep ( int( rand( 2 ) ) + 1 );
>     }
>
>     # unless(scalar (@comments) == $commentcount){
>     #     $self->error("Could not collect all comments. Have " . @comments .", should have $commentcount");
>     #     return undef;
>     # }
>
>     return \@comments;
> }
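The quoted get_comments caps its loop at 50 pages as a safety measure, and earlier in this ticket Grant mentions that broken paging (asking for the next page but receiving the same one) could cause an endless loop. A complementary guard is to remember a digest of each page and stop as soon as one repeats. The sketch below is illustrative only: `collect_pages` and the `$fetch_page` callback are hypothetical names, and the "site" is simulated with an array.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical sketch: $fetch_page is any code ref that returns the page
# content (as bytes) for a page number, or undef when no page is left.
sub collect_pages {
    my ($fetch_page, $max_pages) = @_;
    my (@pages, %seen);
    for my $n (1 .. $max_pages) {
        my $content = $fetch_page->($n);
        last unless defined $content;
        # Stop if the site handed back a page we have already processed.
        last if $seen{ md5_hex($content) }++;
        push @pages, $content;
    }
    return @pages;
}

# Simulated site whose paging breaks: page 3 onward repeats page 2.
my @fake = ('page one', 'page two');
my @got  = collect_pages(sub { $fake[ $_[0] > 2 ? 1 : $_[0] - 1 ] }, 50);

print scalar(@got), " pages collected\n";
```

The hard page cap still bounds the worst case; the digest check simply exits earlier when the same content comes back twice.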
Subject: Re: [rt.cpan.org #30762] Get_inbox hang, ref bug 30351
Date: Sat, 17 Nov 2007 15:48:22 +0000
To: bug-WWW-Myspace [...] rt.cpan.org
From: Steven Chamberlain <steven [...] pyro.eu.org>
Hi,

The latest SVN of WWW::Myspace with William's fix does indeed solve the problem I reported (bug #30351).

I'm not sure exactly what the problem was with the old code, but the new inbox parsing looks like a more efficient algorithm anyway. Good work!

Thank you,
--
Steven Chamberlain
steven@pyro.eu.org
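
For readers following the thread, the line-by-line approach in William's patch can be tried outside the module with a standalone variant like the one below. It is a sketch, not the code that was committed: `parse_messages` is a hypothetical name, the sample markup is invented to fit the patch's regexes, and comparing `stop_at` against the captured message ID (stopping before that message) is an assumption about the intent. None of the regexes in the quoted loop has a third capture group, so its `$3` comparison likely never matches as intended.

```perl
use strict;
use warnings;

# Hypothetical standalone variant of the line-by-line parser quoted in
# this ticket. It takes the page text as an argument instead of reading
# it from the WWW::Myspace object.
sub parse_messages {
    my ($page, %options) = @_;
    my (@messages, %current);
    my $in_block = 0;

    for my $line (split /\n/, $page) {
        if ($line =~ /<td class="messageListCell" align="center">/) {
            $in_block = 1;          # start of a message block
            %current  = ();
        }
        elsif ($in_block && $line =~ /viewprofile&friendid=([0-9]+)/) {
            $current{sender} = $1;
        }
        elsif ($in_block && $line =~ /(Unread|Read|Sent|Replied)/) {
            $current{status} = $1;
        }
        elsif ($in_block && $line =~ /messageID=([^&]+)&.*?>([^<]+)</) {
            @current{qw(message_id subject)} = ($1, $2);
            # Assumed semantics: stop before the message named by stop_at.
            last if defined $options{stop_at} && $options{stop_at} eq $1;
            push @messages, {%current};
            $in_block = 0;          # wait for the next block
        }
    }
    return @messages;
}

# Invented markup with two message blocks.
my $page = join "\n",
    '<td class="messageListCell" align="center">',
    '<a href="index.cfm?fuseaction=user.viewprofile&friendid=12345">Alice</a>',
    'Unread',
    '<a href="mail.cfm?messageID=111&type=inbox">Hello</a>',
    '<td class="messageListCell" align="center">',
    '<a href="index.cfm?fuseaction=user.viewprofile&friendid=67890">Bob</a>',
    'Read',
    '<a href="mail.cfm?messageID=222&type=inbox">Second</a>';

my @all  = parse_messages($page);
my @some = parse_messages($page, stop_at => 222);
print scalar(@all), " total, ", scalar(@some), " before stop_at\n";
```

Because each line is matched on its own, a single pathological line cannot stall a match that spans the whole page, which may be why the patched version behaved better on the affected systems.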
Old ticket, resolved in November. Closing.