>
>
> Hi William,
>
> I mentioned this in the duplicate bug report as well: try updating to
> the most recent WWW::Mechanize and HTML::Parser (if you're not already
> running them). Steve (bug #30351) thinks the regex parser might be
> hanging on malformed UTF-8 characters, which is probably what you're
> seeing with the "Extremely long character" error. I want to see whether
> HTML::Parser and/or WWW::Mechanize may have updated something that
> fixes that.
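Steve's hypothesis can also be tested without any module upgrade. Decode the raw page bytes yourself and let malformed UTF-8 be replaced, so broken sequences never reach the regexes. Below is a minimal sketch using the core Encode module. `safe_decode_utf8` is a hypothetical helper name. It also assumes you start from raw bytes (for example `$mech->response->content` rather than `decoded_content`), which is a guess about how the page text would be obtained:

```perl
use strict;
use warnings;
use Encode qw(decode FB_DEFAULT);

# Hypothetical helper: decode raw UTF-8 bytes into a Perl character
# string. With FB_DEFAULT, each malformed sequence becomes U+FFFD (the
# Unicode replacement character) instead of reaching the regexes as-is.
sub safe_decode_utf8 {
    my ($bytes) = @_;
    return decode( 'UTF-8', $bytes, FB_DEFAULT );
}

# "\xC3\xA9" is valid UTF-8 (e-acute); "\xC3\x28" is malformed.
my $text  = safe_decode_utf8("caf\xC3\xA9 and \xC3\x28 oops");
my $fixed = () = $text =~ /\x{FFFD}/g;
print "replacement characters: $fixed\n";
```

If the hang disappears when the regexes run on a string decoded this way, that would point at malformed input rather than at the parsing logic.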
>
> On Nov 16, 2007, at 8:38 AM, William Zorn via RT wrote:
>
> >
> > Queue: WWW-Myspace
> > Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
> >
> > I'm moving on to that one next.
> >
> > I think it is breaking because of some strange character that shows
> > up in the string. I've seen something like this happen when you try
> > to use a DOS file on UNIX with Perl. I also get an "Extremely long
> > character" message when I send the output to a file via ">" on the
> > command line. I can't tell you what it's complaining about, but my
> > guess is that it might be the problem. I just don't know how to fix
> > it. If you have some UNIX development contacts, maybe you could get
> > them to look at it. I'm basically an amateur, but I love to mess
> > around with Perl.
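William's DOS-file hunch points at two things that are easy to rule out. First, `chomp` removes only the trailing newline, so a line ending in "\r\n" keeps its carriage return. Second, printing characters above 0xFF to a handle with no encoding layer makes Perl warn "Wide character in print". This thread does not confirm whether either one causes the "Extremely long character" message. The small sketch below addresses both; `clean_line` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: chomp removes only the trailing "\n", so a line
# read from CRLF ("DOS") text keeps its "\r". Strip that too, so patterns
# anchored at the end of the line can still match.
sub clean_line {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\r\z//;
    return $line;
}

# Give STDOUT an explicit encoding so characters above 0xFF are written
# as UTF-8 when output is redirected with ">", instead of triggering
# "Wide character in print" warnings.
binmode STDOUT, ':encoding(UTF-8)';

my $line = clean_line(qq{<td class="messageListCell" align="center">\r\n});
print( ( $line =~ />\z/ ) ? "CR removed\n" : "CR still present\n" );    # prints "CR removed"
```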
> >
> > -William
> >
> >
> > On 11/16/07, via RT <bug-WWW-Myspace@rt.cpan.org> wrote:
> >>
> >>
> >> <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
> >>
> >> Hi William,
> >>
> >> Thanks for the patch. Does get_comments work for you? The same sort
> >> of processing is done there. Unfortunately, this patch is likely to
> >> break if myspace makes even a minor change in the way it formats
> >> lines (which it does on a regular basis). I'd like to figure out why
> >> the regex is hanging on certain systems.
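Grant's concern is that line-by-line matching breaks whenever myspace reflows its HTML. One layout-independent alternative is to walk the `<a>` tags with HTML::TokeParser, which ships with HTML::Parser. The sketch below takes the `friendid` and `messageID` parameter names from the patch in this thread. The function name and the sample markup are hypothetical. It collects only sender, message ID and subject, not read status:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sketch: walk the <a> tags instead of matching line by line, so the result
# does not depend on where myspace puts its line breaks.
sub messages_from_html {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html );
    my ( @messages, $sender );
    while ( my $tag = $p->get_tag('a') ) {
        my $href = $tag->[1]{href} || '';    # attribute values arrive entity-decoded
        if ( $href =~ /friendid=([0-9]+)/ ) {
            $sender = $1;                    # remember the most recent profile link
        }
        elsif ( $href =~ /messageID=([^&]+)/ ) {
            my $id = $1;
            push @messages, {
                sender     => $sender,
                message_id => $id,
                subject    => $p->get_trimmed_text('/a'),    # link text up to </a>
            };
        }
    }
    return @messages;
}

my $sample = '<a href="index.cfm?fuseaction=user.viewprofile&amp;friendid=123">Bob</a>'
  . qq{<a href="index.cfm?fuseaction=mail.message&amp;messageID=987&amp;type=inbox">\n  Hello there\n</a>};

for my $m ( messages_from_html($sample) ) {
    print join( '|', @{$m}{qw(sender message_id subject)} ), "\n";    # 123|987|Hello there
}
```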
> >>
> >> Thanks,
> >>
> >> Grant
> >>
> >> On Fri Nov 16 10:59:52 2007, eewill40z@gmail.com wrote:
> >>> Okay... this is not as elegant as your solution, but it works for
> >>> me. Here's the new _get_messages_from_page. I'm assuming it will
> >>> work on yours too. I'd love for you to try it out and let me know.
> >>>
> >>> -William
> >>>
> >>> # Updated by WZorn to fix hanging problem on Mandriva and Red Hat
> >>> # Linux.
> >>> sub _get_messages_from_page {
> >>>
> >>>     my ( $self, %options ) = @_;
> >>>     my $page     = $self->current_page->decoded_content;
> >>>     my @messages = ();
> >>>
> >>>     # State values:
> >>>     #   0 - beginning state, looking for the start of a message block
> >>>     #   1 - in a message block, looking for data; returns to 0 once
> >>>     #       the last data (message ID and subject) has been read
> >>>     my $state = 0;
> >>>     my ( $sender, $status, $msg_id, $subject );
> >>>
> >>>     # Split the page into lines directly (tolerating DOS-style "\r\n")
> >>>     # rather than opening an in-memory filehandle on a decoded string.
> >>>     for ( split /\r?\n/, $page ) {
> >>>         if (/<td class="messageListCell" align="center">/) {
> >>>             # Found the beginning of a message block
> >>>             $state = 1;
> >>>         }
> >>>         elsif ( $state == 1 && /viewprofile&friendid=([0-9]+)/ ) {
> >>>             $sender = $1;
> >>>         }
> >>>         elsif ( $state == 1 && /(Unread|Read|Sent|Replied)/ ) {
> >>>             $status = $1;
> >>>         }
> >>>         elsif ( $state == 1 && /messageID=([^&]+)&.*?>([^<]+)</ ) {
> >>>             $msg_id  = $1;
> >>>             $subject = $2;
> >>>             # Return to state 0 to look for the next message block
> >>>             $state = 0;
> >>>
> >>>             # Stop at the requested message ID, if one was given.
> >>>             # (The old check compared against $3, which none of
> >>>             # these regexes set.)
> >>>             last if ( $options{'stop_at'} && $options{'stop_at'} == $msg_id );
> >>>
> >>>             push @messages, {
> >>>                 sender     => $sender,
> >>>                 status     => $status,
> >>>                 message_id => $msg_id,
> >>>                 subject    => $subject,
> >>>             };
> >>>             print "$sender|$status|$msg_id|$subject\n" if $DEBUG;
> >>>         }
> >>>     }
> >>>     return @messages;
> >>> }
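To exercise the patched state machine without a live myspace session, the parsing can be pulled out into a function that takes the page text directly. This is a sketch, not the module's code. `parse_inbox_page` is a hypothetical name and the sample markup is invented. It also departs from the patch in two small ways: it clears sender and status at the start of each block, and it compares `stop_at` with `eq` because the captured ID is not guaranteed to be numeric:

```perl
use strict;
use warnings;

# Hypothetical, self-contained version of the line-based state machine,
# taking the page text directly so it can be tested on a saved page.
sub parse_inbox_page {
    my ( $page, %options ) = @_;
    my ( @messages, $sender, $status );
    my $in_block = 0;    # 0 = looking for a message block, 1 = inside one

    for my $line ( split /\r?\n/, $page ) {
        if ( $line =~ /<td class="messageListCell" align="center">/ ) {
            $in_block = 1;
            ( $sender, $status ) = ();    # reset per-message fields
        }
        elsif ( $in_block && $line =~ /viewprofile&friendid=([0-9]+)/ ) {
            $sender = $1;
        }
        elsif ( $in_block && $line =~ /(Unread|Read|Sent|Replied)/ ) {
            $status = $1;
        }
        elsif ( $in_block && $line =~ /messageID=([^&]+)&.*?>([^<]+)</ ) {
            my ( $msg_id, $subject ) = ( $1, $2 );
            $in_block = 0;
            # Stop before the message the caller already has, if asked to.
            last if defined $options{stop_at} && $options{stop_at} eq $msg_id;
            push @messages, { sender => $sender, status => $status,
                              message_id => $msg_id, subject => $subject };
        }
    }
    return @messages;
}

my $sample = join "\n",
    '<td class="messageListCell" align="center">',
    '<a href="?fuseaction=user.viewprofile&friendid=42">Ann</a>',
    '<span>Unread</span>',
    '<a href="?fuseaction=mail.readmessage&messageID=1001&type=inbox">Hi</a>';

for my $m ( parse_inbox_page($sample) ) {
    print join( '|', @{$m}{qw(sender status message_id subject)} ), "\n";    # 42|Unread|1001|Hi
}
```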
> >>>
> >>> On Nov 15, 2007 4:28 PM, grantg@spamarrest.com via RT
> >>> <bug-WWW-Myspace@rt.cpan.org> wrote:
> >>>>
> >>>> <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
> >>>>
> >>>> Please upgrade to the current version of WWW::Myspace.
> >>>>
> >>>> The loop will exit when all the messages have been parsed: the
> >>>> $page variable is modified on each pass through the loop. If the
> >>>> regex being matched against $page doesn't match (e.g. due to a
> >>>> myspace change), the loop will exit anyway. Thus, this most likely
> >>>> isn't the cause of the problem.
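The termination argument above can be seen in a small sketch of the same pattern. Each successful substitution deletes the text it matched, so the string shrinks on every pass, and a failed match ends the loop. `extract_ids` and the input strings are hypothetical:

```perl
use strict;
use warnings;

# Sketch of the "shrinking $page" pattern: each successful match removes
# everything up to and including the matched text, so the loop ends when
# the page is exhausted or the pattern stops matching.
sub extract_ids {
    my ($page) = @_;    # a copy; the caller's string is untouched
    my @ids;
    while ( $page =~ s/^.*?messageID=([0-9]+)//s ) {
        push @ids, $1;
    }
    return @ids;
}

print join( ',', extract_ids('a messageID=1 b messageID=2 c') ), "\n";    # 1,2
my @none = extract_ids('no ids here');
print scalar(@none), "\n";                                                # 0
```

A `while ($page =~ /messageID=([0-9]+)/g)` loop gives the same termination guarantee without rewriting the string on every pass.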
> >>>>
> >>>> There is a known issue, however, that could cause an endless loop
> >>>> in get_inbox if paging isn't working properly (i.e. if it tries to
> >>>> go to the next page but gets the same one instead). The method
> >>>> works for me, though (on Mac OS X 10.5).
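One way to guard against the paging problem is to remember a digest of every page fetched and stop as soon as one repeats, while also honoring an optional `end_page` limit. The sketch below uses core modules only. `collect_pages` and the `$fetch_page` callback are hypothetical stand-ins for however get_inbox actually fetches pages:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Sketch: stop paging when "next page" returns content already seen.
# $fetch_page is a hypothetical callback returning page $n's HTML, or
# undef when there are no more pages.
sub collect_pages {
    my ( $fetch_page, %options ) = @_;
    my ( %seen, @pages );
    for ( my $n = 1 ; ; $n++ ) {
        last if defined $options{end_page} && $n > $options{end_page};
        my $html = $fetch_page->($n);
        last unless defined $html;
        # md5_hex needs bytes, so encode possibly-wide characters first.
        my $digest = md5_hex( encode_utf8($html) );
        last if $seen{$digest}++;    # same content as an earlier page
        push @pages, $html;
    }
    return @pages;
}

# Fake pager that gets stuck on page 2, like the bug described above.
my @fake = ( 'page one', 'page two' );
my @got = collect_pages( sub { $fake[ $_[0] > 2 ? 1 : $_[0] - 1 ] } );
print scalar(@got), "\n";    # 2
```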
> >>>>
> >>>> On Nov 15, 2007, at 1:41 PM, William Zorn via RT wrote:
> >>>>
> >>>>>
> >>>>> Thu Nov 15 16:41:00 2007: Request 30762 was acted upon.
> >>>>> Transaction: Ticket created by eewill40z@gmail.com
> >>>>> Queue: WWW-Myspace
> >>>>> Subject: Get_inbox hang, ref bug 30351
> >>>>> Broken in: (no value)
> >>>>> Severity: (no value)
> >>>>> Owner: Nobody
> >>>>> Requestors: eewill40z@gmail.com
> >>>>> Status: new
> >>>>> Ticket <URL: http://rt.cpan.org/Ticket/Display.html?id=30762 >
> >>>>>
> >>>>> I'm not sure what the fix is, but this module hangs because the
> >>>>> 'last if...' statement around line 3581 (in sub
> >>>>> _get_messages_from_page) will never evaluate as true. The hash
> >>>>> key 'stop_at' does not appear to exist in the %options hash. The
> >>>>> same goes for the 'end_page' key of the %options hash in sub
> >>>>> get_inbox.
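Consistent with Grant's reply above, a missing `stop_at` key does not by itself produce an endless loop. The guard is simply false, so the loop runs until its input is exhausted. Here is a minimal illustration with made-up IDs; `ids_until` is a hypothetical name:

```perl
use strict;
use warnings;

# Illustration with hypothetical data: a "last if $options{stop_at} ..."
# guard can only shorten a loop. When the key is absent the test is false,
# and the loop still ends when its input runs out.
sub ids_until {
    my ( $ids, %options ) = @_;
    my @out;
    for my $id (@$ids) {
        last if defined $options{stop_at} && $options{stop_at} == $id;
        push @out, $id;
    }
    return @out;
}

print join( ',', ids_until( [ 5, 4, 3 ] ) ), "\n";                  # 5,4,3
print join( ',', ids_until( [ 5, 4, 3 ], stop_at => 4 ) ), "\n";    # 5
```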