This queue is for tickets about the IMDB-Film CPAN distribution.

Report information
The Basics
Id: 61971
Status: open
Priority: 0/
Queue: IMDB-Film

People
Owner: Nobody in particular
Requestors: g_ml2000-x [...] yahoo.de
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in:
  • 0.42
  • 0.43
  • 0.44
  • 0.45
  • 0.46
Fixed in: (no value)



Subject: Failure parsing TV episodes
(I am not sure, but I think this is due to a change on IMDB; it used to work before.)

IMDB::Film consistently fails to retrieve TV episodes. It always crashes in sub title when trying to parse the title ("Use of uninitialized value in pattern match (m//) at /usr/local/lib/site_perl/IMDB/Film.pm line 376.").

An example: on http://www.imdb.com/title/tt0517677/ the title returned by IMDB is

"Babylon 5" Passing Through Gethsemane (TV episode 1995) - IMDb

which obviously is not what the regex in line 371 expects. (Line numbers refer to IMDB::Film 0.46.)
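For anyone trying to reproduce this without fetching the page, here is a minimal sketch. It assumes the regex on line 371 is the one shown in the context lines of the 0.46 patch later in this ticket; the variable names are made up for the example. The old pattern wants a four-digit year directly after the opening parenthesis, so "(TV episode 1995)" never matches and all three fields stay undefined.

```perl
use strict;
use warnings;

# Title string as reported in this ticket for tt0517677.
my $title = '"Babylon 5" Passing Through Gethsemane (TV episode 1995) - IMDb';

# Assumption: pattern copied from the context lines of the 0.46 patch
# in this ticket (the line the report refers to as line 371).
my $old_re = qr!(.*?)\s+\(([\d\?]{4}).*?\)(?:\s+\((.*?)\))?!;

# In list context a failed match returns an empty list,
# so title, year and kind all remain undef.
my ($name, $year, $kind) = $title =~ $old_re;

if (defined $name) {
    print "matched: title=$name year=$year\n";
} else {
    print "no match: title, year and kind are all undef\n";
}
```

Any later pattern match against the undefined title would then produce the "Use of uninitialized value" warning quoted above.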
From: g_ml2000-x [...] yahoo.de
... actually, it looks like recent layout changes on imdb.com broke many other things in IMDB::Film :-(
From: 2ge [...] 2ge.us
On Sat Oct 09 05:46:29 2010, g_ml200 wrote:
> ... actually, it looks like recent layout changes on imdb.com broke many
> other things in IMDB::Film :-(
Hi,

I also have a problem parsing TV shows; for example, the name of the show comes back broken. Please test the parser on:

http://www.imdb.com/title/tt0460649/
http://www.imdb.com/title/tt1592154/

IMDB should give us an API anyway :(
Subject: Re: [rt.cpan.org #61971] Failure parsing TV episodes
Date: Sat, 9 Oct 2010 16:45:22 +0300
To: bug-IMDB-Film [...] rt.cpan.org
From: Michael Stepanov <stepanov.michael [...] gmail.com>
Hi,

Thanks for the bug report. I will investigate and fix it.

Sure, an API would solve all such problems, but for some reason IMDB doesn't want to provide one.

On Sat, Oct 9, 2010 at 1:51 PM, Ing. Branislav Gerzo via RT <bug-IMDB-Film@rt.cpan.org> wrote:
> Queue: IMDB-Film
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=61971 >
>
> On Sat Oct 09 05:46:29 2010, g_ml200 wrote:
> > ... actually, it looks like recent layout changes on imdb.com broke many
> > other things in IMDB::Film :-(
>
> Hi,
>
> I got also problem with parsing TV Shows, name of show is broken for
> example. Please test parser on:
>
> http://www.imdb.com/title/tt0460649/
> http://www.imdb.com/title/tt1592154/
>
> IMDB should give us API anyway :(
-- Best regards, Michael Stepanov http://linuxmce.ru
The following patch fixes things for me. It seems they are now putting the kind inside the parentheses together with the year.
Subject: imdb-new-tv.patch
diff -ruN IMDB-Film-0.46.old/lib/IMDB/Film.pm IMDB-Film-0.46/lib/IMDB/Film.pm
--- IMDB-Film-0.46.old/lib/IMDB/Film.pm 2010-09-08 08:34:15.000000000 -0600
+++ IMDB-Film-0.46/lib/IMDB/Film.pm 2010-10-12 12:24:36.000000000 -0600
@@ -369,6 +369,12 @@
     $self->_show_message("title: $title", 'DEBUG');
     ($self->{_title}, $self->{_year}, $self->{_kind}) = $title =~ m!(.*?)\s+\(([\d\?]{4}).*?\)(?:\s+\((.*?)\))?!;
+    unless ($self->{_title}) {
+        ($self->{_title}, $self->{_kind}, $self->{_year}) = $title =~ m!(.*?)\s+\((.*?)\s+([\d\?]{4}).*?\)?!;
+        while ( my ($k, $v) = each %FILM_KIND ) {
+            $self->{_kind} = $k if $self->{_kind} =~ m{$v}i;
+        }
+    }
     $self->{_kind} = '' unless $self->{_kind};
     # "The Series" An Episode (2005)
Subject: Re: [rt.cpan.org #61971] Failure parsing TV episodes
Date: Wed, 13 Oct 2010 09:54:36 +0300
To: bug-IMDB-Film [...] rt.cpan.org
From: Michael Stepanov <stepanov.michael [...] gmail.com>
Thanks a lot for the patch. IMDB recently changed the design of the site, so I'm fixing the module's functionality now, and your patch will help to release the module sooner :)

On Tue, Oct 12, 2010 at 9:31 PM, http://slords.pip.verisignlabs.com/ via RT <bug-IMDB-Film@rt.cpan.org> wrote:
> Queue: IMDB-Film
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=61971 >
>
> The following patch fixes things for me. Seems they are putting kind
> inside the year listings.
Subject: Re: [rt.cpan.org #61971] Failure parsing TV episodes
Date: Wed, 13 Oct 2010 11:48:02 +0300
To: bug-IMDB-Film [...] rt.cpan.org
From: Michael Stepanov <stepanov.michael [...] gmail.com>
Hi,

The new version, IMDB::Film 0.47, has been uploaded to CPAN. It should be available in 2-3 hours, I guess. Please try it and send me any bugs you find.

On Tue, Oct 12, 2010 at 9:31 PM, http://slords.pip.verisignlabs.com/ via RT <bug-IMDB-Film@rt.cpan.org> wrote:
> Queue: IMDB-Film
> Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=61971 >
>
> The following patch fixes things for me. Seems they are putting kind
> inside the year listings.
From: g_ml2000-x [...] yahoo.de
On Wed Oct 13 2010, 04:48:31, stepanov.michael@gmail.com wrote:
> The new version of IMDB::Film 0.47 is uploaded to CPAN.
That was quick :-) Thanks a lot! All the problems that I noticed are fixed.

One little side note: it seems that one of the recent changes on imdb.com is that they now always deliver pages that are partially localized. If there is a German title available for a movie, I now always get the German title, even on "imdb.com" (before, this was only the case on "imdb.de"). This country recognition is not browser-dependent but seemingly based only on the originating IP address, so for example "make test" for IMDB::Film now shows 2 failures because "Troja" instead of "Troy" is retrieved as the title. Possibly this also causes confusion on other occasions ...
On Wed Oct 13 02:55:05 2010, stepanov.michael@gmail.com wrote:
> Thanks a lot for the patch. IMDB recently changed design for the site. So,
> I'm fixing the module functionality now and your patch will help to release
> the module sooner :)
The fix you put in works up to a point, but it doesn't return consistent kinds for TV episodes. Before, TV series would return 'S' for kind; now they return 'TV Series'. The part of the patch you dropped tried to convert this back where it could. I've attached a new patch that puts that functionality back in and also cleans up the unless/if blocks a little to do better error checking.
Subject: IMDB-Film-0.47-imdb-kind.patch
diff -ruN IMDB-Film-0.47.old/lib/IMDB/Film.pm IMDB-Film-0.47/lib/IMDB/Film.pm
--- IMDB-Film-0.47.old/lib/IMDB/Film.pm 2010-10-13 02:39:16.000000000 -0600
+++ IMDB-Film-0.47/lib/IMDB/Film.pm 2010-10-13 08:28:19.000000000 -0600
@@ -372,9 +372,12 @@
     $self->_show_message("title: $title", 'DEBUG');
     # TODO: implement parsing of TV series like ALF (TV Series 1986–1990)
-    ($self->{_title}, $self->{_year}, $self->{_kind}) = $title =~ m!(.*?)\s+\((\d{4})\)(?:\s\((\w*)\))?!;
-    unless($self->{_title}) {
-        ($self->{_title}, $self->{_kind}, $self->{_year}) = $title =~ m!(.*?)\s+\((.*?)?\s?([0-9\-]*)\)!;
+    unless( ($self->{_title}, $self->{_year}, $self->{_kind}) = $title =~ m!(.*?)\s+\((\d{4})\)(?:\s\((\w*)\))?! ) {
+        if( ($self->{_title}, $self->{_kind}, $self->{_year}) = $title =~ m!(.*?)\s+\((.*?)?\s?([0-9\-]*)\)! ) {
+            while( my ($k, $v) = each %FILM_KIND ) {
+                $self->{_kind} = $k if $self->{_kind} =~ m{$v}i;
+            }
+        }
     }
     $self->{_kind} = '' unless $self->{_kind};
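To illustrate the kind conversion outside the module, here is a rough standalone sketch. It is not the module's code: the %film_kind table is a hypothetical two-entry stand-in (the real %FILM_KIND is not shown in this ticket and may differ), the helper names parse_new_style and normalize_kind are made up, and the fallback pattern is the one from the first (0.46) patch in this ticket. Unlike the loop in the patches, it returns on the first matching label instead of continuing to iterate.

```perl
use strict;
use warnings;

# Hypothetical code => label table; the real %FILM_KIND in IMDB::Film
# is not shown in this ticket and may contain different entries.
my %film_kind = (
    S => 'tv series',
    E => 'tv episode',
);

# Split a new-style title into (title, kind, year) using the fallback
# pattern from the 0.46 patch in this ticket.
sub parse_new_style {
    my ($raw) = @_;
    return $raw =~ m!(.*?)\s+\((.*?)\s+([\d\?]{4}).*?\)?!;
}

# Map a long label such as "TV episode" back to its short code.
# Unknown labels are returned unchanged; undef becomes ''.
sub normalize_kind {
    my ($label) = @_;
    return '' unless defined $label;
    for my $code (sort keys %film_kind) {
        my $re = quotemeta $film_kind{$code};
        return $code if $label =~ /$re/i;
    }
    return $label;
}

my ($name, $label, $year) =
    parse_new_style('"Babylon 5" Passing Through Gethsemane (TV episode 1995) - IMDb');
my $kind = normalize_kind($label);
print "title=$name kind=$kind year=$year\n";
```

With this stand-in table, a 'TV Series' label maps back to 'S', which is the behaviour callers relied on before the site change.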
On Wed Oct 13 10:32:52 2010, http://slords.pip.verisignlabs.com/ wrote:
> The fix you put in works to a point but doesn't return consistent types
> for tv episodes. Before tv series would return 'S' for kind and now
> they return 'TV Series'. The part of the patch you dropped tried to
> convert this back if it could.
I noticed this when I tried to pull episodes for a tv series and it failed.