Subject: IMDB search result format changed? Here is a patch...
Date: Sun, 08 Jul 2007 23:48:29 +0200
To: bug-IMDB-Film [...] rt.cpan.org
From: Peter Valdemar Mørch <peter [...] morch.com>
Hi,
It looks like the format of IMDB's search results has changed, judging
by what ends up in the matched() array:
* Some (bogus) entries contained "[IMG]" strings
* The same ID is mentioned several times
* Some titles contained a trailing number-dot. e.g. "2."
* After searching for e.g. "The Sight", the status was 0 (*Error*),
even though the title can be found.
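Until IMDB::Film itself is fixed, the first three symptoms could in
principle be masked on the caller side. The following is only a rough
sketch under stated assumptions: clean_matches() is a hypothetical
helper, not part of IMDB::Film, and it assumes matched() entries are
hashrefs with id and title keys, as the reproducing script uses them.
It is not what the patch does, and it does not fix the wrong status.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of IMDB::Film): post-filters a list of
# { id => ..., title => ... } hashrefs shaped like matched() entries.
# Heuristic only; a real title ending in " <digits>." would be altered.
sub clean_matches {
    my @matches = @_;
    my (%seen, @clean);
    foreach my $m (@matches) {
        # Skip bogus entries built from image links. This check runs
        # before de-duplication so the real entry for the id survives.
        next if $m->{title} =~ /\[IMG\]/;
        # Keep only the first remaining entry for each IMDB id.
        next if $seen{ $m->{id} }++;
        # Strip a trailing list counter such as " 2.".
        (my $title = $m->{title}) =~ s/\s+\d+\.\z//;
        push @clean, { %$m, title => $title };
    }
    return @clean;
}

# Sample data copied from the unpatched "The Sight" output.
my @raw = (
    { id => '0262001', title => '[IMG] [IMG] 1. [IMG]' },
    { id => '0262001', title => 'The Sight (2000) (TV) 2.' },
    { id => '0227498', title => 'The Sight (1985) 3.' },
);
print "$_->{id} $_->{title}\n" for clean_matches(@raw);
```

With a real object this would be called as
clean_matches(@{ $imdbObj->matched() }). Note that it silently drops
"aka" entries that share an id with an earlier match.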
Below is a script that reproduces the problem, its output, and finally
a patch that turns the original 0.28 into my fixed version.
The script should reproduce the problem for you too.
make test passes, but I didn't write a test case for this.
Sincerely,
Peter
******************************************************
# Swap the comments to run against the patched library instead.
# use lib glob("IMDB-Film-0.28/lib");
use lib glob("IMDB-Film-0.28.orig/lib");
use strict;
use warnings;
use IMDB::Film;

foreach my $crit ('Alien', 'The Sight') {
    my $imdbObj = IMDB::Film->new(crit => $crit);
    printf "%s\nStatus for %s: %s\n", '#' x 80, $crit, $imdbObj->status;
    # Print id and title of the first five matches.
    my $i = 0;
    foreach my $m (@{ $imdbObj->matched() }) {
        print "$m->{id} $m->{title}\n";
        last if ($i++ > 3);
    }
}
******************************************************
Here is the output of the script, first with the original library and
then with my patched one:
base@peter:~/perl> perl imdb.pl
################################################################################
Status for Alien: 2
0078748 Alien (1979)
0078748 [IMG] [IMG] 1. [IMG]
0078748 Alien (1979) aka "Alien: The Director's Cut" - USA (director's cut)
0090605 [IMG] [IMG] 2. [IMG]
0090605 Aliens (1986) aka "Alien 2" - USA (working title) aka "Alien II" - USA (working title)
################################################################################
Status for The Sight: 0
0262001 [IMG] [IMG] 1. [IMG]
0262001 The Sight (2000) (TV) 2.
0227498 The Sight (1985) 3.
0211625 The Sight (1998) Titles (Partial Matches) (Displaying 9 Results) 1.
0076495 Olsen-banden deruda' (1977) aka "The Olsen Gang Outta Sight" - (English title)
base@peter:~/perl> perl imdb.pl
################################################################################
Status for Alien: 2
0078748 Alien (1979)
0078748 Alien (1979) aka "Alien: The Director's Cut" - USA (director's cut)
0090605 Aliens (1986) aka "Alien 2" - USA (working title) aka "Alien II" - USA (working title)
0093773 Predator (1987) aka "Alien Hunter" - USA (working title)
0103644 Alien³ (1992) aka "Alien 3" - Germany, USA (alternative spelling), Spain, France, Portugal, Turkey (Turkish title) (theatrical title)
################################################################################
Status for The Sight: 2
0262001 The Sight (2000) (TV)
0227498 The Sight (1985)
0211625 The Sight (1998)
0076495 Olsen-banden deruda' (1977) aka "The Olsen Gang Outta Sight" - (English title)
0000350 The Countryman and the Cinematograph (1901) aka "The Countryman's First Sight of the Animated Pictures" - USA
******************************************************
And here is a patch that fixes it:
diff -wur IMDB-Film-0.28.orig/lib/IMDB/Film.pm IMDB-Film-0.28/lib/IMDB/Film.pm
--- IMDB-Film-0.28.orig/lib/IMDB/Film.pm 2007-05-07 10:14:43.000000000 +0200
+++ IMDB-Film-0.28/lib/IMDB/Film.pm 2007-07-08 17:52:10.000000000 +0200
@@ -273,7 +273,7 @@
sub _search_film {
my CLASS_NAME $self = shift;
- return $self->SUPER::_search_results('\/title\/tt(\d+)');
+ return $self->SUPER::_search_results('^\/title\/tt(\d+)', '/td');
}
=item _get_simple_prop()
--
Peter Valdemar Mørch
http://www.morch.com