Subject: save_state and restore_state with an AI::Categorizer::Learner::NaiveBayes object (and maybe others)
Summary:
Restoring a stored AI::Categorizer::Learner::NaiveBayes object and using
it to categorize a document results in an error.
Error message:
Can't locate object method "predict" via package
"Algorithm::NaiveBayes::Model::Frequency" at
/usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 28.
It is triggered by the second of these two lines:
my $doc = AI::Categorizer::Document->new( content => "some text" );
my $hypothesis = $learner->categorize( $doc );
This is reproducible in AI-Categorizer-0.09 and AI-Categorizer-0.07
(not tested with AI-Categorizer-0.08 or with versions older than 0.07).
I've included a modified version of the demo.pl script (bug_demo.pl, below)
to give you an easy way to reproduce this behaviour.
Step one:
Run the script once. Note that it stores the learner state in the corpus
directory (in a file named learner_state), so a trained learner is now stored.
Step two:
Comment out the line marked 'store_line' and run the script again.
Now we still train a learner, but we never use it; instead we use the one
stored in step one. Everything works fine.
Step three:
Comment out the line marked 'train_line' as well (keeping 'store_line'
commented out). Now we just create a new learner, but we neither train
nor use it; again we use the learner stored in step one.
It should work, since we have stored the learner and therefore should not
have to train it every time the script is invoked. But it does not.
It fails at the line:
my $experiment = $l2->categorize_collection( collection => $test );
with the error:
Can't locate object method "labels" via package
"Algorithm::NaiveBayes::Model::Frequency" at
/usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 46.
If the two $experiment lines are commented out, it fails with:
Can't locate object method "predict" via package
"Algorithm::NaiveBayes::Model::Frequency" at
/usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 28.
at this line:
my $h = $l2->categorize( $doc );
As far as I know, it should work this way, but maybe I've missed something.
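The workflow expected in step three (train and save once, then only restore on later runs) can be sketched in a self-contained way. Toy::Learner is a hypothetical stand-in for AI::Categorizer::Learner::NaiveBayes, and having its save_state/restore_state wrap Storable's nstore/retrieve is an assumption for illustration, not the real implementation.

```perl
#!/usr/bin/perl
# Hypothetical sketch of "train once, then only restore on later runs".
# Toy::Learner stands in for AI::Categorizer::Learner::NaiveBayes;
# wrapping Storable's nstore/retrieve is an assumption, not its real code.
use strict;
use warnings;
use Storable ();
use File::Spec;
use File::Temp qw(tempdir);

package Toy::Learner;

sub new { my ($class) = @_; return bless { counts => {} }, $class }

# "Training" only counts how often each label was seen.
sub train {
    my ($self, @labels) = @_;
    $self->{counts}{$_}++ for @labels;
    return $self;
}

# Predict the most frequent training label (ties broken alphabetically).
sub predict {
    my ($self) = @_;
    my $c = $self->{counts};
    my ($best) = sort { $c->{$b} <=> $c->{$a} || $a cmp $b } keys %$c;
    return $best;
}

sub save_state    { my ($self, $path) = @_; Storable::nstore($self, $path); return }
sub restore_state { my ($class, $path) = @_; return Storable::retrieve($path) }

package main;

my $state = File::Spec->catfile(tempdir(CLEANUP => 1), 'learner_state');

# Simulate two invocations of the script: the first trains and saves,
# the second finds the state file and skips training entirely.
my @predictions;
for my $run (1, 2) {
    my $learner;
    if (-e $state) {
        $learner = Toy::Learner->restore_state($state);
    } else {
        $learner = Toy::Learner->new->train(qw(earn earn acq));
        $learner->save_state($state);
    }
    push @predictions, $learner->predict;
    print "run $run: ", $learner->predict, "\n";
}
```

Both simulated runs should print earn. The restore works here because Toy::Learner is compiled in the same file, so every method of the restored object's class is already defined; whether that also holds for the real classes in step three is exactly what is in question.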
Research I've done so far:
Algorithm::NaiveBayes::Model::Frequency does not have a 'predict'
method, but it should inherit it from Algorithm::NaiveBayes.
Somehow this does not seem to be the case with restored states.
Interestingly, I copied the predict method from Algorithm::NaiveBayes to
Algorithm::NaiveBayes::Model::Frequency to see if that would work,
but it yields the same error.
At this point I gave up.
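One explanation that would fit all three steps, offered only as an unconfirmed hypothesis: restore_state is assumed to deserialize the learner with Storable, which records an object's class name but does not load the module that defines that class. If Algorithm::NaiveBayes::Model::Frequency is only loaded as a side effect of creating a new model during training, then in step three nothing ever loads it, its @ISA stays empty, and inherited methods such as predict and labels cannot be found. That would also explain why editing the Frequency module had no effect: the file was never read. The sketch below simulates this with two hypothetical stand-in packages, Parent (for Algorithm::NaiveBayes) and Child (for Algorithm::NaiveBayes::Model::Frequency).

```perl
#!/usr/bin/perl
# Hypothetical simulation of the suspected cause (not a confirmed diagnosis).
# Parent stands in for Algorithm::NaiveBayes,
# Child stands in for Algorithm::NaiveBayes::Model::Frequency.
use strict;
use warnings;
use Storable qw(freeze thaw);

package Parent;
sub predict { return 'predicted' }   # the method Child should inherit

package main;

# Stand-in for loading the subclass module: in the real module, reading the
# file is presumably what sets up its @ISA (through 'use base' or similar).
sub load_child_class { @Child::ISA = ('Parent'); return }

# Stand-in for save_state + restore_state in a fresh process: Storable
# keeps the class name 'Child' but loads no code for it.
my $restored = thaw(freeze(bless { labels => ['earn'] }, 'Child'));

my $class_before = ref $restored;                      # 'Child'
my $can_before   = $restored->can('predict') ? 1 : 0;  # 0, Child has no @ISA yet
eval { $restored->predict };
my $error_before = $@;  # Can't locate object method "predict" via package "Child" ...

load_child_class();     # the equivalent of the missing "require"
my $result_after = $restored->predict;                 # 'predicted'

print "class: $class_before\n";
print "before loading: $error_before";
print "after loading: $result_after\n";
```

If the hypothesis is right, an untested workaround for bug_demo.pl would be to add `use Algorithm::NaiveBayes::Model::Frequency;` next to the other `use` lines, so the class is loaded before restore_state is called.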
My environment:
Debian Etch Linux 2.6.19.1 #4 Fri Dec 15 15:32:18 CET 2006 i686 GNU/Linux
perl, v5.8.8 built for i486-linux-gnu-thread-multi
hardware: Via C3 Eden CPU (more or less an i686)
Subject: bug_demo.pl
#!/usr/bin/perl
# This script is a fairly simple demonstration of how AI::Categorizer
# can be used. There are lots of other less-simple demonstrations
# (actually, they're doing much simpler things, but are probably
# harder to follow) in the tests in the t/ subdirectory. The
# eg/categorizer script can also be a good example if you're willing
# to figure out a bit how it works.
#
# This script reads a training corpus from a directory of plain-text
# documents, trains a Naive Bayes categorizer on it, saves the trained
# learner's state, restores it into a second learner, then tests the
# restored categorizer on a set of test documents.
use strict;
use AI::Categorizer;
use AI::Categorizer::Collection::Files;
use AI::Categorizer::Learner::NaiveBayes;
use File::Spec;
die("Usage: $0 <corpus>\n".
    " A sample corpus (data set) can be downloaded from\n".
    " http://www.cpan.org/authors/Ken_Williams/data/reuters-21578.tar.gz\n".
    " or http://www.limnus.com/~ken/reuters-21578.tar.gz\n")
  unless @ARGV == 1;
my $corpus = shift;
my $training = File::Spec->catfile( $corpus, 'training' );
my $test = File::Spec->catfile( $corpus, 'test' );
my $cats = File::Spec->catfile( $corpus, 'cats.txt' );
my $stopwords = File::Spec->catfile( $corpus, 'stopwords' );
my $learner_state = File::Spec->catfile( $corpus, 'learner_state' );
my %params;
if (-e $stopwords) {
  $params{stopword_file} = $stopwords;
} else {
  warn "$stopwords not found - no stopwords will be used.\n";
}
if (-e $cats) {
  $params{category_file} = $cats;
} else {
  die "$cats not found - can't proceed without category information.\n";
}
# In a real-world application these Collection objects could be of any
# type (any Collection subclass). Or you could create each Document
# object manually. Or you could let the KnowledgeSet create the
# Collection objects for you.
$training = AI::Categorizer::Collection::Files->new( path => $training, %params );
$test = AI::Categorizer::Collection::Files->new( path => $test, %params );
# We turn on verbose mode so you can watch the progress of loading &
# training. This looks nicer if you have Time::Progress installed!
print "Loading training set\n";
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
$k->load( collection => $training );
print "Training categorizer\n";
my $l = AI::Categorizer::Learner::NaiveBayes->new( verbose => 1 );
$l->train( knowledge_set => $k ); ### train_line
$l->save_state($learner_state); ### store_line
my $l2 = AI::Categorizer::Learner::NaiveBayes->restore_state($learner_state);
print "Categorizing test set\n";
my $experiment = $l2->categorize_collection( collection => $test );
print $experiment->stats_table;
# If you want to get at the specific assigned categories for a
# specific document, you can do it like this:
my $doc = AI::Categorizer::Document->new
  ( content => "Hello, I am a pretty generic document with not much to say." );
my $h = $l2->categorize( $doc );
print ("For test document:\n",
       " Best category = ", $h->best_category, "\n",
       " All categories = ", join(', ', $h->categories), "\n");