
This queue is for tickets about the AI-Categorizer CPAN distribution.

Report information
The Basics
Id: 25834
Status: open
Priority: 0/
Queue: AI-Categorizer

People
Owner: Nobody in particular
Requestors: quattro [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in:
  • 0.07
  • 0.09
Fixed in: (no value)



Subject: save_state and restore_state with an AI::Categorizer::Learner::NaiveBayes object (and maybe others)
Summary: Restoring a stored AI::Categorizer::Learner::NaiveBayes object and using it to categorize a document results in an error.

Error message:

  Can't locate object method "predict" via package "Algorithm::NaiveBayes::Model::Frequency" at /usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 28.

It is caused by the second of these lines:

  $doc = AI::Categorizer::Document->new( content => "some text" );
  my $hypothesis = $learner->categorize( $doc );

This is reproducible in AI-Categorizer-0.09 and AI-Categorizer-0.07 (not tested with AI-Categorizer-0.08 or other earlier versions).

I've included a modified version of the demo.pl script to give you an easy way to reproduce this behaviour.

Step one: Run the script once (note: it stores the state in the corpus directory). Now a trained learner is stored.

Step two: Comment out the line marked as 'store_line'. Now we are training a learner, but we never use it. We use the stored one from step one. Run it again and everything works fine.

Step three: Comment out the line marked as 'train_line'. Now we just create a new learner, but we never train or use it. Again we use the stored learner from step one. It should work, since we have stored the learner, so we do not have to train it every time the script is invoked. But it does not. At the line:

  my $experiment = $l2->categorize_collection( collection => $test );

it fails with the error:

  Can't locate object method "labels" via package "Algorithm::NaiveBayes::Model::Frequency" at /usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 46.

If the experiment is commented out, it fails with:

  Can't locate object method "predict" via package "Algorithm::NaiveBayes::Model::Frequency" at /usr/local/share/perl/5.8.8/AI/Categorizer/Learner/NaiveBayes.pm line 28.

at this line:

  my $h = $l2->categorize( $doc );

As far as I know it should work this way, but maybe I've missed something.
Research I've done so far: Algorithm::NaiveBayes::Model::Frequency does not have a 'predict' method, but it should inherit it from Algorithm::NaiveBayes. Somehow this does not seem to be the case with restored states.

Interesting: I've copied the predict method from Algorithm::NaiveBayes to Algorithm::NaiveBayes::Model::Frequency to see if this works, but it yields the same error. At this point I gave up.

My environment:

  Debian Etch
  Linux 2.6.19.1 #4 Fri Dec 15 15:32:18 CET 2006 i686 GNU/Linux
  perl, v5.8.8 built for i486-linux-gnu-thread-multi
  Hardware: Via C3 Eden CPU (more or less an i686)
Subject: bug_demo.pl
#!/usr/bin/perl

# This script is a fairly simple demonstration of how AI::Categorizer
# can be used.  There are lots of other less-simple demonstrations
# (actually, they're doing much simpler things, but are probably
# harder to follow) in the tests in the t/ subdirectory.  The
# eg/categorizer script can also be a good example if you're willing
# to figure out a bit how it works.
#
# This script reads a training corpus from a directory of plain-text
# documents, trains a Naive Bayes categorizer on it, then tests the
# categorizer on a set of test documents.

use strict;
use AI::Categorizer;
use AI::Categorizer::Collection::Files;
use AI::Categorizer::Learner::NaiveBayes;
use File::Spec;

die("Usage: $0 <corpus>\n".
    "  A sample corpus (data set) can be downloaded from\n".
    "     http://www.cpan.org/authors/Ken_Williams/data/reuters-21578.tar.gz\n".
    "  or http://www.limnus.com/~ken/reuters-21578.tar.gz\n")
  unless @ARGV == 1;

my $corpus = shift;

my $training      = File::Spec->catfile( $corpus, 'training' );
my $test          = File::Spec->catfile( $corpus, 'test' );
my $cats          = File::Spec->catfile( $corpus, 'cats.txt' );
my $stopwords     = File::Spec->catfile( $corpus, 'stopwords' );
my $learner_state = File::Spec->catfile( $corpus, 'learner_state' );

my %params;
if (-e $stopwords) {
  $params{stopword_file} = $stopwords;
} else {
  warn "$stopwords not found - no stopwords will be used.\n";
}

if (-e $cats) {
  $params{category_file} = $cats;
} else {
  die "$cats not found - can't proceed without category information.\n";
}

# In a real-world application these Collection objects could be of any
# type (any Collection subclass).  Or you could create each Document
# object manually.  Or you could let the KnowledgeSet create the
# Collection objects for you.

$training = AI::Categorizer::Collection::Files->new( path => $training, %params );
$test     = AI::Categorizer::Collection::Files->new( path => $test, %params );

# We turn on verbose mode so you can watch the progress of loading &
# training.  This looks nicer if you have Time::Progress installed!

print "Loading training set\n";
my $k = AI::Categorizer::KnowledgeSet->new( verbose => 1 );
$k->load( collection => $training );

print "Training categorizer\n";
my $l = AI::Categorizer::Learner::NaiveBayes->new( verbose => 1 );
$l->train( knowledge_set => $k );    ### train_line

$l->save_state($learner_state);      ### store_line

my $l2 = AI::Categorizer::Learner::NaiveBayes->restore_state($learner_state);

print "Categorizing test set\n";
my $experiment = $l2->categorize_collection( collection => $test );

print $experiment->stats_table;

# If you want to get at the specific assigned categories for a
# specific document, you can do it like this:

my $doc = AI::Categorizer::Document->new
  ( content => "Hello, I am a pretty generic document with not much to say." );

my $h = $l2->categorize( $doc );

print ("For test document:\n",
       "  Best category = ", $h->best_category, "\n",
       "  All categories = ", join(', ', $h->categories), "\n");
Hi Andreas,

I've reproduced this bug. It happens because Algorithm::NaiveBayes->new() performs some initialization (loading the Model class), and when loading a previously-saved instance, the new() method is never called.

A workaround is to add an extra call to Algorithm::NaiveBayes->new() in the script after the restore_state() line.

I'll look at a way to fix that for a future release.

-Ken
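
The following is a minimal sketch of the failure mode described above. It does not use AI::Categorizer or Algorithm::NaiveBayes. The packages Demo::NaiveBayes and Demo::Model::Frequency and the helper try_predict() are hypothetical stand-ins invented for illustration. It assumes that loading the model class is what establishes its inheritance, so a restored object blessed into a class that was never loaded cannot find inherited methods until that setup has happened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the base class that provides predict().
package Demo::NaiveBayes;
sub predict { return 'predicted' }

package main;

# Call predict() and return either its result or the error text.
sub try_predict {
    my ($obj) = @_;
    my $result = eval { $obj->predict };
    return defined $result ? $result : "error: $@";
}

# Simulate a restored object: it is blessed into the model class, but
# nothing has set up that class's @ISA yet (its module was never loaded).
my $restored = bless {}, 'Demo::Model::Frequency';
print try_predict($restored), "\n";   # error: Can't locate object method "predict" ...

# Simulate what loading the model class would do: establish inheritance.
@Demo::Model::Frequency::ISA = ('Demo::NaiveBayes');
print try_predict($restored), "\n";   # now resolves to Demo::NaiveBayes::predict
```

In the bug_demo.pl script, the workaround Ken describes would correspond to adding use Algorithm::NaiveBayes; near the top and a call to Algorithm::NaiveBayes->new() directly after the restore_state() line. If the model class file is never loaded, this would also explain why copying predict into it made no difference.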