Subject: IDF computation improvement
According to https://en.wikipedia.org/wiki/Tf%E2%80%93idf#Inverse_document_frequency_2 , the IDF can be computed without adding 1 to the document count. So the code at https://metacpan.org/source/LMETCALF/Text-TFIDF-0.03/lib/Text/TFIDF.pm#L70 could be changed to this:
return - log( $count / scalar( keys %{ $t->{file} } ) ) / log(10);
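This makes the Wikipedia examples work out nicely: "this", which appears in both documents, gets an IDF of log10(2/2) = 0, and "example", which appears in only one of the two, gets log10(2/1) ≈ 0.301. :-)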
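Here is a minimal, self-contained sketch of the proposed formula, checked against the Wikipedia two-document example. The hash layout standing in for $t->{file} (file name => { word => count }) and the idf helper are hypothetical illustrations, not the module's actual internals. It computes log10(N / n_t), which is algebraically the same as the proposed - log( $count / N ) / log(10). One caveat: without the +1, a word found in no document gives $count == 0, and Perl dies on log(0), so the sketch returns undef in that case (that handling is an assumption, not something the module does).

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's $t->{file} structure:
# file name => { word => occurrence count }. The layout is an assumption.
# Documents are the two from the Wikipedia tf-idf example.
my $t = {
    file => {
        doc1 => { this => 1, is => 1, a => 2, sample => 1 },
        doc2 => { this => 1, is => 1, another => 2, example => 3 },
    },
};

# Hypothetical helper: idf(word) = log10( N / n_t ), with no +1 added to n_t.
sub idf {
    my ( $t, $word ) = @_;

    # N: total number of documents
    my $n_docs = scalar( keys %{ $t->{file} } );

    # n_t: number of documents containing the word (grep in scalar context)
    my $count = grep { exists $t->{file}{$_}{$word} } keys %{ $t->{file} };

    # log(0) is fatal in Perl, so an unseen word needs a guard.
    return undef unless $count;

    # Same value as - log( $count / $n_docs ) / log(10), but avoids a -0.
    return log( $n_docs / $count ) / log(10);
}

printf "%s: %.3f\n", $_, idf( $t, $_ ) for qw(this example);
```

Running it prints "this: 0.000" and "example: 0.301", matching the Wikipedia figures.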
-Gene