Subject: Indexing bug for large source trees
For large source trees (around 200k lines), the implicit btree index that the unique constraint creates on words(word) can make postgres complain about index size.
The attached patch adds optional support for using a hash index on that column instead.
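To make the intended behaviour easier to review, here is a minimal standalone sketch of the SQL the patched init is meant to produce for each index type. The helper name words_table_sql is hypothetical and not part of Kasago; it only mirrors the branching in the patch and does not touch a database.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Kasago): returns the SQL that would
# create the words table for the requested index type.
sub words_table_sql {
    my ($index_type) = @_;

    my $sql = "CREATE TABLE words (\n"
            . "  word_id SERIAL PRIMARY KEY,\n"
            . "  word TEXT";

    if (defined $index_type && $index_type eq 'hash') {
        # No UNIQUE constraint here, so no implicit btree index is built;
        # an explicit hash index is created on the column instead.
        $sql .= "\n) WITHOUT OIDS;\n"
              . "CREATE INDEX words_word ON words USING hash (word);\n";
    }
    else {
        # Default: the UNIQUE constraint creates the implicit btree index.
        $sql .= " UNIQUE\n) WITHOUT OIDS;\n";
    }

    return $sql;
}

print words_table_sql('hash');
print words_table_sql();
```

With the patch applied, the hash variant would be selected with $kasago->init('hash'), and a plain $kasago->init keeps the old behaviour. Note that the hash variant drops the UNIQUE constraint, so in that mode the database itself no longer enforces unique words.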
Subject: Kasago.patch
diff --git a/CHANGES b/CHANGES
index c7aaa13..46c1059 100644
--- a/CHANGES
+++ b/CHANGES
@@ -1,4 +1,6 @@
CHANGES file for Kasago:
+0.3 Tue Sep 3
+ Add support for hash indexing on words(word)
0.29 Tue Jul 26 14:32:05 BST 2005
- first release
\ No newline at end of file
diff --git a/lib/Kasago.pm b/lib/Kasago.pm
index 687a455..9abd51d 100644
--- a/lib/Kasago.pm
+++ b/lib/Kasago.pm
@@ -12,7 +12,7 @@ use PPI;
use Search::QueryParser;
use base qw( Class::Accessor::Chained::Fast );
__PACKAGE__->mk_accessors(qw( dbh ));
-our $VERSION = '0.29';
+our $VERSION = '0.3';
sub new {
my $class = shift;
@@ -31,7 +31,7 @@ sub DESTROY {
}
sub init {
- my $self = shift;
+ my ($self, $index_type) = @_;
my $dbh = $self->dbh;
eval {
@@ -65,12 +65,20 @@ CREATE TABLE files (
CREATE INDEX source_id_index ON files(source_id);
");
- $dbh->do("
+ my $words_table = "
CREATE TABLE words (
word_id SERIAL PRIMARY KEY,
- word TEXT UNIQUE
+ word TEXT ";
+ if (defined $index_type && $index_type eq 'hash') {
+ $words_table .= "
) WITHOUT OIDS;
-");
+CREATE INDEX words_word ON words USING hash (word);
+";
+ }
+ else {
+ $words_table .= " UNIQUE) WITHOUT OIDS;";
+ }
+ $dbh->do($words_table);
$dbh->do("
CREATE TABLE lines (
@@ -552,12 +560,17 @@ You pass a source name and the directory path:
$kasago->import($source, $dir);
-=head2 init
+=head2 init ($index_type)
-This created the tables needed by Kasago in the database. You only need run this
-once. If you run this after initialisation, it will delete the index.
-
- $kasago->init;
+This creates the tables needed by Kasago in the database. You only need run
+this once. If you run this after initialisation, it will delete the index.
+If $index_type eq 'hash' then a hash based index will be created on
+words(word). Otherwise an implicit btree index will be created. For large
+codebases, postgres can complain about index size for the btree index. The
+hash index fixes this, but at the expense of only being useful for equality
+operators.
+
+ $kasago->init;
=head2 search