This queue is for tickets about the perlindex CPAN distribution.

Report information
The Basics
Id: 39863
Status: resolved
Worked: 1.3 hours (80 min)
Priority: 0/
Queue: perlindex

People
Owner: Nobody in particular
Requestors: SREZIC [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: 1.502
Fixed in: (no value)



Subject: perlindex: deleted modules are not removed from indexes
Deleted modules are not removed from the perlindex indexes. Currently the only "workaround" is to remove all index files and re-run perlindex -index from scratch.

Having a "delete" function would probably also help in implementing an "update" function, where the indexes are re-built for existing but updated pods (see the other ticket).

Regards,
Slaven
Hello Slaven,

to remove a document from the index, you need to either
a) remove the document number from all word lists, or
b) assign a new document number to the document and keep the entries for the deleted document.

To remove the document as in (a), you must either
i) know all words of the document, or
ii) scan the whole document list.

Option (i) requires keeping a copy of the original document around, either in the input format or as an inverted inverted index.

A combination of (b) and (ii) would possibly be the best option: documents are deleted by marking them as deleted, and a garbage-collection step then removes all deleted documents in one scan through the document list.

This is beyond the small script I wrote for demonstration purposes (to convince Larry to include 'pack "w"'), though. Are people using this script for collections big enough to make re-indexing too expensive?

Ulrich
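The mark-then-collect approach described above can be sketched as follows. This is only an illustration, not the perlindex code: the in-memory hashes %postings and %live and the helpers mark_deleted and collect_garbage are hypothetical stand-ins for the on-disk tables, and the posting lists are assumed to be (document id, frequency) pairs packed with 'w*'.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for the on-disk tables.
my %postings = (    # word => packed (doc id, frequency) pairs
    perl  => pack('w*', 1, 3, 2, 1),
    index => pack('w*', 2, 4, 3, 2),
    pod   => pack('w*', 2, 1),
);
my %live = (1 => 'a.pod', 2 => 'b.pod', 3 => 'c.pod');    # doc id => file name

# Step 1: deleting a document only marks it; no word list is touched.
sub mark_deleted {
    my ($did) = @_;
    delete $live{$did};
}

# Step 2: a single sweep over all word lists drops every dead doc id.
sub collect_garbage {
    for my $word (keys %postings) {
        my %post = unpack('w*', $postings{$word});
        delete @post{ grep { !exists $live{$_} } keys %post };
        if (%post) {
            $postings{$word} =
                pack('w*', map { ($_, $post{$_}) } sort { $a <=> $b } keys %post);
        }
        else {
            delete $postings{$word};    # no document contains this word any more
        }
    }
}

mark_deleted(2);
collect_garbage();
```

Marking is cheap and can happen many times; the expensive sweep runs once, however many documents were marked.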
As I have implemented the "update" functionality (#39862), delete is now possible as well. What interface are you proposing?

We could remove all files that have disappeared in an extended garbage-collection phase. That would imply a garbage-collection run on every indexing, not only when a file was updated. Alternatively, a command-line flag could be added.

Ulrich
Resolved in perlindex-1.601: after indexing, all indexed files are checked for existence. Files that no longer exist are removed from the index.

This release also fixes the indexing of the default directories, which was broken in 1.600. Note that the 1.6 version breaks index compatibility.

Ulrich
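The existence check described above can be sketched roughly like this. It is a self-contained illustration under assumptions: %seen and %fn are hypothetical in-memory stand-ins for the index tables, validate_index is a made-up helper name, and each record is assumed to be an (mtime, document id) pair packed with 'ww'.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Set up a throwaway directory with two "indexed" files.
my $prefix = tempdir(CLEANUP => 1);
for my $file (qw(a.pod b.pod)) {
    open my $fh, '>', "$prefix/$file" or die "cannot create $file: $!";
    close $fh;
}

# Hypothetical stand-ins for the index tables.
my %seen = (    # relative file name => packed (mtime, doc id)
    'a.pod' => pack('ww', time, 1),
    'b.pod' => pack('ww', time, 2),
);
my %fn = (1 => 'a.pod', 2 => 'b.pod');    # doc id => file name

# Drop every indexed file that no longer exists on disk.
# Returns the number of documents that now need garbage collection.
sub validate_index {
    my ($prefix) = @_;
    my $removed = 0;
    for my $fns (keys %seen) {    # iterate over a snapshot of the keys
        my $path = $fns =~ m{^/} ? $fns : "$prefix/$fns";
        next if -f $path;
        my ($mtime, $did) = unpack('ww', $seen{$fns});
        delete $fn{$did};
        delete $seen{$fns};
        $removed++;
    }
    return $removed;
}

unlink "$prefix/b.pod" or die "cannot remove b.pod: $!";
my $gc_required = validate_index($prefix);
```

A non-zero return value signals that a garbage-collection pass over the word lists is needed.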
*** perlindex.ref2 Sun Oct 19 15:50:57 2008
--- perlindex.PL Sun Oct 19 16:23:04 2008
***************
*** 4,12 ****
  # -*- Mode: Perl -*-
  # Author : Ulrich Pfeifer
  # Created On : Mon Jan 22 13:00:41 1996
! # Last Modified On: Sun Oct 19 16:26:42 2008
  # Language : Perl
! # Update Count : 370
  # Status : Unknown, Use with caution!
  #
  # (C) Copyright 1996-2005, Ulrich Pfeifer, all rights reserved.
--- 4,12 ----
  # -*- Mode: Perl -*-
  # Author : Ulrich Pfeifer
  # Created On : Mon Jan 22 13:00:41 1996
! # Last Modified On: Sun Oct 19 16:23:04 2008
  # Language : Perl
! # Update Count : 387
  # Status : Unknown, Use with caution!
  #
  # (C) Copyright 1996-2005, Ulrich Pfeifer, all rights reserved.
***************
*** 128,134 ****
      for $name (@ARGV) {
        my $fns = $name;
        $fns =~ s:\Q$prefix/::;
!       if ($SEEN{$fns}) {
          my ($mtime, $did) = unpack "$p$p", $SEEN{$fns};
          if ((stat $name)[9] > $mtime) {
            # mark document as deleted
--- 128,134 ----
      for $name (@ARGV) {
        my $fns = $name;
        $fns =~ s:\Q$prefix/::;
!       if (exists $SEEN{$fns}) {
          my ($mtime, $did) = unpack "$p$p", $SEEN{$fns};
          if ((stat $name)[9] > $mtime) {
            # mark document as deleted
***************
*** 149,155 ****
--- 149,170 ----
          }
        }
      }
+     # Check if all (previuosly) indexed files are still available
+     # This may take some time.
+     warn "Validating index ...\n";
+     while (my ($fns, $value) = each %SEEN) {
+       my $path = $fns; $path = $prefix.'/'.$path unless $path =~ m:^/:;
+       unless (-f $path) {
+         my ($mtime, $did) = unpack "$p$p", $value;
+         # mark document as deleted
+         warn "Marking $did ($fns) as deleted\n";
+         delete $FN{$did};
+         delete $SEEN{$fns};
+         $gc_required++;
+       }
+     }
      if ($gc_required) {
+       warn "Garbadge collecting ...\n";
        # garbadge collection, this is awfully slow
        while (my ($word,$list) = each %IF) {
          my %post = unpack($p.'*',$list);
***************
*** 200,211 ****
      $prune = 1;
    }
    $fns =~ s:\Q$prefix/::;
!   return if $SEEN{$fns};
    return unless -f $_;
    if ($name =~ /man|bin|\.(pod|pm|txt)$/) {
      if (!/(~|,v)$/) {
        $did = $FN{'last'}++;
!       $SEEN{$fns} = &index($name, $fns, $did);
      }
    }
  }
--- 215,242 ----
      $prune = 1;
    }
    $fns =~ s:\Q$prefix/::;
! 
!   if (exists $SEEN{$fns}) {
!     my ($mtime, $did) = unpack "$p$p", $SEEN{$fns};
!     if ((stat $name)[9] > $mtime) {
!       # mark document as deleted
!       delete $FN{$did};
!       warn "Marking $did,$mtime ($name) as deleted\n";
!       $gc_required++;
!     } else {
!       # index up to date
!       return;
!     }
!   }
! 
    return unless -f $_;
    if ($name =~ /man|bin|\.(pod|pm|txt)$/) {
      if (!/(~|,v)$/) {
        $did = $FN{'last'}++;
!       if (&index($name, $fns, $did)) {
!         my ($mtime) = (stat $name)[9];
!         $SEEN{$fns} = pack "$p$p", (stat $name)[9], $did;
!       }
      }
    }
  }
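
The patch stores a packed (mtime, document id) pair per file and compares the stored mtime with the file's current one to decide whether to re-index. A minimal sketch of that record handling follows, assuming the $p template is 'w' (a BER-compressed integer); make_record and stale_did are hypothetical helper names that do not appear in perlindex.

```perl
use strict;
use warnings;

my $p = 'w';    # assumption: the script's $p is the BER-compressed integer template

# Build the per-file record: modification time followed by document id.
sub make_record {
    my ($mtime, $did) = @_;
    return pack("$p$p", $mtime, $did);
}

# Return the document id to retire if the file is newer than its record,
# or undef if the index entry is still up to date.
sub stale_did {
    my ($record, $current_mtime) = @_;
    my ($mtime, $did) = unpack("$p$p", $record);
    return $current_mtime > $mtime ? $did : undef;
}

my $record    = make_record(1224426184, 42);
my $unchanged = stale_did($record, 1224426184);    # same mtime: nothing to do
my $changed   = stale_did($record, 1224426185);    # newer file: retire doc 42
```

A file whose mtime is unchanged is skipped; a newer file has its old document id retired and is indexed again under a fresh id.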