Skip Menu |

This queue is for tickets about the Text-Match-FastAlternatives CPAN distribution.

Report information
The Basics
Id: 86387
Status: new
Priority: 0/
Queue: Text-Match-FastAlternatives

People
Owner: Nobody in particular
Requestors: rglee32 [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Complexity of string construction is O(ML^2)
Consider this short program: ### use Text::Match::FastAlternatives; my @dict = (); for my $i (0 .. 1999) { push @dict, ('b' x $i) . ('a' x (2000 - $i)); } my $aho = Text::Match::FastAlternatives->new(@dict); ### There are 2K strings, each of length 2K. One would expect O(ML) construction to take at most a couple of seconds (where M = number of strings, L = length of strings), however this runs for a substantial amount of time with the last line included; when it is excluded the runtime of the program is negligible. This is due to the use of longest_suffix to determine the failure function for each vertex. When the number of vertices is O(ML), as here, and the time complexity of longest_suffix is O(L), this leads to O(ML^2) overall complexity. The effect is just as pronounced with a dictionary containing a single string, ('a' x 1000000). One of the key points in the KMP/Aho-Corasick algorithm is that longest_suffix can be computed in constant amortised time by exploiting substructure in the dictionary. See http://wikipedia.org/wiki/Knuth-Morris-Pratt_algorithm for more information.