Subject: Core Dump For Large Tokens
Date: Sun, 15 Apr 2012 13:42:53 -0400
To: bug-lingua-brilltagger [...] rt.cpan.org
From: cpanbt.10.eveland [...] spamgourmet.com
The Brill Tagger library dumps core when a token is longer than 256 characters. If you add:
$text = [ map { substr $_, 0, 250 } @$text ];
right after the call to tokenize in tag, you won’t hit this. Presumably 255 would work just as well as 250, but I didn’t take the time to test it fully or to find the exact boundary condition. :)
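For anyone who would rather not patch the module, here is a minimal sketch of the same truncation as a standalone helper that can be applied to a token list before tagging. The name truncate_tokens and its optional length argument are mine, not part of Lingua::BrillTagger, and the default of 250 is simply the value from the workaround above, not a verified safe limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::BrillTagger): returns a new
# array ref in which every token is cut to at most $max_len characters.
# The default of 250 mirrors the workaround above; the exact limit that
# triggers the crash has not been determined.
sub truncate_tokens {
    my ($tokens, $max_len) = @_;
    $max_len = 250 unless defined $max_len;
    return [ map { substr $_, 0, $max_len } @$tokens ];
}

# Example: one ordinary token and one oversized token.
my $text = [ 'short', 'x' x 300 ];
$text = truncate_tokens($text);
print join(',', map { length } @$text), "\n";    # prints 5,250
```

Note that this silently drops the tail of any oversized token, so the tagger's output for those tokens will not correspond to the original input.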
I’m using Lingua::BrillTagger 0.02 with perl v5.10.1 on Linux 2.6.32-35-server #78-Ubuntu SMP Tue Oct 11 16:26:12 UTC 2011 x86_64 GNU/Linux.