
This queue is for tickets about the Crypt-XkcdPassword CPAN distribution.

Report information
The Basics
Id: 74684
Status: resolved
Priority: 0/
Queue: Crypt-XkcdPassword

People
Owner: perl [...] toby.ink
Requestors: NEILB [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.002
Fixed in: (no value)



Subject: invalid words in the wordlist
The wordlist for EN contains the following, which aren't valid words:

nbsp mmm cristian livvie ohh stenbeck marah ú ahh uhh snyder pheebs will's goa'uld hmmm teal'c oooh feds uhm ooo cortlandt asa's awright

I haven't listed all the bad words -- gave up after finding these :-)
On 2012-02-04T23:45:32Z, NEILB wrote:
> The wordlist for EN contains the following, which aren't valid words
Words are taken from http://en.wiktionary.org/wiki/Wiktionary:Frequency_lists/TV/2006/explanation - the first 10,000 words.

The latest release includes documentation of how to combine Crypt::XkcdPassword with Text::Aspell to filter out such words: https://metacpan.org/module/TOBYINK/Crypt-XkcdPassword-0.003/lib/Crypt/XkcdPassword/Examples.pod

I'm leaving this issue open for now, as a better solution is to distribute a better English dictionary. Roget's thesaurus is on Project Gutenberg, which might be a good source.
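For anyone reading along, the filtering idea can be sketched roughly as follows. This is only an illustration in Python, not the code from Examples.pod: `filter_words`, `in_dictionary` and the tiny `KNOWN` set are hypothetical names, and the set merely stands in for a real spell checker such as Aspell.

```python
from typing import Callable, Iterable, List


def filter_words(words: Iterable[str], is_valid: Callable[[str], bool]) -> List[str]:
    """Keep only the words that the supplied checker accepts, in original order."""
    return [w for w in words if is_valid(w)]


# Hypothetical stand-in for a spell checker: a tiny set of known-good words.
KNOWN = {"correct", "horse", "battery", "staple"}


def in_dictionary(word: str) -> bool:
    """Assumed checker: case-insensitive membership in KNOWN."""
    return word.lower() in KNOWN


# A few entries from the report mixed with ordinary words.
candidates = ["correct", "nbsp", "horse", "mmm", "battery", "will's", "staple"]
filtered = filter_words(candidates, in_dictionary)
print(filtered)
```

Passing the checker in as a function keeps the filter independent of whichever dictionary is actually used.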
> I'm leaving this issue open for now, as a better solution is to
> distribute a better English dictionary. Roget's thesaurus is on Project
> Gutenberg, which might be a good source.
There's now a Roget-based word list in the repo. Many of the words in it are rather obscure, though. Quite a lot of them appear to be loan-words from other languages that I wasn't aware we'd even borrowed (e.g. abundanti). The next step, I suppose, is to calculate the intersection of the two lists.
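A rough sketch of that intersection, in Python for illustration only. The function name and the sample lists are made up, and it assumes both lists are plain sequences of words with the frequency list ordered most common first.

```python
from typing import Iterable, List


def intersect_preserving_order(frequency_words: Iterable[str],
                               reference_words: Iterable[str]) -> List[str]:
    """Return words present in both lists, in frequency-list order, without duplicates."""
    reference = {w.lower() for w in reference_words}
    seen = set()
    result = []
    for word in frequency_words:
        key = word.lower()
        # Keep a word only if the reference list also has it and it is new.
        if key in reference and key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Made-up sample data: a frequency-style list and a thesaurus-style list.
tv_list = ["the", "nbsp", "house", "ohh", "river", "house"]
roget_list = ["abundanti", "house", "river", "the"]

common = intersect_preserving_order(tv_list, roget_list)
print(common)
```

Walking the frequency list rather than taking a plain set intersection keeps the common words first, so the result could still be truncated to the first N entries.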
0.004 includes EN::Roget.