Subject: Patterns list wrong buildup
Date: Fri, 11 Apr 2008 10:59:55 +0200
To: bug-Games-Cryptoquote [...] rt.cpan.org
From: "Walter Baeck" <walter.baeck [...] gmail.com>
I haven't actually run CryptoQuote.pm; I was just browsing through the code
to understand its approach.
From the included patterns.txt file, I get the general idea of how the
algorithm works. Based on whether letters are recurring or unique in a
word, a lookup key is formed that allows quick access to a list of known
plaintext words with exactly the same pattern.
But these lookup keys treat uppercase and lowercase letters as different,
which shouldn't be the case.
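To illustrate what a case-insensitive lookup key could look like, here is a
minimal sketch in Python (not the module's Perl code). The function name
pattern_key and the "ABA"-style key format are assumptions made for this
example; the actual key format used in patterns.txt may differ.

```python
def pattern_key(word: str) -> str:
    """Build a letter-pattern key that ignores case.

    Hypothetical key format: the first distinct letter becomes 'A',
    the second 'B', and so on. Assumes the word contains at most
    26 distinct characters.
    """
    seen = {}   # maps each distinct (lowercased) letter to its key symbol
    key = []
    for ch in word.lower():
        if ch not in seen:
            seen[ch] = chr(ord("A") + len(seen))
        key.append(seen[ch])
    return "".join(key)


# 'Npn' and 'npn' share one key because the word is lowercased first.
print(pattern_key("Npn"), pattern_key("npn"), pattern_key("Cat"))
```

With such a key, a coded word and every candidate plaintext word fall into
the same bucket regardless of how they are capitalized.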
I'm used to CryptoQuotes printed in the newspaper in all-uppercase, so I
never thought of the issue.
But from the example in your own source code, I understand that lowercase
and uppercase substitutions are meant to be consistent (when an uppercase
'B' stands for an 'N', then the lowercase 'b' is automatically guaranteed
to stand for an 'n', and vice versa).
Therefore, the word encodings should be classified regardless of
uppercase/lowercase, and the patterns.txt file should be built up
accordingly. When presenting the solution, the casing could be taken from
the encoded quote and reproduced, for aesthetic correctness.
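Reproducing the casing of the encoded quote could be done along these
lines. This is only a sketch in Python, with a hypothetical helper name
(apply_case); it assumes the coded word and the decoded word have the same
length, which holds for a simple substitution cipher.

```python
def apply_case(cipher_word: str, plain_word: str) -> str:
    """Copy the upper/lower casing of cipher_word onto plain_word.

    Assumes both words have the same length (one-to-one substitution).
    """
    return "".join(
        p.upper() if c.isupper() else p.lower()
        for c, p in zip(cipher_word, plain_word)
    )


# The solver can work in lowercase and restore the casing at the end.
print(apply_case("Npn", "bob"))
```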
Perhaps the intention is to keep proper names from matching against
lowercase codes within the quote itself. (I think this is a dangerous
idea, because common words can also be capitalized, at the beginning of a
sentence within the quote.) But even then, information is lost by blindly
treating the uppercase codes as wholly different. The coded author's first
name "Npn" should match "Ada" or "Bob", but not "Cat".
Walter
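
P.S. The "Npn" example can be sketched as follows, in Python rather than
the module's Perl. The names letter_pattern and cased_key are hypothetical,
and so is the idea of storing the capitalization as a separate flag next to
a case-insensitive pattern. It shows one way to keep capitalized words
apart from lowercase ones without losing the repeated-letter information.

```python
def letter_pattern(word: str) -> str:
    """Case-insensitive repeat pattern, e.g. 'Npn' -> 'ABA' (assumed format)."""
    seen = {}
    return "".join(
        seen.setdefault(ch, chr(ord("A") + len(seen))) for ch in word.lower()
    )


def cased_key(word: str) -> tuple:
    """Pair a capitalization flag with the case-insensitive pattern."""
    return (word[:1].isupper(), letter_pattern(word))


candidates = ["Ada", "Bob", "Cat", "dad"]
matches = [w for w in candidates if cased_key(w) == cased_key("Npn")]
print(matches)  # ['Ada', 'Bob']
```

"Cat" is rejected because its letters do not repeat, and "dad" is rejected
only because of the capitalization flag; dropping the flag from the key
would give the fully case-insensitive behaviour.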