On Thu Jun 26 16:13:10 2014, apostole@gmail.com wrote:
> [...]Furthermore, logically, if I have a
> word transliterated, for every character in the new word,
> Character.isLetter() should return True.
Not only is that not a goal of mine, the documentation
says that *that specifically* is something you can't assume.
Section "DESIGN GOALS AND CONSTRAINTS":
«
For example, if you assume an all-alphabetic (Unicode) string passed
to unidecode(...) will return an all-alphabetic string, you're wrong--
some alphabetic Unicode characters are transliterated as strings
containing punctuation (e.g., the Armenian letter "Թ" (U+0539),
currently transliterates as "T`" (capital-T then a backtick).
»
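
To make that caveat concrete, here is a minimal sketch in Python. It assumes a one-entry toy table in place of the real Unidecode tables; `TOY_TABLE` and `toy_unidecode` are hypothetical names for illustration, not part of any Unidecode API. The single mapping is the one given in the documentation excerpt above.

```python
# Toy stand-in for a transliteration table. Only this one entry is taken from
# the documentation quoted above; the real tables are far larger.
TOY_TABLE = {"\u0539": "T`"}  # ARMENIAN CAPITAL LETTER TO


def toy_unidecode(text):
    """Replace each character via TOY_TABLE; pass anything else through unchanged."""
    return "".join(TOY_TABLE.get(ch, ch) for ch in text)


word = "\u0539"
result = toy_unidecode(word)

print(word.isalpha())    # the input is a single letter
print(result)            # the transliteration contains a backtick
print(result.isalpha())  # so the output is no longer all-alphabetic
```

The same check applies to any "is this all letters?" test (such as `Character.isLetter()` in Java): it can pass on the input and fail on the transliterated output.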
As to "@" for schwa, that's a convention that other linguists and I
have used when we've needed to do pseudo-IPA in 7-bit ASCII.
I didn't make it up from nothing-- and I think I'll leave it the
way it is, because...
> Other thing that bothers me is the transliteration to numerical
...we see things differently. You, I, and other users would choose
different approaches to transliteration for particular blocks.
In *many* cases, I went for graphic similarity, hence Ǝ → 3.
It sounds like we have different philosophies for U+01xx and U+02xx.
See the documentation "WHEN YOU DON'T LIKE WHAT UNIDECODE DOES".
As H. L. Mencken once said: You may be right.
Anything worth doing right is worth you doing right the way you like it:
apply your own mappings first, then pass the result to Unidecode for cleanup
in case it has any Malayalam, Greek, Tibetan, or fullwidth characters, etc.
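
One way that "your mappings first, Unidecode second" approach could look, as a rough Python sketch. Everything here is an assumption for illustration: `DEFAULT_TABLE` is a three-entry toy stand-in for Unidecode's tables (the schwa and Ǝ entries mirror the choices discussed in this thread), and `MY_OVERRIDES`, `default_transliterate`, and `my_transliterate` are hypothetical names.

```python
# Toy stand-in for the general-purpose tables (NOT the real Unidecode data).
DEFAULT_TABLE = {
    "\u0259": "@",  # LATIN SMALL LETTER SCHWA
    "\u018e": "3",  # LATIN CAPITAL LETTER REVERSED E
    "\uff21": "A",  # FULLWIDTH LATIN CAPITAL LETTER A
}

# Hypothetical personal preferences, applied before the defaults are consulted.
MY_OVERRIDES = {
    "\u0259": "e",
    "\u018e": "E",
}


def default_transliterate(text):
    """Stand-in for the general-purpose transliterator."""
    return "".join(DEFAULT_TABLE.get(ch, ch) for ch in text)


def my_transliterate(text):
    """Apply personal overrides first, then let the defaults clean up the rest."""
    preprocessed = "".join(MY_OVERRIDES.get(ch, ch) for ch in text)
    return default_transliterate(preprocessed)


sample = "\u0259\u018e\uff21"
print(default_transliterate(sample))  # defaults only
print(my_transliterate(sample))       # overrides win; fullwidth A is still cleaned up
```

Because the overrides emit plain ASCII, the second pass leaves them alone and only touches what the overrides did not cover.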
Unicode is big enough that everyone will find some part of Unidecode
that seems totally wrongheaded. (Many of them are Chinese and they are
very angry... and wonderfully contradictory. I wish I could introduce
them all to each other.)
> P.S. Are the tables in anyway managed? Can I get insight of how they are
> made and maintained?
I made them, and I manage them locally. For insight, read all
the new documentation, and also the Perl Journal article about it:
http://interglacial.com/tpj/22/