Subject: should use Script_Extensions property for mixed_script
Mixed script detection should use Script_Extensions, not Script (charscript), to compute soss, and thus mixed script-y-ness.
Some characters are in multiple scripts. For example, this string should not be mixed script:
qq(\x{a81b}\x{a80d}\x{a80e} \x{09EA})
The first three characters are Sylo and the fourth is a space (Common). The last one has script Bengali, but its Script_Extensions include Sylo, and so the string can be construed as entirely Sylo + Common.
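A rough sketch of the difference, using only Perl's built-in \p{} properties. The helper name only_in() and the two patterns are made up for illustration and are not part of any module; the result also depends on the Unicode version your perl ships with, so treat this as a demonstration under that assumption rather than a proposed patch:

```perl
use strict;
use warnings;

# The example string: Syloti Nagri letters, a space, and U+09EA.
my $str = "\x{a81b}\x{a80d}\x{a80e} \x{09EA}";

# Characters acceptable in a "Sylo + Common" string, judged two ways.
# Both patterns are illustrative assumptions, not taken from any module.
my $BY_SCX = qr/[\p{Script_Extensions=Syloti_Nagri}\p{Script_Extensions=Common}\p{Script_Extensions=Inherited}]/;
my $BY_SC  = qr/[\p{Script=Syloti_Nagri}\p{Script=Common}\p{Script=Inherited}]/;

# Hypothetical helper: true if every character of $s matches $class.
sub only_in {
    my ($s, $class) = @_;
    return $s =~ /\A(?:$class)*\z/ ? 1 : 0;
}

printf "single-script by Script_Extensions: %s\n",
    only_in($str, $BY_SCX) ? "yes" : "no";
printf "single-script by Script:            %s\n",
    only_in($str, $BY_SC) ? "yes" : "no";
```

On a perl whose Unicode data lists Sylo in the Script_Extensions of U+09EA, the first check should pass and the second should fail, which is the discrepancy this ticket is about.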
I could write a patch if you're no longer working on this.
See http://www.unicode.org/reports/tr39/#Mixed_Script_Detection
--
rjbs