The "use bytes" version of U::S (Unicode::Stringprep) is significantly slower:
tmon_bytes.out:
Total Elapsed Time = 26.60081 Seconds
User+System Time = 22.13398 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c  Name
 108.   23.90 28.658  740006   0.0000 0.0000  Unicode::Stringprep::__ANON__
 8.07   1.786  1.786  450000   0.0000 0.0000  Unicode::Stringprep::_check_malformed
 6.30   1.394  1.394  450000   0.0000 0.0000  Unicode::Normalize::NFKC
 5.34   1.183  1.183  550000   0.0000 0.0000  Unicode::Stringprep::_u8_ord
 1.25   0.276  0.276  710000   0.0000 0.0000  utf8::decode
 0.50   0.110  0.110  900000   0.0000 0.0000  utf8::encode
 0.09   0.019  0.023       3   0.0064 0.0078  Unicode::Stringprep::_Common::_mk_map
 0.06   0.013  0.052       1   0.0131 0.0519  Unicode::Stringprep::_compile
tmon_utf8.out:
Total Elapsed Time = 5.643093 Seconds
User+System Time = 5.589600 Seconds
Exclusive Times
%Time ExclSec CumulS  #Calls sec/call Csec/c  Name
 81.6   4.563  5.886  450000   0.0000 0.0000  Unicode::Stringprep::__ANON__
 23.4   1.312  1.312  450000   0.0000 0.0000  Unicode::Normalize::NFKC
 0.34   0.019  0.023       3   0.0063 0.0078  Unicode::Stringprep::_Common::_mk_map
 0.21   0.012  0.024       1   0.0120 0.0243  Unicode::Stringprep::_compile
 0.13   0.007  0.038      11   0.0006 0.0034  Unicode::Stringprep::BEGIN
 0.13   0.007  0.007     511   0.0000 0.0000  Unicode::Stringprep::_compile_mapping_r
 0.11   0.006  0.006       1   0.0062 0.0062  utf8::AUTOLOAD
 0.07   0.004  0.004    2518   0.0000 0.0000  Unicode::Stringprep::_Common::__ANON__
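
To put a number on "significantly slower", here is a minimal sketch that works out the ratio from the figures in the two profiles. The values are copied by hand from the output above. The variable names are only illustrative, and treating the four helpers as specific to the "use bytes" version is an assumption based on them being listed only in that profile.

```perl
use strict;
use warnings;

# User+System times in seconds, copied from the two profiles above.
my %cpu = (bytes => 22.13398, utf8 => 5.589600);

# Exclusive seconds of the subs listed only in the bytes profile
# (assumption: they are absent from, or negligible in, the utf8 run).
my %bytes_only = (
    '_check_malformed' => 1.786,
    '_u8_ord'          => 1.183,
    'utf8::decode'     => 0.276,
    'utf8::encode'     => 0.110,
);

my $ratio = $cpu{bytes} / $cpu{utf8};

my $extra = 0;
$extra += $_ for values %bytes_only;

printf "CPU time ratio (bytes/utf8): %.2f\n", $ratio;
printf "Exclusive time in bytes-only helpers: %.3f s\n", $extra;
```

By these numbers the bytes version uses roughly four times the CPU time. The helpers add only a few seconds, so most of the difference sits in the exclusive time of Unicode::Stringprep::__ANON__ itself (23.90 s versus 4.563 s).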