
This queue is for tickets about the Unicode-LineBreak CPAN distribution.

Report information
The Basics
Id: 122104
Status: new
Priority: 0/
Queue: Unicode-LineBreak

People
Owner: Nobody in particular
Requestors: rjbs [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: Needs updating for TR29 revision 29
Unicode 9.0's TR29 has changed the rules for extended grapheme clusters, and this module no longer complies. Most specifically:

> Revised rule GB10 (as of Revision 28) to handle characters of class Extend,
> such as variation selectors, in emoji modifier sequences, as may be found in
> existing data.
There is a new Grapheme_Cluster_Break property value, ZWJ, for ZERO WIDTH JOINER, and section §3.1.1 details its use.

Consider this program:

use v5.26.0;
use warnings;
use Unicode::GCString;

my $ZWJ  = "\N{ZERO WIDTH JOINER}";
my $dude = "\N{MAN}";
my $love = "\N{HEAVY BLACK HEART}\N{U+0FE0F}";
my $kiss = "\N{KISS MARK}";

my $in_love = join $ZWJ, $dude, $love, $kiss, $dude;

say length $in_love;

my @matches = $in_love =~ /\G(\X)/g;
say "M: $_" for @matches;

my @gc = split /\b{gcb}/, $in_love;
say "S: $_" for @gc;

say "GCB: " . @gc;
say "COL: " . Unicode::GCString->new($in_love)->columns;
say "LEN: " . Unicode::GCString->new($in_love)->length;

----

This demonstrates both the new, correct behavior (from Unicode 9.0, implemented in perl v5.26.0) of treating the ZWJ sequence as a single display cluster, and the outdated behavior of Unicode::GCString. perl treats the string as one extended grapheme cluster, but Unicode::GCString treats it as four. Presumably, if it treated the string as one EGC, it would also count it as one column.

-- 
rjbs