
This queue is for tickets about the Template-Toolkit CPAN distribution.

Report information
The Basics
Id: 57466
Status: rejected
Priority: 0/
Queue: Template-Toolkit

People
Owner: Nobody in particular
Requestors: dmuey [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Wishlist
Broken in: (no value)
Fixed in: (no value)



Subject: Patch to support \x{263A} notation in double quoted strings
Hello,

In perl I can do (w/ "Wide character" warnings typically, but that isn't really relevant at this point):

    print "Smile \x{263A}\n";

and will get

    Smile ☺

(hopefully rt doesn't corrupt the smiley face that should follow 'Smile')

It'd be great to be able to do the same with double quoted strings in TT (without having to override it and set PARSER):

    [% "Smile \x{263A}" %]

The attached patch makes that work, without "Wide character" warnings, all the way back to perl 5.6.

Thanks, TT rocks!

--
Dan Muey

[ -- Example -- ]

Same results on 5.8.9 and 5.6.2

[ -- Before patch -- ]

hal9000$ perl -Mstrict -MTemplate -wle 'my $tt=Template->new;$tt->process(\qq([% "$ARGV[0]" %]));' 'Smile \x{263A}'
Smile x{263A}
hal9000$

[ -- After patch -- ]

hal9000$ perl -Mstrict -MTemplate -wle 'my $tt=Template->new;$tt->process(\qq([% "$ARGV[0]" %]));' 'Smile \x{263A}'
Smile ☺
hal9000$

(hopefully rt doesn't corrupt the smiley face that should follow 'Smile')
Subject: add_slash_x_hex_support_to_TT_interpolation_of_double_quote_context.patch
--- Parser.pm.orig 2010-05-13 09:10:48.000000000 -0500
+++ Parser.pm 2010-05-13 12:08:40.000000000 -0500
@@ -513,6 +513,7 @@
 # is set and undef is returned.
 #------------------------------------------------------------------------
 
+my $has_encode; # necessary for tokenise_directive() to handle \x{263A} notation
 sub tokenise_directive {
     my ($self, $text, $line) = @_;
     my ($token, $uctoken, $type, $lookup);
@@ -568,6 +569,8 @@
         # quoted string
         if (defined ($token = $3)) {
             # double-quoted string may include $variable references
+            $self->{'ENCODE_HEX_AS'} ||= 'utf-8'; # necessary for tokenise_directive() to handle \x{263A} notation
+
             if ($2 eq '"') {
                 if ($token =~ /[\$\\]/) {
                     $type = 'QUOTED';
@@ -576,6 +579,21 @@
                     # as a variable reference
                     # $token =~ s/\\([\\"])/$1/g;
                     for ($token) {
+                        # necessary for tokenise_directive() to handle \x{263A} notation
+                        # only do regex once, only do require once (i.e. if Encode is not available why try to require it for each token?)
+                        if (!defined $has_encode && m/\\x\{[0-9a-f]{1,4}\}/i) {
+                            $has_encode = 0;
+                            eval { require Encode; $has_encode = 1; };
+                        }
+                        if ( $has_encode ) { # perl 5.007003 and up
+                            s{\\x\{([0-9a-f]{1,4})\}}{Encode::encode($self->{'ENCODE_HEX_AS'}, chr(hex("$1")))}egi;
+                        }
+                        else { # i.e. perl 5.6
+                            # It could be argued that this only needs done when hex($_) < 0x100 but it works so leave it like this for consistency and in case it is needed under specific circumstances
+                            s{\\x\{([0-9a-f]{1,4})\}}{eval '"\x{'."$1".'}"'}egi;
+                        }
+                        # /necessary for tokenise_directive() to handle \x{263A} notation
+
                         s/\\([^\$nrt])/$1/g;
                         s/\\([nrt])/$QUOTED_ESCAPES->{ $1 }/ge;
                     }
Of note, the 'ENCODE_HEX_AS' key might be in the wrong object or the wrong part of the object. It needs to be settable like other “Template Style and Parsing Options”. If it needs to be done elsewhere/differently, let me know and I'll address it, thanks!
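
For anyone who wants to try the core substitution outside the parser, here is a minimal standalone sketch of what the Encode branch of the patch does. The name expand_hex_escapes is hypothetical (it is not part of Template::Parser), and the 'utf-8' default just mirrors the ENCODE_HEX_AS default in the patch. It assumes a perl that ships Encode (5.7.3 and up).

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Template::Parser: replaces each
# \x{H} .. \x{HHHH} escape in $text with the encoded bytes of that
# code point, using the same regex as the patch.
sub expand_hex_escapes {
    my ($text, $encoding) = @_;
    $encoding ||= 'utf-8';    # assumption: same default as ENCODE_HEX_AS
    $text =~ s{\\x\{([0-9a-f]{1,4})\}}{Encode::encode($encoding, chr(hex($1)))}egi;
    return $text;
}

# Single quotes keep the backslash literal, as it would be in a template.
my $out = expand_hex_escapes('Smile \x{263A}');
print "$out\n";
```

Because the escape is replaced with encoded bytes rather than a wide character, printing the result should not trigger "Wide character" warnings.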
2 more items I can address if you all decide to incorporate this patch:

#1 has to be done; #2 could be dropped as a "it's not meant to be 100% the same as perl" type thing.

1) handle characters represented by multiple \x{} sequences:
   \x{e3}\x{80}\x{b9} == 〹
2) Support '\x without brackets' notation (also multiple \xNN sequences):
   \xC2\xAE == ®
   \xE2\x80\x9Chowdy\xE2\x80\x9D == “howdy”
Actually the patch is good as-is I think. I believe documentation would suffice. The reasons I make that statement appear inline below:

On Thu May 20 14:49:00 2010, DMUEY wrote:
> 2 more items I can address if you all decide to incorporate this patch:
>
> #1 has to be done #2 could be dropped as a "it's not meant to be 100%
> the same as perl" type thing.
On second thought, the patch currently is very simple, and supporting all of that would make the code much more complex (i.e. slower). Keeping it simple also makes the template source less ambiguous and more readable. I'd say it's fine as long as the docs say something like this:

    Hexadecimal interpolation support is intended to be simple, not comprehensive.

    To keep it as simple as possible and avoid ambiguity, it only supports the single-slash-x-bracketed-notation-that-stands-for-a-single-character format.

    For example, if you want chr(12345) (i.e. 〹) you can do \x{3039} but you can't do \x{e3}\x{80}\x{b9}.

    Also note it does not support non-bracketed \x notation.

    For example, you can't do "\xC2\xAE" for "®" (you'd use "\x{00AE}") or "\xE2\x80\x9Chowdy\xE2\x80\x9D" for "“howdy”" (you'd use "\x{201C}howdy\x{201D}")
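
To make the relationship between the two notations concrete: the multi-escape forms quoted above are the UTF-8 bytes of the single code point that the bracketed form names. A small sketch using only core Encode (the @cases table is made up here for illustration) checks each pair:

```perl
use strict;
use warnings;
use Encode ();

# Each entry: [ code point used in the bracketed form,
#               byte string spelled out by the multi-escape form ]
my @cases = (
    [ 0x3039, "\xE3\x80\xB9" ],    # \x{3039} vs \x{e3}\x{80}\x{b9}
    [ 0x00AE, "\xC2\xAE"     ],    # \x{00AE} vs \xC2\xAE
    [ 0x201C, "\xE2\x80\x9C" ],    # \x{201C} vs \xE2\x80\x9C
    [ 0x201D, "\xE2\x80\x9D" ],    # \x{201D} vs \xE2\x80\x9D
);

for my $case (@cases) {
    my ($code_point, $bytes) = @$case;
    # Encode the single code point and compare with the spelled-out bytes.
    my $encoded = Encode::encode('utf-8', chr($code_point));
    printf "U+%04X %s\n", $code_point,
        ($encoded eq $bytes ? 'matches' : 'differs');
}
```

So a template author who has the byte sequence in hand only needs to look up the corresponding code point to use the bracketed form.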
I'll just stick with byte-string-character-itself instead of this, thanks