Subject: | Non-Unicode encoding support (patches included) |
Proposal for non-Unicode encoding support (patches for v.0.31..0.34
included).
If we need to use non-Unicode/non-Latin encodings, they have to be
supported in two places:
1) Template files in a non-Unicode encoding. This is easily achieved
with a proper XML declaration at the top of the file, e.g.:
<?xml version='1.0' encoding='windows-1251'?>
This requires no code changes; the XML parser handles it transparently.
2) Variable substitution when the variables contain strings in a
non-Unicode encoding.
Two patches are proposed for this:
Var.pm.patch - supports variable substitution <VAR NAME='param'/> when
the value contains text in a specific encoding
Context.pm.patch - similarly handles implicit substitution like
<WORKSHEET NAME='$VariableName'/>
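As a minimal sketch of what both patches do to a variable's value, the
snippet below decodes a windows-1251 byte string into a Perl character
string with Encode::decode. The sample bytes and the variable names are
made up for illustration; they are not taken from the patches.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: the Russian word "Привет" as windows-1251 bytes,
# i.e. what a template variable might hold before decoding.
my $bytes = "\xCF\xF0\xE8\xE2\xE5\xF2";

# The same call the patches make: bytes in the named encoding go in,
# a Perl character (Unicode) string comes out.
my $text = Encode::decode('windows-1251', $bytes);

printf "%d bytes in, %d characters out\n", length($bytes), length($text);
printf "first character: U+%04X\n", ord(substr($text, 0, 1));  # U+041F
```

windows-1251 is a single-byte encoding, so the lengths match; what
changes is that each element is now a Unicode code point rather than a
raw byte.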
To use the new functionality, specify the attribute
VAR_ENC='<encoding-name>' somewhere within the current context (anywhere
from the variable itself up to the entire worksheet). I personally work
with the Cyrillic windows-1251 encoding both in template files and in
variables, so my typical template starts with:
<?xml version='1.0' encoding='windows-1251'?>
<workbook>
<worksheet name='SomeExcelTable' var_enc='windows-1251'>
......
The attribute name is not too pretty, but it is a reminder of what is
actually encoded.
The patch for Var.pm is tested and works for me. The patch for
Context.pm probably needs more testing (I'm not sure I've tested it).
Decoding itself is performed by Encode, which is bundled with Perl 5.8
and later, so those versions should support it; on 5.6 and earlier it is
not part of the core distribution. The 'require Encode' line should
probably be placed somewhere else and made dependent on the Perl version
(the patches are really just a quick hack to keep the changes minimal).
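One possible way to make the Encode dependency optional is sketched
below. It probes for the module once instead of checking the Perl
version number. The helper name decode_var and the flag $HAVE_ENCODE are
hypothetical and do not appear in the patches; this is only an
illustration of the idea, not a tested replacement.

```perl
use strict;
use warnings;

# Try to load Encode once at runtime. The flag stays false on Perls
# where the module is not available, and no exception escapes.
my $HAVE_ENCODE = eval { require Encode; 1 } ? 1 : 0;

# Hypothetical helper: decode $val from $enc when possible, otherwise
# return it unchanged (no VAR_ENC set, undefined value, or no Encode).
sub decode_var {
    my ($enc, $val) = @_;
    return $val unless $enc && defined $val && $HAVE_ENCODE;
    return Encode::decode($enc, $val);
}

my $plain   = decode_var(undef, "abc");             # no VAR_ENC: unchanged
my $decoded = decode_var('windows-1251', "\xC0");   # decoded if Encode loaded
```

With a helper like this, both patched places could call the same
function instead of repeating the 'require Encode' block.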
It would be great to include this support in the next release.
Subject: | Var.pm.patch |
12c12,22
< sub resolve { ($_[1])->param($_[1]->resolve($_[0], 'NAME')) }
---
> sub resolve { # - support for VAR value encoding
> my $self = shift ; #
> my ($context) = @_ ; #
> my $val = $context->param($context->resolve($self, 'NAME')); #
> my $enc = $context->get($self, "VAR_ENC") ; #
> if ($enc) { #
> require Encode ; #
> $val = Encode::decode( $enc, $val ) ; #
> } #
> return $val ; #
> } #
Subject: | Context.pm.patch |
100,101c100,107
< $obj_val = $self->param($1)
< if $obj_val =~ /^\$(\S+)$/o;
---
> if ($obj_val =~ /^\$(\S+)$/o) { # - to support variable substitution
> $obj_val = $self->param($1) ; # - in constructions like this:
> my $enc = $self->get($obj, "VAR_ENC") ; # <var name="$SomeParam"/>
> if ($enc) { # - (in case the variable contains text in non-Unicode encoding)
> require Encode ; #
> $obj_val = Encode::decode( $enc, $obj_val ) ; #
> } #
> } #