
This queue is for tickets about the DBIx-Class-Schema-Loader CPAN distribution.

Report information
The Basics
Id: 123698
Status: open
Priority: 0/
Queue: DBIx-Class-Schema-Loader

People
Owner: Nobody in particular
Requestors: felix.ostmann [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Normal
Broken in: 0.07047
Fixed in: (no value)



Subject: Enums types are not properly create when unicode character is used
The {extra}{list} enum values are not correctly encoded. I use the same connection settings for the app itself, and all data from the database is correctly encoded except this enum.
> \dT+
...
 steinhaus_main | enum_tasks_status | enum_tasks_status | 4 | offen          +|
                |                   |                   |   | erledigt       +|
                |                   |                   |   | zurückgestellt  |
...

$ grep status -C5 Tasks.pm
...
  "status",
  {
    data_type => "enum",
    default_value => "offen",
    extra => {
      custom_type_name => "enum_tasks_status",
      list => ["offen", "erledigt", "zur\xFCckgestellt"],
    },
    is_nullable => 0,
  },
...

The file is in UTF-8 with "use utf8;" at the beginning, so I expected:

  list => ["offen", "erledigt", "zurückgestellt"],
On 2017-11-21 09:54:01, felix.ostmann@gmail.com wrote:
> The {extra}{list} enum values are not correct encoded.
> [...]
> the file is in utf8 with use utf8; in the beginning so i expected:
>
> list => ["offen", "erledigt", "zurückgestellt"],
These representations of the string are equivalent:

    $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
    1

Schema::Loader uses Data::Dump to serialise method call arguments in the generated files, and it encodes all non-ASCII (and non-printable) characters using \x notation.

For aesthetic reasons it might be desirable to output Unicode word characters literally too, but the current output is not incorrect.

- ilmari
From: felix.ostmann [...] gmail.com
On Tue 21 Nov 2017, 06:08:27, ilmari wrote:
> These representations of the string are equivalent:
>
> $ perl -Mutf8 -E 'say "zur\xFCckgestellt" eq "zurückgestellt"'
> 1
It is not really the same ... In the real code I have to call Encode::decode('ISO-8859-15', $enum) as a quick fix.

    $ cat ticket123698.pl
    use utf8;
    use 5.20.0;
    use Data::Dumper;
    say "zur\xFCckgestellt" eq "zurückgestellt";
    print Dumper("zur\xFCckgestellt","zurückgestellt");
    $ perl ticket123698.pl
    1
    $VAR1 = 'zur�ckgestellt';
    $VAR2 = "zur\x{fc}ckgestellt";
Subject: Re: [rt.cpan.org #123698] Enums types are not properly create when unicode character is used
Date: Tue, 21 Nov 2017 12:07:41 +0000
To: "Felix Antonius Wilhelm Ostmann via RT" <bug-DBIx-Class-Schema-Loader [...] rt.cpan.org>
From: ilmari [...] ilmari.org (Dagfinn Ilmari Mannsåker)
"Felix Antonius Wilhelm Ostmann via RT" <bug-DBIx-Class-Schema-Loader@rt.cpan.org> writes:
> It is not really the same ...
The _internal_ representation is not the same; the \x form will be represented internally as one byte per code point ("downgraded"), while the literal form will be utf-8-encoded ("upgraded"). Semantically they are the same, as evidenced by "eq" returning true.
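The distinction between semantic equality and internal representation can be checked with core Perl alone. The following is a minimal sketch, not part of the original exchange; the helper name describe is made up for illustration, and the flag values it prints are what the Data::Dumper output quoted in this ticket suggests rather than a guarantee for every perl build.

```perl
use strict;
use warnings;
use utf8;    # the source text of this example is UTF-8

my $escaped = "zur\xFCckgestellt";    # \x form, as in the generated Tasks.pm
my $literal = "zurückgestellt";       # literal form, as the reporter expected

# Hypothetical helper: report the character count and the internal flag.
sub describe {
    my ($label, $str) = @_;
    return sprintf "%-9s length=%d utf8_flag=%d",
        $label, length($str), (utf8::is_utf8($str) ? 1 : 0);
}

print describe(escaped => $escaped), "\n";
print describe(literal => $literal), "\n";
print "eq: ", ($escaped eq $literal ? "true" : "false"), "\n";

# utf8::upgrade changes only the internal storage, not the characters.
my $upgraded = $escaped;
utf8::upgrade($upgraded);
print describe(upgraded => $upgraded), "\n";
print "still eq: ", ($upgraded eq $escaped ? "true" : "false"), "\n";
```

Both strings report the same length and compare equal; only the flag returned by utf8::is_utf8 differs, which is the "downgraded" versus "upgraded" storage described above.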
> In the real code i have to make a Encode::decode('ISO-8859-15', $enum) as a quickfix.
Please show where in the real code you have to do this. It smells like something you're passing it to suffering from the Unicode Bug, i.e. treating the characters in the 128..255 range differently depending on the internal representation (see https://metacpan.org/pod/perlunicode#The-%22Unicode-Bug%22 for details).
> $ cat ticket123698.pl
> use utf8;
> use 5.20.0;
> use Data::Dumper;
> say "zur\xFCckgestellt" eq "zurückgestellt";
> print Dumper("zur\xFCckgestellt","zurückgestellt");
> $ perl ticket123698.pl
> 1
> $VAR1 = 'zur�ckgestellt';
> $VAR2 = "zur\x{fc}ckgestellt";
The different outputs here are a quirk of how Data::Dumper deals with downgraded vs. upgraded strings (which could be viewed as an instance of the Unicode Bug, but doesn't actually affect semantics). The first one is showing as � because you haven't told perl that your terminal expects UTF-8-encoded strings. Adding

    use open qw(:std :utf8);

to the script will make it apply a UTF-8 encoding layer to the standard input/output/error filehandles, so non-ASCII characters show correctly.

- ilmari

-- 
"I use RMS as a guide in the same way that a boat captain would use a lighthouse. It's good to know where it is, but you generally don't want to find yourself in the same spot." - Tollef Fog Heen
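The effect of an encoding layer can also be observed without a terminal. The sketch below is an illustration only and not code from the ticket: it prints the same string to two in-memory filehandles, one without a layer and one with :encoding(UTF-8), and shows the bytes each received. The names $plain_bytes, $utf8_bytes and hex_bytes are made up for this example.

```perl
use strict;
use warnings;

my $status = "zur\xFCckgestellt";

# Print the same string through a handle without an encoding layer...
open(my $plain, '>', \my $plain_bytes) or die "open: $!";
print {$plain} $status;
close $plain;

# ...and through a handle with a UTF-8 encoding layer.
open(my $layered, '>:encoding(UTF-8)', \my $utf8_bytes) or die "open: $!";
print {$layered} $status;
close $layered;

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes { return join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# Show the bytes written for the "ü" and the "c" that follows it.
print "no layer:         ", hex_bytes(substr($plain_bytes, 3, 2)), "\n";   # FC 63
print ":encoding(UTF-8): ", hex_bytes(substr($utf8_bytes, 3, 3)), "\n";    # C3 BC 63
```

Without a layer the character U+00FC leaves the program as the single byte FC, which a UTF-8 terminal renders as �; with the layer it is written as the two-byte UTF-8 sequence C3 BC.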
From: felix.ostmann [...] gmail.com
On Tue 21 Nov 2017, 07:07:59, ilmari@ilmari.org wrote:
> Please show where in the real code you have to do this.
OK, here is the real-world scenario with pseudocode. I am using DBIx::Class + Catalyst + Template Toolkit.

ResultSet:

    sub enum_status {
        my ($self) = @_;

        # FIXME see https://rt.cpan.org/Public/Bug/Update.html?id=123698
        return map { Encode::decode("ISO-8859-15", $_) } @{ $self->result_source->column_info('status')->{extra}->{list} };

        # original line, unreachable while the workaround above is in place
        return @{ $self->result_source->column_info('status')->{extra}->{list} };
    }

Catalyst controller:

    $c->stash->{status_order} = [ $rs->enum_status ];

Template:

    [% FOREACH status IN status_order %]
    <a href="[% c.request.uri_with({status => status}) %]">
    [% END %]

Without the FIXME the links are ISO-8859-15.

After reading your reply and the docs about the Unicode Bug, I changed the code to the following:

    __PACKAGE__->add_columns(
    ...
    {
      data_type => "enum",
      default_value => "offen",
      extra => {
        custom_type_name => "enum_tasks_status",
        list => ["offen", "erledigt", "zur\xFCckgestellt"],
      },
      is_nullable => 0,
    },
    ...
    );
    ...
    # DO NOT MODIFY THIS OR ANYTHING ABOVE! md5sum:W4KhHAXiEW35h5XWiZwhFg

    utf8::upgrade($_) for @{ __PACKAGE__->column_info('status')->{extra}->{list} };

But in my opinion this is kind of a bug. Why are all other strings coming from the database already upgraded, but not this one?
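One way a consumer could produce ISO-8859-15-looking links from a downgraded string is sketched below. This is an assumption for illustration only: naive_escape and robust_escape are hypothetical functions written for this example and are not taken from Catalyst, URI, or Template Toolkit, whose actual behaviour was not shown in this ticket. The sketch only demonstrates the general pattern ilmari describes, where code treats characters in the 128..255 range differently depending on the internal representation.

```perl
use strict;
use warnings;

# Hypothetical escaper showing the Unicode Bug pattern: it encodes to UTF-8
# only when the internal flag is set, so the result depends on storage.
sub naive_escape {
    my ($str) = @_;
    utf8::encode($str) if utf8::is_utf8($str);
    $str =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

# Hypothetical fix: always treat the input as characters and encode it.
sub robust_escape {
    my ($str) = @_;
    utf8::encode($str);
    $str =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
    return $str;
}

my $status = "zur\xFCckgestellt";             # downgraded, as in the generated file
print naive_escape($status), "\n";            # zur%FCckgestellt
print robust_escape($status), "\n";           # zur%C3%BCckgestellt

utf8::upgrade($status);                       # the workaround from the ticket
print naive_escape($status), "\n";            # zur%C3%BCckgestellt
```

Under this assumption, utf8::upgrade works around the problem because it sets the flag the flag-dependent code is looking at, while the string itself stays semantically unchanged.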