Bug #79999 for Convert-ASN1: use UTF-32 for UniversalString and UTF-16 for BMPString (as per X.690)

Thu Oct 04 01:27:23 2012 oneingray [...] gmail.com - Ticket created

CC:	Ivan Shmakov <oneingray [...] gmail.com>
Subject:	use UTF-32 for UniversalString and UTF-16 for BMPString (as per X.690)
Date:	Thu, 04 Oct 2012 12:27:06 +0700
To:	bug-Convert-ASN1 [...] rt.cpan.org
From:	Ivan Shmakov <oneingray [...] gmail.com>

X.680 [1] reads:

    37.16 UTF8String is synonymous with UniversalString at the
    abstract level and can be used wherever UniversalString is used
    (subject to rules requiring distinct tags) but has a different tag
    and is a distinct type.

    NOTE – The encoding of UTF8String used by BER and PER is different
    from that of UniversalString, and for most text will be less
    verbose.

The X.690 [2] specification (covering BER and DER) states:

    8.21.7 For the UniversalString type, the octet string shall
    contain the octets specified in ISO/IEC 10646-1, using the 4-octet
    canonical form (see 13.2 of ISO/IEC 10646-1). [...]

    8.21.8 For the BMPString type, the octet string shall contain the
    octets specified in ISO/IEC 10646-1, using the 2-octet BMP form
    (see 13.1 of ISO/IEC 10646-1). [...]

    [...]

    8.21.10 For the UTF8String type, the octet string shall contain
    the octets specified in ISO/IEC 10646-1, Annex D. Announcers and
    escape sequences shall not be used, and each character shall be
    encoded in the smallest number of octets available for that
    character.

Thus, it's my understanding that the encodings used for
UniversalString, BMPString and UTF8String shall be UTF-32 (UCS-4?),
UTF-16 (UCS-2?), and UTF-8, respectively (see, e. g., [3].)

Contrary to the above, Convert::ASN1 currently (as of 0.26) encodes
all of those using UTF-8. Consider, e. g.:

$ cat < j14gcqstwotsbqsjauytzxyitn.pl
### j14gcqstwotsbqsjauytzxyitn.pl  -*- Perl -*-
use strict;
use warnings;

require Convert::ASN1;
require Data::Dump;
require IO::Handle;

my $asn = Convert::ASN1->new (qw (encoding BER));
$asn->prepare (q {
    Foo ::= UniversalString
    Bar ::= BMPString
    Baz ::= UTF8String
  })
    or die ($!);
my $s = "\x{0401}\x{0436}\n";
binmode (\*STDOUT)
    or die ($!);
foreach my $t (qw (Foo Bar Baz)) {
    my $co = $asn->find ($t)
        or die ($asn->error ());
    my $enc = $co->encode ($s)
        or die ($co->error ());
    print STDOUT ($enc);
    print STDERR (Data::Dump::dump ($t, length ($enc), $enc), "\n");
}
### j14gcqstwotsbqsjauytzxyitn.pl ends here
$ perl -w -- j14gcqstwotsbqsjauytzxyitn.pl \
      | od -t x1 -w7
("Foo", 7, "\34\5\xD0\x81\xD0\xB6\n")
("Bar", 7, "\36\5\xD0\x81\xD0\xB6\n")
("Baz", 7, "\f\5\xD0\x81\xD0\xB6\n")
0000000 1c 05 d0 81 d0 b6 0a
0000007 1e 05 d0 81 d0 b6 0a
0000016 0c 05 d0 81 d0 b6 0a
0000025
$

My guess is that in order to fix the issue, distinct op* types
(opUTF32STRING, opUTF16STRING?) should be introduced for the
UniversalString and BMPString ASN.1 types to be mapped to (via
%base_type):

$ nl -ba < Convert/ASN1/parser.pm
…
    23
    24  my %base_type = (
    25    BOOLEAN => [ asn_encode_tag(ASN_BOOLEAN), opBOOLEAN ],
…
    56    UniversalString => [ asn_encode_tag(ASN_UNIVERSAL | 28), opSTRING ],
    57    BMPString => [ asn_encode_tag(ASN_UNIVERSAL | 30), opSTRING ],
…
$

… Along with the respective _enc_* (Convert/ASN1/_encode.pm)
functions.

TIA.

[1] http://www.itu.int/ITU-T/studygroups/com17/languages/X.680-0207.pdf
[2] http://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
[3] http://perldoc.perl.org/perlunicode.html

--
FSF associate member #7257
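For comparison, the octets that the reading of X.690 8.21 given in this report would call for can be worked out by hand with the core Encode module alone. The following is only a sketch, not Convert::ASN1 code: `ber_string` and `%string_type` are hypothetical names, the helper handles just the primitive form with a short-form length (content of at most 127 octets), and UTF-16BE is assumed to be an acceptable stand-in for the 2-octet BMP form, which it matches as long as every character lies within the BMP.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Universal tag numbers as used in the report, paired with the
# character encoding each type is assumed to require (X.690 8.21).
my %string_type = (
    UniversalString => { tag => 28, encoding => 'UTF-32BE' },
    BMPString       => { tag => 30, encoding => 'UTF-16BE' },
    UTF8String      => { tag => 12, encoding => 'UTF-8'    },
);

# Hypothetical helper: tag octet, short-form length octet, content octets.
sub ber_string {
    my ($type, $text) = @_;
    my $spec = $string_type{$type}
        or die "unknown string type: $type";
    my $octets = encode($spec->{encoding}, $text);
    die "content too long for a short-form length"
        if length($octets) > 127;
    return pack('C C a*', $spec->{tag}, length($octets), $octets);
}

# Same test string as in the report.
my $s = "\x{0401}\x{0436}\n";
for my $type (qw(UniversalString BMPString UTF8String)) {
    my $hex = join ' ', map { sprintf '%02x', $_ }
                        unpack('C*', ber_string($type, $s));
    print "$type: $hex\n";
}
```

Under these assumptions, the UniversalString case yields 1c 0c 00 00 04 01 00 00 04 36 00 00 00 0a and the BMPString case yields 1e 06 04 01 04 36 00 0a, while the UTF8String case yields 0c 05 d0 81 d0 b6 0a, the only one of the three that agrees with the output shown above.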

Thu Oct 04 10:04:34 2012 gbarr [...] pobox.com - Correspondence added

Subject:	Re: [rt.cpan.org #79999] use UTF-32 for UniversalString and UTF-16 for BMPString (as per X.690)
Date:	Thu, 4 Oct 2012 09:04:11 -0500
To:	bug-Convert-ASN1 [...] rt.cpan.org
From:	Graham Barr <gbarr [...] pobox.com>

Please report issues via github at

https://github.com/gbarr/perl-Convert-ASN1/issues

If you have a patch please fork the repository and submit a pull request.

Graham.

Thu Oct 04 10:04:35 2012 The RT System itself - Status changed from 'new' to 'open'
