This queue is for tickets about the RDF-Trine CPAN distribution.

Report information
The Basics
Id: 85297
Status: resolved
Priority: 0/
Queue: RDF-Trine

People
Owner: gwilliams [...] cpan.org
Requestors: dr [...] jones.dk
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: broken punycode handling
Date: Tue, 14 May 2013 22:28:31 +0200
To: bug-rdf-rdfa-generator [...] rt.cpan.org
From: Jonas Smedegaard <dr [...] jones.dk>
It seems punycode is handled wrongly. This script:

#!/usr/bin/perl

use HTML::HTML5::Builder qw[:standard];
use RDF::RDFa::Generator;
use RDF::TrineX::Functions qw[parse];

my $data = <<'DATA';
[] <http://xmlns.com/foaf/0.1/Homepage> <http://www.xn--hestebedgrd-58a.dk/>.
DATA

my $model = parse($data, as => 'Turtle', base => 'http://example.org/');
my $gen = RDF::RDFa::Generator->new(style=>'HTML::Pretty');
print RDF::RDFa::Generator->create_document($model)

emits this (the first line being to stderr):

error : string is not in UTF-8
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" version="XHTML+RDFa 1.0">
<head profile="http://www.w3.org/1999/xhtml/vocab">
<title>RDFa Document</title>
<meta name="generator" value="RDF::RDFa::Generator::HTML::Head"/>
<link xmlns:foaf="http://xmlns.com/foaf/0.1/" about="[_:r1368562720r0]" rel="foaf:Homepage" resource="http://www.hestebedg&#x5CA4;.dk/"/></head>
<body/>
</html>

The embedded http://www.hestebedg&#x5CA4;.dk/ is garbled!

It seems RDF::Trine decodes the punycode, but RDF::RDFa::Generator does not encode it again and instead chokes when it stumbles on a non-ASCII character.

If calling ->nodes() instead of ->create_document(), then this error is emitted:

Redland error: XML tree error: string is not in UTF-8

 - Jonas

-- 
 * Jonas Smedegaard - idealist & Internet-arkitekt
 * Tlf.: +45 40843136 Website: http://dr.jones.dk/

 [x] quote me freely [ ] ask before reusing [ ] keep private
On 2013-05-14T21:36:27+01:00, dr@jones.dk wrote:
> It seems punycode is handled wrongly.
The problem seems to be at least partly in Trine. See attached script.
Subject: punycode-trine.pl
use RDF::Trine;
use RDF::TrineX::Functions qw[parse serialize];

my $model = "RDF::Trine::Model"->temporary_model;

parse \*DATA, into => $model, using => "RDF::Trine::Parser::Turtle"->new, base => "http://localhost/";
serialize $model, using => "RDF::Trine::Serializer::Turtle"->new, to => \*STDOUT;

__DATA__
<x> <y> <http://www.xn--hestebedgrd-58a.dk/>.
OK, I have refactored the test case to take RDF::TrineX::Functions out of the equation and to use Test::More. I'm going to forward this issue to RDF-Trine.
Subject: punycode-trine.t
use strict;
use warnings;
use Test::More tests => 1;
use RDF::Trine;

my $model = "RDF::Trine::Model"->temporary_model;
my $parser = "RDF::Trine::Parser::Turtle"->new;
my $ser = "RDF::Trine::Serializer::Turtle"->new;

$parser->parse_file_into_model("http://localhost/", \*DATA, $model);
my $output = $ser->serialize_model_to_string($model);

like($output, qr{www\.xn--hestebedgrd-58a\.dk});

__DATA__
<x> <y> <http://www.xn--hestebedgrd-58a.dk/>.
Subject: Re: [rt.cpan.org #85297] broken punycode handling
Date: Wed, 15 May 2013 12:15:16 +0200
To: bug-RDF-RDFa-Generator [...] rt.cpan.org
From: Jonas Smedegaard <dr [...] jones.dk>
Quoting Toby Inkster via RT (2013-05-15 11:21:47)
> <URL: https://rt.cpan.org/Ticket/Display.html?id=85297 >
>
> On 2013-05-14T21:36:27+01:00, dr@jones.dk wrote:
> > It seems punycode is handled wrongly.
>
> The problem seems to be at least partly in Trine.
>
> See attached script.
Indeed!

 - Jonas
Reported on github.com issue tracker... https://github.com/kasei/perlrdf/issues/85
On Wed May 15 08:32:10 2013, TOBYINK wrote:
> Reported on github.com issue tracker...
> https://github.com/kasei/perlrdf/issues/85
As noted on GitHub:

Commit e20f560 (on the unicode-bugfix branch) should now have resolved all of the issues associated with this bug. Parsers no longer decode punycode, and URI base resolution works with full IRIs.

The only thing to note is that the N-Triples serializer will decode punycode URIs and emit them as Unicode-escaped IRIs. For example, the URI 'http://www.xn--hestebedgrd-58a.dk/' will be emitted in N-Triples as <http://www.hestebedg\u00E5rd.dk/>. This is probably a good thing, as it encourages the use of proper IRIs instead of letting punycode into the wild.
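
For readers who want to see where the \u00E5 in that example comes from, here is a minimal sketch of the escaping step only. It is not the code from commit e20f560 and not RDF::Trine's API: the helper name ntriples_iri is hypothetical, and the punycode label is assumed to have been decoded already (xn--hestebedgrd-58a stands for "hestebedgård", where "å" is U+00E5). It uses core Perl only and ignores the other characters a real N-Triples serializer must also escape.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only (not part of RDF::Trine):
# wrap an already-decoded IRI in angle brackets and replace every
# character outside printable ASCII with a \uXXXX or \UXXXXXXXX escape.
sub ntriples_iri {
    my ($iri) = @_;
    $iri =~ s{([^\x20-\x7E])}{
        my $cp = ord $1;
        $cp > 0xFFFF ? sprintf('\\U%08X', $cp) : sprintf('\\u%04X', $cp);
    }gex;
    return "<$iri>";
}

# "\x{E5}" is the character U+00E5, written as an escape so that the
# script does not depend on the encoding of this file.
my $decoded = "http://www.hestebedg\x{E5}rd.dk/";
print ntriples_iri($decoded), "\n";
```

Run as is, this prints <http://www.hestebedg\u00E5rd.dk/>, the form quoted above. An IRI that is already pure ASCII, such as the punycode form itself, passes through unchanged.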