Subject: broken punycode handling
Date: Tue, 14 May 2013 22:28:31 +0200
To: bug-rdf-rdfa-generator [...] rt.cpan.org
From: Jonas Smedegaard <dr [...] jones.dk>
It seems punycode is handled incorrectly.
This script:
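#!/usr/bin/perl
use strict;
use warnings;
use HTML::HTML5::Builder qw[:standard];
use RDF::RDFa::Generator;
use RDF::TrineX::Functions qw[parse];

# Turtle input: the object URI has an IDN host written in punycode (xn--...).
my $data = <<'DATA';
[] <http://xmlns.com/foaf/0.1/Homepage>
<http://www.xn--hestebedgrd-58a.dk/>.
DATA

my $model = parse($data, as => 'Turtle', base => 'http://example.org/');

# Note: $gen is not used below; create_document is called on the class,
# so the default HTML::Head style is used, as the output shows.
my $gen = RDF::RDFa::Generator->new(style => 'HTML::Pretty');
print RDF::RDFa::Generator->create_document($model);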
emits the following (the first line goes to stderr):
error : string is not in UTF-8
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml" version="XHTML+RDFa 1.0">
<head profile="http://www.w3.org/1999/xhtml/vocab">
<title>RDFa Document</title>
<meta name="generator" value="RDF::RDFa::Generator::HTML::Head"/>
<link xmlns:foaf="http://xmlns.com/foaf/0.1/" about="[_:r1368562720r0]"
rel="foaf:Homepage" resource="http://www.hestebedg岤.dk/"/></head>
<body/>
</html>
The embedded URI http://www.hestebedg岤.dk/ is garbled! It should be http://www.xn--hestebedgrd-58a.dk/, as in the input.
It seems RDF::Trine decodes the punycode, but RDF::RDFa::Generator does not
encode it again; instead it chokes on the non-ASCII character.
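To make the suspected decode step concrete, here is a small standalone punycode decoder following RFC 3492. It is written for illustration only: it is not the code RDF::Trine or URI actually use, and it omits the RFC's overflow checks. It shows that xn--hestebedgrd-58a decodes to "hestebedgård", a string containing U+00E5, and what that label looks like once properly encoded as UTF-8:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# Minimal RFC 3492 punycode decoder (illustrative sketch: no overflow
# checks, no validation of the decoded code points).
sub punycode_decode {
    my ($input) = @_;
    my ($base, $tmin, $tmax, $skew, $damp) = (36, 1, 26, 38, 700);
    my ($n, $i, $bias) = (128, 0, 72);

    # Basic (ASCII) code points come before the last '-';
    # everything after it encodes the non-ASCII insertions.
    my $pos = rindex($input, '-');
    my @out = $pos > 0 ? split(//, substr($input, 0, $pos)) : ();
    my @enc = split //, substr($input, $pos + 1);

    # Bias adaptation function from RFC 3492, section 6.1.
    my $adapt = sub {
        my ($delta, $numpoints, $first) = @_;
        $delta = $first ? int($delta / $damp) : int($delta / 2);
        $delta += int($delta / $numpoints);
        my $k = 0;
        while ($delta > int(($base - $tmin) * $tmax / 2)) {
            $delta = int($delta / ($base - $tmin));
            $k += $base;
        }
        return $k + int(($base - $tmin + 1) * $delta / ($delta + $skew));
    };

    while (@enc) {
        my ($oldi, $w) = ($i, 1);
        # Read one generalized variable-length integer into $i.
        for (my $k = $base; ; $k += $base) {
            my $c = shift @enc;
            die "truncated punycode input\n" unless defined $c;
            my $digit = $c =~ /^[a-z]$/i ? ord(lc $c) - ord('a')
                      : $c =~ /^[0-9]$/  ? ord($c) - ord('0') + 26
                      : die "invalid punycode digit '$c'\n";
            $i += $digit * $w;
            my $t = $k <= $bias         ? $tmin
                  : $k >= $bias + $tmax ? $tmax
                  :                       $k - $bias;
            last if $digit < $t;
            $w *= $base - $t;
        }
        my $len = @out + 1;
        $bias = $adapt->($i - $oldi, $len, $oldi == 0);
        $n += int($i / $len);    # next non-ASCII code point
        $i %= $len;              # position to insert it at
        splice @out, $i, 0, chr $n;
        $i++;
    }
    return join '', @out;
}

# Host label from the Turtle data above, with the "xn--" prefix removed.
my $label = punycode_decode('hestebedgrd-58a');    # "hestebedg\x{e5}rd"

# The decoded label contains U+00E5. Before it is handed to an XML library
# as a byte string it needs to be encoded as UTF-8 (C3 A5 for "å"),
# or the original ASCII "xn--" form kept instead.
my $utf8 = encode('UTF-8', $label);
print join(' ', map { sprintf '%02X', ord } split //, $utf8), "\n";
# prints: 68 65 73 74 65 62 65 64 67 C3 A5 72 64
```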
Calling ->nodes() instead of ->create_document() emits this error
instead:
Redland error: XML tree error: string is not in UTF-8
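The "string is not in UTF-8" messages also fit the garbled URI in the output. A plausible explanation (an assumption, not verified against libxml2's source) is that the decoded host reaches libxml2 as Latin-1 bytes: "å", "r", "d" are then 0xE5 0x72 0x64, and 0xE5 looks like the lead byte of a three-byte UTF-8 sequence. Combining the payload bits as a lenient decoder would gives U+5CA4, which is the 岤 seen in the output, and it swallows the "rd" as well:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: "hestebedgård" reached libxml2 as Latin-1 bytes, so the
# characters "å", "r", "d" arrive as 0xE5, 0x72, 0x64.
my @bytes = (0xE5, 0x72, 0x64);

# 0xE5 is 1110_0101 in binary: the prefix of a 3-byte UTF-8 lead byte.
# Take its low 4 bits plus the low 6 bits of the next two bytes, without
# checking that those two are real continuation bytes (10xxxxxx).
my $cp = (($bytes[0] & 0x0F) << 12)
       | (($bytes[1] & 0x3F) << 6)
       |  ($bytes[2] & 0x3F);

binmode STDOUT, ':encoding(UTF-8)';
printf "U+%04X %s\n", $cp, chr $cp;    # prints: U+5CA4 岤
```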
- Jonas
--
* Jonas Smedegaard - idealist & Internet architect
* Tel.: +45 40843136 Website: http://dr.jones.dk/
[x] quote me freely [ ] ask before reusing [ ] keep private