Subject: WebService::GData::YouTube doesn't decode Unicode in video titles
#!/usr/bin/perl
use strict;
use warnings;
use WebService::GData::YouTube;
my $id = 'pbA-AP0ob0s';
my $yt = WebService::GData::YouTube->new;
my $vid = $yt->get_video_by_id($id);
my $title = $vid->title;
# %vd prints the ordinal of each character, joined with dots
printf "%vd\n%s\n", $title, $title;
__END__
Running this program gives me the following output:
91.70.79.82.77.49.50.48.49.51.93.32.65.110.100.114.101.97.115.32.75.114.195.164.109.101.114.32.38.32.84.104.111.109.97.115.32.80.111.103.97.100.108.32.8206.45.32.68.101.114.32.82.195.164.99.104.101.114
[FORM12013] Andreas KrÃ¤mer & Thomas Pogadl - Der RÃ¤cher
That is, $title contains both undecoded UTF-8 bytes (195 164, the two-byte
encoding of "ä") and an already-decoded character above 255 (8206, U+200E
LEFT-TO-RIGHT MARK).
The undecoded UTF-8 means you can't use it as text: it will appear
broken, showing two characters (Ã¤) where there should be one (ä). The
character above 255 means you can't decode it manually either, since
the string is no longer a pure byte string.
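A minimal sketch of why manual decoding fails, without touching the network. The string below is hand-built from the ordinals shown above (an assumption standing in for the real title, not fetched from YouTube); Encode's UTF-8 decoder dies on it because of the character above 255:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hand-built stand-in for the real title (NOT fetched from YouTube):
# "ä" as the two undecoded UTF-8 bytes 0xC3 0xA4, plus a decoded U+200E.
my $title = "Kr\xC3\xA4mer \x{200E}- Der";

# One ordinal per character, in the same style as the report.
printf "%vd\n", $title;    # prints 75.114.195.164.109.101.114.32.8206.45.32.68.101.114

# Trying to decode the bytes "manually" dies: the string contains a
# character above 255, so it cannot be treated as UTF-8 octets.
my $decoded = eval { Encode::decode('UTF-8', $title) };
print defined $decoded ? "decoded\n" : "decode failed: $@";
```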
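This is caused by WebService::GData::YouTube passing undecoded bytes to
JSON::from_json, which expects a string of characters, not bytes.
Presumably the response encodes "ä" as raw UTF-8 but the left-to-right
mark as a \u200e escape: from_json turns the escape into a real
character and leaves the raw bytes alone, which produces the mixture
above. The module should either decode the bytes itself, or call
JSON::decode_json instead (after making sure the content is actually
UTF-8).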
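A sketch of the cause and of both suggested fixes, using JSON::PP (a core module; with its utf8 option off, JSON::PP->new->decode behaves like JSON's from_json). The response body is hypothetical: it is assumed to be UTF-8 and to send the left-to-right mark as a \u200e escape, and it is not the actual YouTube response:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();
use JSON::PP ();

# Hypothetical response body as raw bytes (NOT the real YouTube response):
# "ä" arrives as UTF-8 (0xC3 0xA4), U+200E arrives as the escape \u200e.
my $body = qq({"title":"Kr\xC3\xA4mer \\u200e- Der"});

# Like from_json: expects characters, so the raw bytes stay undecoded
# while the \u200e escape becomes a real character.
my $broken = JSON::PP->new->decode($body)->{title};

# Fix 1: decode the bytes first, then parse the resulting characters.
my $fix1 = JSON::PP->new->decode(Encode::decode('UTF-8', $body))->{title};

# Fix 2: let the parser decode the UTF-8 itself.
my $fix2 = JSON::PP::decode_json($body)->{title};

printf "broken: %vd\n", $broken;    # 75.114.195.164.109.101.114.32.8206.45.32.68.101.114
printf "fix 1:  %vd\n", $fix1;      # 75.114.228.109.101.114.32.8206.45.32.68.101.114
printf "fix 2:  %vd\n", $fix2;      # same as fix 1
```

Both fixes yield a single character 228 (ä) where the broken call yields 195.164. Which one fits better probably depends on where the module already handles the HTTP body and its declared charset.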