
This queue is for tickets about the MP3-ID3Lib CPAN distribution.

Report information
The Basics
Id: 36020
Status: open
Priority: 0/
Queue: MP3-ID3Lib

People
Owner: Nobody in particular
Requestors: GLEACH [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: (no value)



Subject: UNICODE encoding of fields not handled correctly
Tags encoded in UNICODE (I think!) are not reported correctly. For example, a file that has

        T P E 1 \0
0000100 \0 \0 % \0 \0 001 377 376 N \0 o \0 r \0 t \0
0000120 h \0 e \0 r \0 n \0   \0 A \0 l \0 l \0
0000140 i \0 a \0 n \0 c \0 e \0

(linux od -c) gets

TPE1: �o8

when the field is requested and printed. Additionally, field->set followed by commit has no effect.

The id3info program distributed with id3lib reports the content correctly:

=== TPE1 (Lead performer(s)/Soloist(s)): Northern Alliance

I believe the reason for this is that id3info is requesting a UNICODE encoding, whereas MP3::ID3Lib is using ASCII. I'd be happy to fix this if some pointers could be provided. As it is, I get lost in the C++.

ID3Lib/Frame.pm version .10, perl v5.8.8, fedora 8
On Mon May 19 19:06:06 2008, GLEACH wrote:
> Tags encoded in UNICODE ...
I see that the same problem was rejected when reported in ticket 4536. Hopefully this can be reconsidered.
From: TJC [...] cpan.org
Confirmed it seems to exist. Attaching minimal (0 seconds long) MP3 file with ID3 v2.3 tag containing Unicode for demonstration purposes.
Download empty.mp3
audio/mpeg 165b

On Tue May 20 00:51:31 2008, TJC wrote:
> Confirmed it seems to exist.
> Attaching minimal (0 seconds long) MP3 file with ID3 v2.3 tag containing
> Unicode for demonstration purposes.
Actually, please ignore the file. It contains UTF-8 Unicode data, not UTF-16. The v2.3 ID3 spec says UTF-16 or ISO-8859-1 text is required; v2.4, however, allows UTF-8, hence my confusion. MP3::ID3Lib only supports v2.3, though, due to the underlying C library.
From: GLEACH [...] cpan.org
On Tue May 20 01:44:20 2008, TJC wrote:
> On Tue May 20 00:51:31 2008, TJC wrote:
> > Confirmed it seems to exist.....
A really cool enhancement would be UTF-8 <=> UTF-16 ....
From: TJC [...] cpan.org
I made one attempt to resolve this in the XS with C, and then another by using Encode::{encode,decode} in Perl instead. Neither attempt gave satisfactory results. My XS+Unicode experience is minimal, especially when UTF-16 is thrown in, so I'll have to leave this bug for someone else to solve. If you're looking for a quick solution, have you considered Audio::TagLib? It seems to have v2.4 ID3 tag and Unicode support.
From: GLEACH [...] cpan.org
Attached is a C program that believes that it can handle both UNICODE (actually UCS-2) and ASCII encoded ID3 files. Any comments would be appreciated.
#include <stdio.h>
#include "id3.h"

/*
 * Narrow a UCS-2 buffer to 8-bit text. len is treated as a byte count
 * (two bytes per unicode_t). At most out_size - 1 characters are written,
 * followed by a terminating NUL.
 */
void to_ascii(const unicode_t *in, size_t len, char *out, size_t out_size)
{
    size_t n = 0;
    int swapped = 1;

    if (out_size == 0)
        return;

    /* A BOM that reads as 0xfeff means the units are in native order. */
    if (len >= 2 && *in == 0xfeff) {
        in++;
        len -= 2;
        swapped = 0;
    }

    /* "len >= 2" rather than "len > 0" so an odd count cannot underflow. */
    while (len >= 2 && n + 1 < out_size) {
        out[n++] = (char)(swapped ? (*in >> 8) : (*in & 0xff));
        in++;
        len -= 2;
    }
    out[n] = '\0';
}

int main(int argc, char **argv)
{
    ID3Tag *tag;
    ID3TagIterator *iterator;
    ID3Frame *frame;

    if (argc < 2) {
        fprintf(stderr, "usage: %s file.mp3\n", argv[0]);
        return 1;
    }

    tag = ID3Tag_New();
    ID3Tag_Link(tag, argv[1]);
    iterator = ID3Tag_CreateIterator(tag);

    while ((frame = ID3TagIterator_GetNext(iterator)) != NULL) {
        ID3_FrameID id = ID3Frame_GetID(frame);
        ID3Field *field;

        if ((field = ID3Frame_GetField(frame, ID3FN_TEXT)) != NULL) {
            size_t chrs;
            char text[100];
            unicode_t utext[100];

            /*
             * The functions that would tell us if we had Unicode (actually UCS-2)
             * are not exported to C by libid3. However, if this fails, we assume
             * ASCII's worth a try. This appears to be the same thing as testing
             * for ID3 v2.3, but, again no exported function. We could read the file.
             * See: http://www.id3.org/id3v2.3.0
             */
            if ((chrs = ID3Field_GetUNICODE(field, utext, 100)) != 0) {
                to_ascii(utext, chrs, text, sizeof text);
            } else if (ID3Field_GetASCII(field, text, 100) == 0) {
                return 1;
            }
            printf("%d %s\n", (int)id, text);
        }
    }

    ID3TagIterator_Delete(iterator);
    ID3Tag_Delete(tag);
    return 0;
}
I have a solution. Unfortunately it requires patches to id3lib, and as that code is not being developed, I expect that nothing will happen to my changes. For the adventurous, a new version of MP3-ID3Lib and patches to id3lib are available. Just drop me a note.

Here's how it works. UNICODE (UCS-2) strings are detected when the MP3 file is loaded and converted locally to ASCII (UTF-8, without any multi-byte characters). The contents of the fields are visible as normal Perl strings, and the fact that they were UNICODE is remembered. If your UNICODE goes beyond 8-bit characters, or uses multi-byte characters, you're out of luck. But I guess that you know that already. When changes are committed, the strings are converted back to UNICODE. If you wish to keep the ASCII encoding, you call set_ascii() instead of set().

Changes to id3lib are three-fold: minor needed-to-compile changes for gcc 4.3, additional export-to-C functions needed for the additions to ID3Lib.xs, and a fix for (what I believe to be) a bug in id3lib.

The alleged bug is as follows. ID3 has a bit that says whether the text in a frame is ASCII or UNICODE. Even when you use id3lib's ID3Field_SetASCII function, that bit overrides changes to the text when the changes are committed. I changed things so that ID3Field_SetEncoding overrides the old behaviour.