Skip Menu |

This queue is for tickets about the MP3-Tag CPAN distribution.

Report information
The Basics
Id: 62951
Status: open
Priority: 0/
Queue: MP3-Tag

People
Owner: Nobody in particular
Requestors: voropaev.andrey [...] googlemail.com
Cc:
AdminCc:

Bug Information
Severity: (no value)
Broken in: (no value)
Fixed in: (no value)



Subject: MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Fri, 12 Nov 2010 13:03:52 +0100
To: bug-MP3-Tag [...] rt.cpan.org
From: Andrey Voropaev <voropaev.andrey [...] googlemail.com>
I understand, that writing of v2.4 tags is "not supported", but there is code for this in the module and it does pretty good work for me. There's only one catch. The code assumes that the text string passed to the method add_frame has UTF8 flag set, so it calls Encode::encode to convert it into "octets". In reality, the string might be the "octets" already. So it makes sense to call Encode::is_utf8 before calling Encode::encode. See the attached patch for proposed solution. Actually, it looks like the Encode module is incorrectly used. The function "decode" converts into internal format assuming that the sequence of given octets corresponds to specified encoding. The function "encode" converts from internal format into sequence of octets using specified encoding. There's also function "from_to" which converts sequence of octets from one encoding into another encoding. So, the "from_to" function can be viewed as sub from_to { my $new_encoding = pop; my $old_encoding = pop; my $t = decode($old_encoding, $_[0]); $_[0] = encode($new_encoding, $t); } Best regards Andrei Voropaev

Message body is not shown because sender requested not to inline it.

CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #62951] MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Fri, 12 Nov 2010 16:04:33 -0800
To: Andrey Voropaev via RT <bug-MP3-Tag [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Fri, Nov 12, 2010 at 07:04:10AM -0500, Andrey Voropaev via RT wrote: Show quoted text
> I understand, that writing of v2.4 tags is "not supported", but there is code > for this in the module and it does pretty good work for me. There's only one > catch. The code assumes that the text string passed to the method add_frame > has UTF8 flag set, so it calls Encode::encode to convert it into "octets". In > reality, the string might be the "octets" already.
No. This is similar to saying that "the string may be gzipped already", so instead of compression, check for GZIP header". Show quoted text
> So it makes sense to call Encode::is_utf8 before calling > Encode::encode.
Encode::is_utf8 should never be used inside an API. add_frame() takes a Perl strings. If you want a Russian roulette behaviour, call is_utf8 *before* calling add_frame(), and supply the $encoding argument explicitly (as in the first example of add_frame()). Show quoted text
> Actually, it looks like the Encode module is incorrectly used.
Please give an example of input on which add_frame() gives a wrong result. Yours, Ilya
Subject: Re: [rt.cpan.org #62951] MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Sat, 13 Nov 2010 21:41:07 +0100
To: bug-MP3-Tag [...] rt.cpan.org
From: Andrey Voropaev <voropaev.andrey [...] googlemail.com>
On Sat, Nov 13, 2010 at 1:04 AM, Ilya Zakharevich via RT <bug-MP3-Tag@rt.cpan.org> wrote: Show quoted text
> > No.  This is similar to saying that "the string may be gzipped > already", so instead of compression, check for GZIP header". >
>> So it makes sense to call Encode::is_utf8 before calling >> Encode::encode.
Well. It is true that the line between "a bug" and "a feature" can be very vague. It is fine with me, if you just update the documentation so that it clearly states which type of string you expect as input to add_frame, as well as which type of string the get_frame returns. Show quoted text
> > Encode::is_utf8 should never be used inside an API.  add_frame() takes > a Perl strings.  If you want a Russian roulette behaviour, call > is_utf8 *before* calling add_frame(), and supply the $encoding > argument explicitly (as in the first example of add_frame()). >
>> Actually, it looks like the Encode module is incorrectly used.
> > Please give an example of input on which add_frame() gives a wrong result.
Easy. I have an mp3 file with TIT2 in CP1251 encoding. I use the following code to convert it to UTF8 my ($info, $name, @rest) = $set->get_frame($frame); Encode::from_to($info, 'cp1251', 'utf8'); $set->change_frame($frame, 3, $info); This results in the junk in the TIT2. The reason is simple. The get_frame function returns sequence of octects, the change_frame (and add_frame) expects perl string in the internal form. It is not documented anywhere. So I have to debug the MP3 module to see which input is really expected. As said above. I don't mind calling is_utf8 in between calls to MP3 methods, but this contradicts to the policy of perl when it does "what is expected of it". Instead I have to check what is returned and have to search the modules code to see what the module expects from me.
CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #62951] MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Sat, 13 Nov 2010 14:34:53 -0800
To: Andrey Voropaev via RT <bug-MP3-Tag [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Sat, Nov 13, 2010 at 03:41:18PM -0500, Andrey Voropaev via RT wrote: Show quoted text
> > No.  This is similar to saying that "the string may be gzipped > > already", so instead of compression, check for GZIP header". > >
> >> So it makes sense to call Encode::is_utf8 before calling > >> Encode::encode.
> > Well. It is true that the line between "a bug" and "a feature" can be > very vague. It is fine with me, if you just update > the documentation so that it clearly states which type of string you > expect as input to add_frame, as well as > which type of string the get_frame returns.
In Perl, there is exactly one kind of strings (modulo Perl bugs) - those consisting of characters. I do not see any need to document this. Show quoted text
> > Please give an example of input on which add_frame() gives a wrong result.
Show quoted text
> Easy. I have an mp3 file with TIT2 in CP1251 encoding.
In ID3, TIT2 cannot be in cp1251 encoding. Show quoted text
> I following code to convert it to UTF8 > > my ($info, $name, @rest) = $set->get_frame($frame); > Encode::from_to($info, 'cp1251', 'utf8'); > $set->change_frame($frame, 3, $info); > > This results in the junk in the TIT2. The reason is simple.
Yes: your code is not correct. For "broken ID3" (one with prohibited encodings), use corresponding configuration variables (or environment variables). Show quoted text
> As said above. I don't mind calling is_utf8 in between calls to MP3 > methods, but this contradicts to the policy of perl when > it does "what is expected of it".
No, Perl does no such (extremely silly) things. Yours, Ilya
Subject: Re: [rt.cpan.org #62951] MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Mon, 15 Nov 2010 11:04:17 +0100
To: bug-MP3-Tag [...] rt.cpan.org
From: Andrey Voropaev <voropaev.andrey [...] googlemail.com>
On Sat, Nov 13, 2010 at 11:35 PM, Ilya Zakharevich via RT <bug-MP3-Tag@rt.cpan.org> wrote: Show quoted text
> In ID3, TIT2 cannot be in cp1251 encoding.
Maybe it can't, but I have few hundred files that have it. And I didn't create those files, I'm just trying to fix them :) Show quoted text
> Yes: your code is not correct.
Oops. Actually I do my ($info, $name, @rest) = $set->get_frame('TIT2', 'raw'); substr($info, 0, 1) = ''"; Show quoted text
> For "broken ID3" (one with prohibited encodings), use corresponding > configuration variables (or environment variables).
Ok. Found this one. Then after all, things are documented. It's just not easy to put different pieces together. No more complaints. As a side question/thought. Wouldn't it be simpler to use the check for "is_utf8" on the input than teach the user which environment variables/configuration should be set to obtain the desired result? You said, that it is "russian roulett", but is it really? I'll try to explain. The documentation for the module could state that the input shall be either octects in desired encoding or in perl's internal encoding. The output is always the octects. Nothing else. No environment variables, no internal encoding/decoding, except for the case when user passes input in perl's internal encoding, but this case is simple (destination encoding is known - use it, not known - use Latin1). I mean, there's a tool for converting octets from one encoding to another, why to introduce lots of variables and code for doing this conversion inside of MP3::Tag? It does not prevent the developer from working with "octets" instead of "true perl string". At best, this stuff should be just a convenience thing, not "the only way to do it". Like with "binmode" for files. User can read the data directly as it is, or apply binmode and get the converted to "true perl string" data. So, why this approach is not used? What complications would it introduce?
CC: undisclosed-recipients: ;
Subject: Re: [rt.cpan.org #62951] MP3::Tag::ID3v2->add_frame always calls Encode::encode on provided strings
Date: Mon, 15 Nov 2010 12:12:55 -0800
To: Andrey Voropaev via RT <bug-MP3-Tag [...] rt.cpan.org>
From: Ilya Zakharevich <nospam-abuse [...] ilyaz.org>
On Mon, Nov 15, 2010 at 05:04:32AM -0500, Andrey Voropaev via RT wrote: Show quoted text
> The documentation for the module could state that the input shall be > either octects in desired encoding or in perl's internal encoding.
Mentioning encoding would not "be easier". Guessing (and getting answer wrong most of the time) would not "be easier". Show quoted text
> The output is always the octects.
Perl can do better than this - it has real strings. Show quoted text
> Nothing else. No environment variables,
One does not need environment variables with ID3. Things just work. Show quoted text
> So, why this approach is not used? What complications would it introduce?
If you want raw data, just get it. Yours, Ilya