On Fri, Apr 04, 2014 at 07:18:15AM -0400, Victor Efimov via RT <bug-DBD-mysql@rt.cpan.org> wrote:
> Yes, I meant character strings (unicode strings). I told that it would break existing code, and this is correct.
It's not, and nothing you say indicates otherwise.
> We have such code now, it works fine
It works fine by accident only. It might not work with older or newer perl
versions, because it relies on undocumented behaviour inside the perl
interpreter which can and does change in different versions.
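To make the point about internal behaviour concrete, here is a minimal sketch using only the core utf8:: functions. It shows one and the same Perl string stored in two different internal representations; which one a string happens to have at a given moment is the interpreter's business, so code whose result depends on it is relying on internals.

```perl
use strict;
use warnings;

# One logical string: four characters, all with code points below 256.
my $up   = "caf\xe9";
my $down = "caf\xe9";

utf8::upgrade($up);      # internal storage becomes UTF-8, flag set
utf8::downgrade($down);  # internal storage is one byte per character, flag clear

# At the Perl level the two values are indistinguishable.
my $equal       = ($up eq $down)                 ? 1 : 0;
my $same_length = (length($up) == length($down)) ? 1 : 0;

# Only the internal flag differs.
my $flag_up   = utf8::is_utf8($up)   ? 1 : 0;
my $flag_down = utf8::is_utf8($down) ? 1 : 0;

print "equal=$equal same_length=$same_length ",
      "flag_up=$flag_up flag_down=$flag_down\n";
```

A library that sends different bytes to the server for $up and $down is treating two equal strings differently.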
> because downgraded unicode strings are rare
They are rare because you are lucky - but what happens when you hit that
rare case? Does your code that works fine still work fine in these rare
cases?
> and because we use it for Russian text (which cannot be downgraded).
Russian text can easily be downgraded, for example when it's encoded in
utf-8, as required by mysql.
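A small sketch of that distinction, assuming the Encode module that ships with perl (the sample word is only an illustration): the character string cannot be downgraded, but its utf-8 encoding is a string of octets and downgrades without trouble, even after it has been upgraded internally.

```perl
use strict;
use warnings;
use Encode ();

# Six Cyrillic characters ("privet" in Russian), written as code points.
my $text = "\x{43f}\x{440}\x{438}\x{432}\x{435}\x{442}";

# The utf-8 encoding: twelve characters, all below 256.
my $octets = Encode::encode("UTF-8", $text);

# Force the internal representation to "upgraded", then try to downgrade.
utf8::upgrade($octets);
my $octets_ok = utf8::downgrade($octets, 1);   # second argument: FAIL_OK

# The character string itself has code points above 255, so this fails.
my $copy     = $text;
my $chars_ok = utf8::downgrade($copy, 1);

printf "octets downgraded: %s, characters downgraded: %s\n",
    ($octets_ok ? "yes" : "no"), ($chars_ok ? "yes" : "no");
```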
> So I would consider it broken in rare cases.
The key is that the code in question already is broken, even if you are
lucky and it works except in rare cases.
> But you proposal will break it in _all_ cases.
Not sure, but possible. The key, again, is that the change would allow one to
fix broken code such as yours. Right now, the best you can achieve is code
that happens to work "most of the time".
So your proposal is to keep a bug that makes it impossible to write correct
and working code, because it makes already broken code fail
deterministically.
I would say that's a ridiculous proposal. Why would anybody want
guaranteed brokenness?
Even you admit that your code already *is* broken.
And so is my own code. And there is no way to fix either until DBD::mysql
is fixed. I can try various workarounds such as utf8::downgrade or
upgrade, but that doesn't fix the code, it only makes it work with my
current perl binary.
> > No matter how you turn it, DBD::mysql is simply broken w.r.t. perl
> > strings, because it doesn't let the user chose the format.
>
> I agree - it's broken on API level. It should have different API where users can specify where is binary string and where is character string.
Either that, or it should simply offer the same API as mysql, namely use
the same encoding as the underlying C lib, just as basically any other
library on the planet does (compare Compress::Zlib for example, which
doesn't have this bug, and also doesn't require extra specification of
whether something is a text string or not).
I think whoever implemented this utf-8 stuff in DBD::mysql was simply
confused - utf-8 strings aren't unicode strings.
Fortunately, this is not a situation that created a backwards
compatibility problem, because the behaviour isn't deterministic, but
effectively random.
> > binary data (unless DBD::mysql is even more buggy), as utf8 is binary
>
> Yes, right. I think you misunderstands me - actually I meant that you
> _could_ suggest a solution that people should not use unicode character
> strings without mysql_enable_utf8=1 (and this will make you proposal for
> downgrading strings valid when mysql_enable_utf8=0), and I explained why
> this would not help either - that's because even in mysql_enable_utf8=1
> mode there will be binary data for binary columns that should not be
> upgraded.
The documentation of mysql_enable_utf8 says "turning on this flag tells MySQL
that incoming data should be treated as UTF-8".
I don't know what the option does (apparently, it doesn't treat anything as
utf-8 with this flag, right?), but as documented, yes, it's quite obvious
that you can't pass in generic binary data anymore.
(In fact, I suspect when you pass in utf-8 data as expected, it will be
double-encoded, which would introduce pretty obvious data corruption).
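For reference, this is what double-encoding looks like in general. The sketch below uses only the Encode module and illustrates the failure mode itself; it makes no claim about what DBD::mysql actually does internally.

```perl
use strict;
use warnings;
use Encode ();

my $chars = "\x{e9}";                          # one character: e with acute

# Correct: encode once, giving the two octets C3 A9.
my $once  = Encode::encode("UTF-8", $chars);

# Wrong: encoding the octets again treats each octet as a character,
# giving the four octets C3 83 C2 A9.
my $twice = Encode::encode("UTF-8", $once);

# A single decode on the way back no longer recovers the original.
my $back = Encode::decode("UTF-8", $twice);

printf "once=%s twice=%s roundtrip_ok=%s\n",
    unpack("H*", $once), unpack("H*", $twice),
    ($back eq $chars ? "yes" : "no");
```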
Of course, this option is marked as experimental (in my copy at least), so
one shouldn't be surprised if a bug is found and fixed.
In any case, I don't see what mysql_enable_utf8 has to do with anything, it's
clearly a useless option unless all your data is unicode (or utf-8?), and
even has the potential to corrupt data even more (what happens when I pass
data to a binary column and retrieve it, will it double- or even
triple-encode the data in some cases? As the documentation stands, it seems
that is the case).
> > know which part is unclear, let me assure you I will be happy to
> > explain how
> > the perl string model works, how utf-8 works and so on, but I need
> > some clues
>
> No, thank you, I think I already know how it works.
It looks to me as if you keep confusing unicode and utf-8 strings. They are
different in Perl.
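The difference in a few lines, again assuming only the Encode module: a unicode string holds characters, a utf-8 string holds the octets that encode them, and converting between the two is always an explicit step.

```perl
use strict;
use warnings;
use Encode ();

my $unicode = "\x{20ac}";                          # one character: EURO SIGN
my $utf8    = Encode::encode("UTF-8", $unicode);   # three octets: E2 82 AC

my $unicode_len = length($unicode);   # counts characters
my $utf8_len    = length($utf8);      # counts octets

# The two strings are not equal; only an explicit decode maps one to the other.
my $equal     = ($unicode eq $utf8) ? 1 : 0;
my $roundtrip = (Encode::decode("UTF-8", $utf8) eq $unicode) ? 1 : 0;

print "unicode_len=$unicode_len utf8_len=$utf8_len ",
      "equal=$equal roundtrip=$roundtrip\n";
```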
> Also FYI I am not maintainer of this module.
I know, but the maintainer of this module could be confused by your wrong
comments, so it's good to clear up the situation.
Summary: your code is broken, and so is mine. You might not understand it
yet, but you are suffering from this very bug, just in reverse. If this bug
were fixed, we both could fix our code.
--
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      schmorp@schmorp.de
      -=====/_/_//_/\_,_/ /_/\_\