
This queue is for tickets about the RT-Extension-LDAPImport CPAN distribution.

Report information
The Basics
Id: 74144
Status: open
Priority: 0/
Queue: RT-Extension-LDAPImport

People
Owner: Nobody in particular
Requestors: BHEISIG [...] cpan.org
Cc:
AdminCc:

Bug Information
Severity: Important
Broken in: 0.31
Fixed in: (no value)



Hello everyone,

I've found an encoding problem in this extension. Whenever it reads user information from the database, special characters such as German umlauts are not UTF-8 encoded. As a result, every field that contains special characters is updated on every run.

After the update there is no visible encoding problem, because the data from LDAP is UTF-8 encoded, but the transaction table gets heavily polluted. My LDAP synchronization covers ~2000 users, most of them located in "Düsseldorf" (field "City"), and the cron job runs every 15 minutes. Imagine how many inserts have been made in the last few weeks ;-)

Here is what /opt/rt4/local/plugins/RT-Extension-LDAPImport/bin/rtldapimport --debug wrote:

    City    D�sseldorf => Düsseldorf

I couldn't resolve the problem within the extension, so I patched /opt/rt4/lib/RT/Record.pm (see my attachment). This works, but looks like an ugly workaround.

More information about my system:

* RT 4.0.2
* Perl 5.10.1
* Linux debian 2.6.32-5-686 #1 SMP Fri Sep 9 20:51:05 UTC 2011 i686 GNU/Linux

Thank you very much for your help!

Benjamin Heisig
Subject: Record.pm.patch
--- a/Record.pm  2012-01-19 09:53:56.802582265 +0100
+++ b/Record.pm  2012-01-19 10:00:38.203170994 +0100
@@ -887,10 +887,14 @@
                 my $name = $self->$object->Name;
                 next if $name eq $value || $name eq ($value || 0);
             };
-            next if $value eq $self->$attribute();
-            next if ($value || 0) eq $self->$attribute();
-        };
+            my $existing = $self->$attribute();
+            my $current = Encode::encode("utf8", $existing);
+
+            next if Encode::encode("utf8", $value) eq $current;
+            next if ($value || 0) eq $current;
+        };
+
         $new_values{$attribute} = $value;
     }
Subject: Re: [rt.cpan.org #74144]
Date: Thu, 19 Jan 2012 13:56:34 -0500
To: bug-RT-Extension-LDAPImport [...] rt.cpan.org
From: Thomas Sibley <tsibley [...] cpan.org>
On 01/19/2012 04:34 AM, Benjamin Heisig via RT wrote:
> I've found an encoding problem inside this extension. Whenever it
> receives user information from database, special characters like German
> umlauts are not UTF-8 encoded. This causes that each field which
> contains special characters will be updated.
>
> After the update there is no encoding problem because the data from LDAP
> is encoded in UTF-8, but there is a huge database pollution in the
> transaction table. My LDAP synchronization includes ~ 2000 users, most
> of them are located in "Düsseldorf" (field "City"), cron job runs every
> 15 minutes---imagine how many inserts have been made in the last few
> weeks ;-)
>
> Here is what
> /opt/rt4/local/plugins/RT-Extension-LDAPImport/bin/rtldapimport --debug
> wrote:
>
> City D�sseldorf => Düsseldorf
It is very strange to me that RT's normal API would be returning values which aren't utf8. If that were the case, I wouldn't expect Unicode to work at all in RT (yet it does just fine). I suspect the actual issue is with the value provided by Net::LDAP.

Can you use Devel::Peek on the two values printed by rtldapimport --debug and send the output?

Thomas
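For readers unfamiliar with the request: Devel::Peek's Dump() prints a scalar's internal representation (flags, raw bytes, byte length) to STDERR, which shows whether a value is a decoded character string (UTF8 flag set) or raw octets. The following is a minimal sketch, assuming two stand-in values in place of the real ones rtldapimport compares; the variable names are hypothetical and not taken from the extension.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# Hypothetical stand-ins for the two values rtldapimport --debug prints.
# RT's API returns a decoded character string (10 characters).
my $rt_value = "D\x{fc}sseldorf";
utf8::upgrade($rt_value);    # force the UTF8 flag, as in a decoded string

# Net::LDAP is suspected to return raw UTF-8 octets (11 bytes, no flag).
my $ldap_value = "D\303\274sseldorf";

# Dump() writes the internal SV structure to STDERR; compare FLAGS
# (UTF8 present or not) and CUR (length in bytes) between the two.
Dump($rt_value);
Dump($ldap_value);

# utf8::is_utf8 reports the same internal flag without the full dump.
# The flag is an internal detail; the differing character length is
# what actually makes the two values compare unequal.
printf "rt:   is_utf8=%d length=%d\n", utf8::is_utf8($rt_value)   ? 1 : 0, length $rt_value;
printf "ldap: is_utf8=%d length=%d\n", utf8::is_utf8($ldap_value) ? 1 : 0, length $ldap_value;
```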
Thanks for your reply, Thomas!
> It is very strange to me that RT's normal API would be returning values
> which aren't utf8. If that was the case, I wouldn't expect unicode to
> work at all in RT (yet it does just fine). I suspect the actual issue
> is with the value provided by Net::LDAP.
Yes, it *is* strange... I could reproduce it on two different systems (Debian and SLES) with two different LDAP servers (AD).
> Can you use Devel::Peek on the two values printed by rtldapimport
> --debug and send the output?
Here is the output for a user's "Address1":

    Address1    H�henstr. 87 => Höhenstr. 87

The first dump is from $old_value, the second from $user->{$key}:

SV = PV(0xaae0b80) at 0x9d8f710
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0xb6328f0 "H\303\266henstr. 87"\0 [UTF8 "H\x{f6}henstr. 87"]
  CUR = 13
  LEN = 16
SV = PV(0xaae4670) at 0xb664fb8
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0xb6342d0 "H\303\266henstr. 87"\0
  CUR = 13
  LEN = 16

I hope it helps!

Benjamin
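The two dumps hold the same 13 bytes, but only the first has the UTF8 flag, so Perl reads it as 12 characters ("Höhenstr. 87") and the second as 13 characters (each octet treated as one character). Perl's eq compares characters, not internal bytes, which is why the importer reports a change on every run. A minimal sketch reproducing this with the values from the dumps, assuming decode_utf8 from the core Encode module as the remedy:

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# RT's value from the first dump: flagged character string, 12 characters.
my $old_value = "H\x{f6}henstr. 87";
utf8::upgrade($old_value);    # mirror the UTF8 flag shown in the dump

# LDAP's value from the second dump: the same 13 bytes, unflagged,
# so Perl treats each byte as one character.
my $new_value = "H\303\266henstr. 87";

# eq compares characters, so these differ even though the bytes match.
my $same_raw = ($old_value eq $new_value) ? 1 : 0;

# Decoding the LDAP octets yields the same character string RT holds.
my $same_decoded = ($old_value eq decode_utf8($new_value)) ? 1 : 0;

printf "lengths: %d vs %d\n", length $old_value, length $new_value;
print "raw eq: $same_raw, decoded eq: $same_decoded\n";
```

Running it prints "lengths: 12 vs 13" and "raw eq: 0, decoded eq: 1".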
Subject: Re: [rt.cpan.org #74144] Bad encoded data from database
Date: Sun, 22 Jan 2012 15:23:50 +0400
To: bug-RT-Extension-LDAPImport [...] rt.cpan.org
From: Ruslan Zakirov <ruz [...] bestpractical.com>
Hi,

I think we get octets from LDAP, but in RT we should store strings and get strings from API calls. To get a correct comparison, we should upgrade the data we get from LDAP to Perl strings. Google suggests that LDAPv3 sends UTF-8, while v2 uses some specific format. We can enforce v3 on anything that has non-ASCII data.

On Fri, Jan 20, 2012 at 20:17, Benjamin Heisig via RT <bug-RT-Extension-LDAPImport@rt.cpan.org> wrote:
>       Queue: RT-Extension-LDAPImport
>  Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=74144 >
>
> Thanks for your reply, Thomas!
>
>> It is very strange to me that RT's normal API would be returning values
>> which aren't utf8.  If that was the case, I wouldn't expect unicode to
>> work at all in RT (yet it does just fine).  I suspect the actual issue
>> is with the value provided by Net::LDAP.
>
> Yes, it *is* strange... I could reproduce it on two different systems
> (Debian and SLES) with two different LDAP servers (AD).
>
>> Can you use Devel::Peek on the two values printed by rtldapimport
>> --debug and send the output?
>
> Here is the output for a user's "Address1":
>
>        Address1        H�henstr. 87 => Höhenstr. 87
>
> The first dump is from $old_value, the second from $user->{$key}:
>
> SV = PV(0xaae0b80) at 0x9d8f710
>  REFCNT = 1
>  FLAGS = (PADMY,POK,pPOK,UTF8)
>  PV = 0xb6328f0 "H\303\266henstr. 87"\0 [UTF8 "H\x{f6}henstr. 87"]
>  CUR = 13
>  LEN = 16
> SV = PV(0xaae4670) at 0xb664fb8
>  REFCNT = 1
>  FLAGS = (POK,pPOK)
>  PV = 0xb6342d0 "H\303\266henstr. 87"\0
>  CUR = 13
>  LEN = 16
>
> I hope it helps!
>
>    Benjamin
-- Best regards, Ruslan.
CC: bug-RT-Authen-ExternalAuth [...] rt.cpan.org
Subject: Re: [rt.cpan.org #74144] Bad encoded data from database
Date: Mon, 23 Jan 2012 12:04:31 -0500
To: bug-RT-Extension-LDAPImport [...] rt.cpan.org
From: Thomas Sibley <trs [...] bestpractical.com>
On 01/22/2012 06:24 AM, Ruslan Zakirov via RT wrote:
> I think from LDAP we get octets, but in RT we should store strings and
> get strings from API calls. To get correct comparison we should
> upgrade data we get from LDAP to perl strings. Google suggests that
> LDAPv3 sends UTF-8, v2 uses some specific format. We can enforce v3 on
> anything that has not ASCII data.
This concurs with what I expected was happening. It's not RT that needs encoding, it's LDAP that needs decoding.

I do wonder if this problem is present in RT-Authen-ExternalAuth as well.

Thomas
On Mon, 23 Jan 2012, 12:04:32, trs@bestpractical.com wrote:
> This concurs with what I expected was happening. It's not RT that needs
> encoding, it's LDAP that needs decoding.
Do you have any idea how to fix this? If there is something I can do, please let me know. (I'm currently working on a small feature to privilege and/or disable users, and vice versa, when syncing.)

Benjamin
Subject: Re: [rt.cpan.org #74144] Bad encoded data from database
Date: Wed, 08 Feb 2012 09:13:49 -0500
To: bug-RT-Extension-LDAPImport [...] rt.cpan.org
From: Thomas Sibley <tsibley [...] cpan.org>
On 02/08/2012 04:29 AM, Benjamin Heisig via RT wrote:
> On Mon, 23 Jan 2012, 12:04:32, trs@bestpractical.com wrote:
>> This concurs with what I expected was happening. It's not RT that needs
>> encoding, it's LDAP that needs decoding.
>
> Do you have any idea how to fix this? If there is something I can do
> please let me know. (I'm currently working on a little feature to
> privilege and/or disable users (and vice versa) when syncing.)
Yes, the fix is to decode the LDAP data we receive (i.e. upgrading it to Perl strings). Ruslan noted that LDAPv3 seems to guarantee UTF-8 octets, which means we can run decode_utf8() on it (see perldoc Encode).

If you want to write a patch for this bug, please first start by writing tests that trigger the failure case.

Cheers,
Thomas
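A minimal sketch of the suggested approach: a test that first reproduces the failure case and then shows that decoding the LDAP value makes unchanged data compare equal. It assumes the server speaks LDAPv3 (so values arrive as UTF-8 octets); decode_ldap_value is a hypothetical helper for illustration, not part of RT-Extension-LDAPImport, and a real patch would apply the decoding where the importer reads attributes from Net::LDAP.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);
use Test::More;

# Hypothetical helper (not part of the extension): decode a raw attribute
# value from Net::LDAP, assuming LDAPv3 and therefore UTF-8 octets.
sub decode_ldap_value {
    my ($octets) = @_;
    return undef unless defined $octets;
    return decode_utf8($octets);
}

# RT's stored value, as returned by its API (a character string).
my $rt_city = "D\x{fc}sseldorf";

# The raw value as it arrives from LDAP (UTF-8 octets).
my $ldap_city = "D\303\274sseldorf";

# Failure case from this ticket: raw octets never equal RT's string,
# so the importer would record a spurious update on every run.
isnt($ldap_city, $rt_city, 'raw LDAP octets differ from RT string');

# After decoding, unchanged data compares equal and no update is needed.
is(decode_ldap_value($ldap_city), $rt_city, 'decoded LDAP value matches RT');

# ASCII-only values pass through decoding unchanged.
is(decode_ldap_value('Berlin'), 'Berlin', 'ASCII value unchanged');

done_testing();
```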
Subject: Re: [rt.cpan.org #74144] Bad encoded data from database
Date: Tue, 26 Feb 2013 08:12:17 +0800
To: bug-RT-Extension-LDAPImport [...] rt.cpan.org
From: Craig Ringer <craig [...] 2ndquadrant.com>
I'm working on a test case to demonstrate this bug now.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services