Bug #93140 for Cache-Memcached-Fast: Problem Getting "Big" Values

Tue Feb 18 11:59:35 2014 damien.chaumette [...] gmail.com - Ticket created

Subject:	Problem Getting "Big" Values
Date:	Tue, 18 Feb 2014 16:59:26 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

Hi there, I have been playing around with a few Memcached client libraries recently and, ideally, would like to stick with Perl, but with decent performance. *Memcached::Client* is simple enough to use but performance isn't close Python or C# libraries offer. Thinking it was probably due to its pure Perl nature, I then came across your library. During the CPAN installation I ran into the following unit test issue: "*Failed test 'Fetch# at t/big_value.t line 40."* With the Memcached server I am running, it's possible to up the default value-size limit from the default 1MB to 128MB. Doing so and setting the "max_size" parameter in your library I am able to *set* very large values without any problem. The problem comes when I try to *get* them out, it just doesn't return anything. The key'/value most definitely has been set as I can retrieve it with a get call from *Memcached::Client*. I've placed a code sample here, rather simple but it might help: http://pastebin.com/QtBxqMhB My setup: - Windows 2012 - Strawberry Perl 5.16 64bits - Cache::Memcached::Fast v0.21 - Memcached v1.4.5 I've had no problem using my Memcached server build with: - python-memcached - EnyimMemcached (C#) - *Memcache::Client* but it's worth nothing that *Cache::Memcached* throws up quite a bit of unit tests failures and therefore hasn't been tested. Let me know if this makes any sense and/or if you need more information. Thanks, Damien

Tue Feb 18 13:00:46 2014 KROKI [...] cpan.org - Correspondence added

On Tue Feb 18 11:59:35 2014, damien.chaumette@gmail.com wrote: Show quoted text

> During the CPAN installation I ran into the following unit test issue: > "*Failed > test 'Fetch# at t/big_value.t line 40."*

This test has been written before memcached acquired -I option to raise 1MB limit. The test assumes that values slightly below 1MB will be accepted by the server, but slightly above will be rejected. My guess you run the tests on memcached -I 128m and value >1MB wasn't rejected by the server, hence the test failure. Show quoted text

> With the Memcached server I am running, it's possible to up the > default > value-size limit from the default 1MB to 128MB. Doing so and setting > the > "max_size" parameter in your library I am able to *set* very large > values > without any problem. > The problem comes when I try to *get* them out, it just doesn't return > anything.

Can't reproduce. I ran memcached -I 128m -vv (version 1.4.17), adjusted your code to use 100000 * 1024 value, everything worked on Linux (I don't have access to any Windows host so can't test there): $ perl /tmp/a.pl Setting keys with Fast.. done. Getting keys with Fast.. data_short=1024 data_long=102400000 Getting keys with Client.. data_short=1024 data_long=102400000 Output from memcached: ... slab class 64: chunk size 134217728 perslab 1 ... <30 new auto-negotiating client connection 30: Client using the ascii protocol <30 set data_short 0 0 1024 Show quoted text

>30 STORED

<30 set data_long 0 0 102400000 Show quoted text

>30 STORED

<30 get data_short Show quoted text

>30 sending key data_short >30 END

<30 get data_long Show quoted text

>30 sending key data_long >30 END

<31 new auto-negotiating client connection 31: Client using the ascii protocol <31 get data_short Show quoted text

>31 sending key data_short >31 END

<31 get data_long Show quoted text

>31 sending key data_long >31 END

<30 connection closed. <31 connection closed. Could you please provide the output of memcached -vv when C::M::F fails?

Tue Feb 18 13:00:46 2014 The RT System itself - Status changed from 'new' to 'open'

Wed Feb 19 04:50:58 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Wed, 19 Feb 2014 09:50:46 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

On 18 February 2014 18:00, Tomash Brechko via RT < bug-Cache-Memcached-Fast@rt.cpan.org> wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=93140 > > > On Tue Feb 18 11:59:35 2014, damien.chaumette@gmail.com wrote:

> > During the CPAN installation I ran into the following unit test issue: > > "*Failed > > test 'Fetch# at t/big_value.t line 40."*

> > This test has been written before memcached acquired -I option to raise > 1MB limit. The test assumes that values slightly below 1MB will be > accepted by the server, but slightly above will be rejected. My guess you > run the tests on memcached -I 128m and value >1MB wasn't rejected by the > server, hence the test failure.

As per the big_value.t file it seems that line 40 isn't about setting something larger than 1MB (as this is on line 43) but about getting the regular test value ($value = 'x' x THRESHOLD). Which seems rather identical to the problem I experience. Show quoted text

> > With the Memcached server I am running, it's possible to up the > > default > > value-size limit from the default 1MB to 128MB. Doing so and setting > > the > > "max_size" parameter in your library I am able to *set* very large > > values > > without any problem. > > The problem comes when I try to *get* them out, it just doesn't return > > anything.

> > Can't reproduce. I ran memcached -I 128m -vv (version 1.4.17), adjusted > your code to use 100000 * 1024 value, everything worked on Linux (I don't > have access to any Windows host so can't test there): > > $ perl /tmp/a.pl > Setting keys with Fast.. > done. > > Getting keys with Fast.. > data_short=1024 > data_long=102400000 > > Getting keys with Client.. > data_short=1024 > data_long=102400000 > > Output from memcached: > ... > slab class 64: chunk size 134217728 perslab 1 > ... > <30 new auto-negotiating client connection > 30: Client using the ascii protocol > <30 set data_short 0 0 1024

> >30 STORED

> <30 set data_long 0 0 102400000

> >30 STORED

> <30 get data_short

> >30 sending key data_short > >30 END

> <30 get data_long

> >30 sending key data_long > >30 END

> <31 new auto-negotiating client connection > 31: Client using the ascii protocol > <31 get data_short

> >31 sending key data_short > >31 END

> <31 get data_long

> >31 sending key data_long > >31 END

> <30 connection closed. > <31 connection closed. > > > Could you please provide the output of memcached -vv when C::M::F fails? > >

Sure, please find below the output: ... slab class 64: chunk size 134217728 perslab 1 ... <604 new auto-negotiating client connection 604: Client using the ascii protocol <604 set data_short 0 0 1024 Show quoted text

>604 STORED

<604 set data_long 0 0 10240000 Show quoted text

>604 STORED

<604 get data_short Show quoted text

>604 sending key data_short >604 END

<604 get data_long Show quoted text

>604 sending key data_long >604 END

<612 new auto-negotiating client connection 612: Client using the ascii protocol <612 get data_short Show quoted text

>612 sending key data_short >612 END

<612 get data_long Show quoted text

>612 sending key data_long >612 END

<612 connection closed. Show quoted text

>608 END

<608 connection closed. Nothing seems out of the ordinary, except that the C::M::F get for "data_long" didn't actually return anything. Thanks, Damien

Wed Feb 19 07:05:41 2014 KROKI [...] cpan.org - Correspondence added

On Wed Feb 19 04:50:58 2014, damien.chaumette@gmail.com wrote: Show quoted text

> As per the big_value.t file it seems that line 40 isn't about setting > something larger than 1MB (as this is on line 43) but about getting the > regular test value ($value = 'x' x THRESHOLD). Which seems rather identical > to the problem I experience.

Yup, you are right, I didn't check with the code. My final wild guess is that your server is too busy and memcached doesn't respond quite fast. Please set io_timeout parameter to zero to disable I/O timeout (default is 1 second), i.e. my $memd_fast = Cache::Memcached::Fast->new ({ servers => [ 'localhost:11211' ], max_size => 10000 * 1024, io_timeout => 0, }); If this won't solve the problem then it would be interesting to know the minimum item size that reveals the problem and also if this size is always the same or varies slightly. However I don't have any further ideas.

Wed Feb 19 07:11:29 2014 KROKI [...] cpan.org - Correspondence added

On Wed Feb 19 07:05:41 2014, KROKI wrote: Show quoted text

> My final wild guess > is that your server is too busy and memcached doesn't respond quite > fast. Please set io_timeout parameter to zero to disable I/O timeout

Actually, by default close_on_error is enabled, so if it was a timeout issue the connetion would be closed. However from your memcached output I don't see close event for 604 at all, which is puzzling.

Wed Feb 19 07:31:05 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Wed, 19 Feb 2014 12:30:53 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

On 19 February 2014 12:11, Tomash Brechko via RT < bug-Cache-Memcached-Fast@rt.cpan.org> wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=93140 > > > On Wed Feb 19 07:05:41 2014, KROKI wrote:

> > My final wild guess > > is that your server is too busy and memcached doesn't respond quite > > fast. Please set io_timeout parameter to zero to disable I/O timeout

> > Actually, by default close_on_error is enabled, so if it was a timeout > issue the connetion would be closed. However from your memcached output I > don't see close event for 604 at all, which is puzzling. >

The server has beefy specs and is otherwise idling, and the same behaviour was witnessed when client and server was run on my local machine which again is beefy and not overloaded at all. I've re-run the same script but with only the data_short set and gets and the C::M::F get connection does close but later in time: <604 new auto-negotiating client connection 604: Client using the ascii protocol <604 set data_short 0 0 1024 Show quoted text

>604 STORED

<604 get data_short Show quoted text

>604 sending key data_short >604 END

<612 new auto-negotiating client connection 612: Client using the ascii protocol <612 get data_short Show quoted text

>612 sending key data_short >612 END

<612 connection closed. <*604 connection closed*. Running the Memcached server in `-vvv` I can see rows and rows of: 604: going from conn_read to conn_waiting 604: going from conn_waiting to conn_read 604: going from conn_read to conn_waiting 604: going from conn_waiting to conn_read 604: going from conn_read to conn_waiting 604: going from conn_waiting to conn_read 604: going from conn_read to conn_waiting 604: going from conn_waiting to conn_read 604: going from conn_read to conn_waiting Which probably explains why the data isn't returned and C::M::F connection isn't closed in a timely fashion? Thanks, Damien

Wed Feb 19 08:35:18 2014 KROKI [...] cpan.org - Correspondence added

On Wed Feb 19 07:31:05 2014, damien.chaumette@gmail.com wrote: Show quoted text

> I've re-run the same script but with only the data_short set and gets and > the C::M::F get connection does close but later in time: > ... > <612 connection closed. > <*604 connection closed*.

I wonder how later the close of 604 happens, simply being after 612 is OK, any delay makes me think of io_timeout. Show quoted text

> 604: going from conn_waiting to conn_read > 604: going from conn_read to conn_waiting > > Which probably explains why the data isn't returned and C::M::F connection > isn't closed in a timely fashion?

Those lines mean memcached server is reading (or trying to read) from the client. If they are output during the STORE then this is expected, if after GET then this is puzzling. Please do one more try before giving up: set io_timeout to zero, do gets from Memcached::Client before gets from C::M::F, and with C::M::F do get(data_long) before get(data_short), and finally paste output of memcached -vv here. Upgrading memcached to the latest versions may also help (yes, I know that only C::M::F triggers the problem, but still ;)).

Wed Feb 19 09:14:02 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Wed, 19 Feb 2014 14:13:52 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

On 19 February 2014 13:35, Tomash Brechko via RT < bug-Cache-Memcached-Fast@rt.cpan.org> wrote: Show quoted text

> <URL: https://rt.cpan.org/Ticket/Display.html?id=93140 > > > On Wed Feb 19 07:31:05 2014, damien.chaumette@gmail.com wrote:

> > 604: going from conn_waiting to conn_read > > 604: going from conn_read to conn_waiting > > > > Which probably explains why the data isn't returned and C::M::F

> connection

> > isn't closed in a timely fashion?

> > Those lines mean memcached server is reading (or trying to read) from the > client. If they are output during the STORE then this is expected, if > after GET then this is puzzling. >

They are definitely happening at the GET stage, not at the STORE. On the data_long C::M::F run it'll go on for ever, I let it go for a few minutes and it didn't stop. To be fair if it is trying to communicate with the client it's much too late at this point as we have exited a long time ago. Show quoted text

> Please do one more try before giving up: set io_timeout to zero, do gets > from Memcached::Client before gets from C::M::F, and with C::M::F do > get(data_long) before get(data_short), and finally paste output of > memcached -vv here. Upgrading memcached to the latest versions may also > help (yes, I know that only C::M::F triggers the problem, but still ;)). >

I've done as requested (first two gets are C::M::F), behaviour is the same as per below: <608 new auto-negotiating client connection 608: Client using the ascii protocol <608 set data_short 0 0 1024 Show quoted text

>608 STORED

<608 set data_long 0 0 10240000 Show quoted text

>608 STORED

<616 new auto-negotiating client connection 616: Client using the ascii protocol <616 get data_long Show quoted text

>616 sending key data_long >616 END

<616 get data_short Show quoted text

>616 sending key data_short >616 END

<608 get data_long Show quoted text

>608 sending key data_long >608 END

<620 new auto-negotiating client connection 620: Client using the ascii protocol <620 get data_short Show quoted text

>620 sending key data_short >620 END

<616 connection closed. <620 connection closed. There doesn't seem to be official Windows builds for Memcached any more, and it's difficult to justify spending too much time in trying to build from source when there doesn't seem to be a known issue with this revision and other clients aren't having the same problem. I understand if you can't devote more time to find this Windows specific problem, but could you perhaps point me to the places in the C code I should look first should I want to troubleshoot myself? Thanks, Damien

Wed Feb 19 09:19:43 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Wed, 19 Feb 2014 14:19:34 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

On 19 February 2014 14:13, Damien Chaumette <damien.chaumette@gmail.com>wrote: Show quoted text

> > I've done as requested (first two gets are C::M::F)[..] > >

Sorry I meant the first two gets are with M::C and the second batch with C::M::F, as requested. Thanks, Damien

Wed Feb 19 10:32:28 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Wed, 19 Feb 2014 15:32:16 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

Apologies for the spam, I've tried with a Couchbase Memcached bucket (they seem to provide up to date Windows Memcached) and get the same behaviour. Thanks, Damien On 19 February 2014 14:19, Damien Chaumette <damien.chaumette@gmail.com>wrote: Show quoted text

> > > > On 19 February 2014 14:13, Damien Chaumette <damien.chaumette@gmail.com>wrote: >

>> >> I've done as requested (first two gets are C::M::F)[..] >> >>

> Sorry I meant the first two gets are with M::C and the second batch with > C::M::F, as requested. > > Thanks, > Damien >

Wed Feb 19 11:17:14 2014 KROKI [...] cpan.org - Correspondence added

On Wed Feb 19 09:14:02 2014, damien.chaumette@gmail.com wrote: Show quoted text

> They are definitely happening at the GET stage, not at the STORE. > On the data_long C::M::F run it'll go on for ever, I let it go for a few > minutes and it didn't stop. > To be fair if it is trying to communicate with the client it's much too > late at this point as we have exited a long time ago.

C::M::F client sends get request, reads the reply and sends nothing else after that. If I got you correctly then memcached server is trying to read something from the client, and I don't understand why. Show quoted text

> I've done as requested (first two gets are C::M::F), behaviour is the same > as per below:

What I see from the trace: Show quoted text

> <608 new auto-negotiating client connection > 608: Client using the ascii protocol > <608 set data_short 0 0 1024

> >608 STORED

> <608 set data_long 0 0 10240000

> >608 STORED

C::M::F stored two keys in memcached. Show quoted text

> <616 new auto-negotiating client connection > 616: Client using the ascii protocol > <616 get data_long

> >616 sending key data_long > >616 END

> <616 get data_short

> >616 sending key data_short > >616 END

Other client got two keys. Show quoted text

> <608 get data_long

> >608 sending key data_long > >608 END

C::M::F requested data_long and got some error at this point, closed the connection, and requested data_short via another connection: Show quoted text

> <620 new auto-negotiating client connection > 620: Client using the ascii protocol > <620 get data_short

> >620 sending key data_short > >620 END

> <616 connection closed. > <620 connection closed.

Given that you disabled io_timeout I can only guess what error could happen, but C::M::F definitely got one, otherwise it wouldn't open another connection. So the question is why C::M::F thinks it got an error, and whether its perception is valid. Show quoted text

> There doesn't seem to be official Windows builds for Memcached any more, > and it's difficult to justify spending too much time in trying to build > from source when there doesn't seem to be a known issue with this revision > and other clients aren't having the same problem.

There's a tiny possibility that C::M::F specifically triggers some race in memcached server that other clients don't, and race fixes happen almost every memcached release (without detailed description what they could affect). But I got your last reply, the problem is reproduceable with the latest memcached. Show quoted text

> I understand if you can't devote more time to find this Windows specific > problem, but could you perhaps point me to the places in the C code I > should look first should I want to troubleshoot myself?

First of all an easier way would be to use another module like Cache::Memcached::libmemcached (provided that it builds on Windows and works) - should be comparably fast. Debugging Perl modules in C is a pain even on Linux, and simply looking into the code won't reveal much I think because the problem somehow relates to your setup (I built memcached 1.4.5 here but couldn't reproduce). Value reading happens in src/client.c:read_value(). But if you really want to devote time to it, the first thing I would try is to capture network traffic to see actual packet contents (data_long is big, but protocol commands will always be at packet beginnings), and to trace system calls (on Linux we have strace utility that shows system calls and their results; don't know what Windows has).

Thu Feb 20 04:26:47 2014 damien.chaumette [...] gmail.com - Correspondence added

Subject:	Re: [rt.cpan.org #93140] Problem Getting "Big" Values
Date:	Thu, 20 Feb 2014 09:26:37 +0000
To:	bug-Cache-Memcached-Fast [...] rt.cpan.org
From:	Damien Chaumette <damien.chaumette [...] gmail.com>

Show quoted text

> > I understand if you can't devote more time to find this Windows specific > > problem, but could you perhaps point me to the places in the C code I > > should look first should I want to troubleshoot myself?

> > First of all an easier way would be to use another module like > Cache::Memcached::libmemcached (provided that it builds on Windows and > works) - should be comparably fast. Debugging Perl modules in C is a pain > even on Linux, and simply looking into the code won't reveal much I think > because the problem somehow relates to your setup (I built memcached 1.4.5 > here but couldn't reproduce). Value reading happens in > src/client.c:read_value(). But if you really want to devote time to it, > the first thing I would try is to capture network traffic to see actual > packet contents (data_long is big, but protocol commands will always be at > packet beginnings), and to trace system calls (on Linux we have strace > utility that shows system calls and their results; don't know what Windows > has). >

I've just had a look at Cache::Memcached::libmemcached on CPAN although sadly it doesn't compile out of the box on my Strawberry Perl.. There also seems to be a few Couchbase Perl module on CPAN as well, with a bit of luck one of them will be in C and compile on Windows. I'll try to get C::M::libmemcached to work first, if not then a fast Couchbase module, failing that I may do some network sniffing around and/or look into Windows system calls. Thanks, Damien