
Preferred bug tracker

Please visit the preferred bug tracker to report your issue.

This queue is for tickets about the Net-ZooKeeper CPAN distribution.

Report information
The Basics
Id: 72661
Status: resolved
Priority: 0/
Queue: Net-ZooKeeper

People
Owner: Nobody in particular
Requestors: aleksey.mashanov [...] gmail.com
Cc:
AdminCc:

Bug Information
Severity: Critical
Broken in: (no value)
Fixed in: 0.36



Subject: Memory corruptions and infinite locks
Sometimes the process hangs forever in pthread_mutex_lock() with this stack backtrace:

#0 0x0010b7f2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x002986e9 in __lll_lock_wait () from /lib/libpthread.so.0
#2 0x00293dad in _L_lock_981 () from /lib/libpthread.so.0
#3 0x00293ccb in pthread_mutex_lock () from /lib/libpthread.so.0
#4 0x00f927fb in _zk_release_watches (my_perl=0x90bf008, first_watch=0x9106cb8, final=0) at ZooKeeper.xs:280
#5 0x00f92852 in _zk_replace_watch (my_perl=0x90bf008, handle=0x90efc60, first_watch=0x90dd8a8, old_watch=0x9106d28, new_watch=0x9106c48) at ZooKeeper.xs:309
#6 0x00f953f1 in XS_Net__ZooKeeper_exists (my_perl=0x90bf008, cv=0x90dfdfc) at ZooKeeper.xs:1488
#7 0x00c4e51d in Perl_pp_entersub () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#8 0x00c4798f in Perl_runops_standard () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
#9 0x00bed20e in perl_run () from /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so

The problem is that the mutex memory gets corrupted. Here is the fix:

--- ZooKeeper.xs 2011-11-23 16:39:49.000000000 +0400
+++ ZooKeeper.xs 2011-11-23 16:39:45.000000000 +0400
@@ -249,12 +249,13 @@
 static void _zk_release_watch(pTHX_ zk_watch_t *watch, int list)
 {
     if (list) {
+        zk_watch_t *prev = watch->prev;
         if (watch->prev) {
             watch->prev->next = watch->next;
             watch->prev = NULL;
         }
         if (watch->next) {
-            watch->next->prev = watch->prev;
+            watch->next->prev = prev;
             watch->next = NULL;
         }
     }
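To make the one-line fix easier to follow, here is a minimal standalone sketch of the unlink logic before and after the patch. The `node_t` type and the function names are hypothetical stand-ins for `zk_watch_t` and `_zk_release_watch`; only the list links are modelled, and nothing here is taken from ZooKeeper.xs beyond the shape of the two branches.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for zk_watch_t: only the list links are modelled. */
typedef struct node {
    struct node *prev;
    struct node *next;
} node_t;

/* Unlink as in the unpatched code: n->prev is cleared before it is reused. */
void unlink_buggy(node_t *n)
{
    assert(n != NULL);
    if (n->prev) {
        n->prev->next = n->next;
        n->prev = NULL;
    }
    if (n->next) {
        n->next->prev = n->prev;   /* n->prev is already NULL here */
        n->next = NULL;
    }
}

/* Unlink as in the patch: the predecessor is saved before it is cleared. */
void unlink_fixed(node_t *n)
{
    assert(n != NULL);
    node_t *prev = n->prev;
    if (n->prev) {
        n->prev->next = n->next;
        n->prev = NULL;
    }
    if (n->next) {
        n->next->prev = prev;      /* successor now points at the real predecessor */
        n->next = NULL;
    }
}

/* Build the list a <-> b <-> c on caller-provided storage. */
void link3(node_t *a, node_t *b, node_t *c)
{
    a->prev = NULL; a->next = b;
    b->prev = a;    b->next = c;
    c->prev = b;    c->next = NULL;
}
```

With the buggy version, removing the middle node of a <-> b <-> c leaves c with a NULL back link. If c is unlinked later, a->next is never updated and keeps pointing at c, which may already have been freed. That is one plausible route to the corrupted mutex seen in the backtrace, though the ticket does not spell out the exact sequence.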
Some more problems found.

1) Incorrect link in the doubly linked list. watch->next->prev always ends up NULL.

2) A watch can be triggered more than once with event_type == ZOO_SESSION_EVENT. We must not destroy a watch until it has been triggered with event_type != ZOO_SESSION_EVENT. This is described in the "ZooKeeper Programmer's Guide": "When you disconnect from a server (for example, when the server fails), you will not get any watches until the connection is reestablished. For this reason session events are sent to all outstanding watch handlers. Use session events to go into a safe mode: you will not be receiving events while disconnected, so your process should act conservatively in that mode."

3) Incorrect use of sv_usepvn. perlapi says about this function: "This function will realloc the memory pointed to by "ptr", so that pointer should not be freed or used by the programmer after giving it to sv_usepvn."

--- ZooKeeper.xs 2009-04-20 20:44:47.000000000 +0400
+++ ZooKeeper.xs 2011-11-24 19:43:32.000000000 +0400
@@ -249,12 +249,13 @@
 static void _zk_release_watch(pTHX_ zk_watch_t *watch, int list)
 {
     if (list) {
+        zk_watch_t *prev = watch->prev;
         if (watch->prev) {
             watch->prev->next = watch->next;
             watch->prev = NULL;
         }
         if (watch->next) {
-            watch->next->prev = watch->prev;
+            watch->next->prev = prev;
             watch->next = NULL;
         }
     }
@@ -278,7 +279,7 @@
         if (!final) {
             pthread_mutex_lock(&watch->mutex);
-            done = watch->done;
+            done = watch->done && watch->event_type != ZOO_SESSION_EVENT;
             pthread_mutex_unlock(&watch->mutex);
         }
@@ -1340,14 +1341,15 @@
         }
         if (ret == ZOK) {
+            size_t path_buh_len = strlen(path_buf);
             ST(0) = sv_newmortal();
 #ifdef SV_HAS_TRAILING_NUL
-            sv_usepvn_flags(ST(0), path_buf, strlen(path_buf),
+            sv_usepvn_flags(ST(0), path_buf, path_buf_len,
                             SV_HAS_TRAILING_NUL);
 #else
-            sv_usepvn(ST(0), path_buf, strlen(path_buf));
+            sv_usepvn(ST(0), path_buf, path_buf_len);
 #endif
-            SvCUR_set(ST(0), strlen(path_buf));
+            SvCUR_set(ST(0), path_buf_len);
             XSRETURN(1);
         }
@@ -2657,6 +2659,9 @@
         }
         done = watch->done;
+        if (watch->event_type == ZOO_SESSION_EVENT) {
+            watch->done = 0;
+        }
         pthread_mutex_unlock(&watch->mutex);
From: chrishammond [...] ymail.com
Hi Aleksey,

This patch appears to make t/main.t in Net::ZooKeeper::Lock 0.02 hang indefinitely. I'm afraid I'm not familiar enough with the internals of Net::ZooKeeper to hazard a guess as to why.

Regards,

Chris
Thank you for your report. I found a misprint in a variable name in my patch. The fixed patch is attached.

Fri Jan 13 11:08:43 2012, chrishammond wrote:
> Hi Aleksey,
>
> This patch appears to make t/main.t in Net::ZooKeeper::Lock 0.02 hang
> indefinitely. I'm afraid I'm not familiar enough with the internals of
> Net::ZooKeeper to hazard a guess as to why.
>
> Regards,
>
> Chris
Subject: perl-Net-ZooKeeper-memory.patch
--- ZooKeeper.xs 2009-04-20 20:44:47.000000000 +0400
+++ ZooKeeper.xs 2011-11-24 19:43:32.000000000 +0400
@@ -249,12 +249,13 @@
 static void _zk_release_watch(pTHX_ zk_watch_t *watch, int list)
 {
     if (list) {
+        zk_watch_t *prev = watch->prev;
         if (watch->prev) {
             watch->prev->next = watch->next;
             watch->prev = NULL;
         }
         if (watch->next) {
-            watch->next->prev = watch->prev;
+            watch->next->prev = prev;
             watch->next = NULL;
         }
     }
@@ -278,7 +279,7 @@
         if (!final) {
             pthread_mutex_lock(&watch->mutex);
-            done = watch->done;
+            done = watch->done && watch->event_type != ZOO_SESSION_EVENT;
             pthread_mutex_unlock(&watch->mutex);
         }
@@ -1340,14 +1341,15 @@
         }
         if (ret == ZOK) {
+            size_t path_buf_len = strlen(path_buf);
             ST(0) = sv_newmortal();
 #ifdef SV_HAS_TRAILING_NUL
-            sv_usepvn_flags(ST(0), path_buf, strlen(path_buf),
+            sv_usepvn_flags(ST(0), path_buf, path_buf_len,
                             SV_HAS_TRAILING_NUL);
 #else
-            sv_usepvn(ST(0), path_buf, strlen(path_buf));
+            sv_usepvn(ST(0), path_buf, path_buf_len);
 #endif
-            SvCUR_set(ST(0), strlen(path_buf));
+            SvCUR_set(ST(0), path_buf_len);
             XSRETURN(1);
         }
@@ -2657,6 +2659,9 @@
         }
         done = watch->done;
+        if (watch->event_type == ZOO_SESSION_EVENT) {
+            watch->done = 0;
+        }
         pthread_mutex_unlock(&watch->mutex);
Fixed in 0.36. See https://issues.apache.org/jira/browse/ZOOKEEPER-1380 for more details.