Subject: bitvector: memset after malloc, hash details
Date: Fri, 5 Nov 2010 16:44:48 -0500
To: "bug-Bloom-Faster [...] rt.cpan.org" <bug-Bloom-Faster [...] rt.cpan.org>
From: "Cassidy, Justin M. (ARC-IQ)[PEROT SYSTEMS]" <justin.m.cassidy [...] nasa.gov>
Hi,
I've been using Bloom::Faster for a few years to generate indexes for each minute of a stream of incoming data. When upgrading to 64-bit systems, I noticed some problems that were show-stoppers in both 1.4 and 1.7, and after staring at the code for a while I found a fix.
On my CentOS 5 system, around 30% of the time the malloc'ed bit vector for the hash would come back already full of bits set to one, since malloc does not zero the memory it returns. I confirmed this by logging the hash insert values and noticing that no inserts occurred on the bit vectors that were already stuffed with ones. A memset right after the malloc fixes it. I assumed a similar problem might occur with other malloc'ed entries such as the array of salts, so I did a memset there also... but I think sprintf makes this unnecessary.
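To illustrate the kind of change I mean, here is a minimal standalone sketch rather than the actual patch. The bitvec type and the bitvec_* function names are made up for this example and do not come from the Bloom::Faster source. The only point is the memset immediately after the malloc (calloc would do the same job in one call).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type and helper names; not taken from the Bloom::Faster source. */
typedef struct {
    size_t nbits;          /* number of usable bits */
    unsigned char *bits;   /* backing storage, one bit per filter slot */
} bitvec;

/* Allocate a vector of nbits bits and clear every bit.
 * malloc() returns uninitialized memory, so the memset() is what
 * guarantees the filter starts out empty. Returns 0 on success, -1 on failure. */
static int bitvec_init(bitvec *v, size_t nbits)
{
    size_t nbytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;

    if (nbytes == 0)
        nbytes = 1;        /* avoid a zero-sized allocation */

    v->bits = malloc(nbytes);
    if (v->bits == NULL)
        return -1;

    memset(v->bits, 0, nbytes);
    v->nbits = nbits;
    return 0;
}

/* Set bit i. */
static void bitvec_set(bitvec *v, size_t i)
{
    assert(i < v->nbits);
    v->bits[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

/* Return 1 if bit i is set, 0 otherwise. */
static int bitvec_test(const bitvec *v, size_t i)
{
    assert(i < v->nbits);
    return (v->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;
}

/* Release the storage and reset the struct. */
static void bitvec_free(bitvec *v)
{
    free(v->bits);
    v->bits = NULL;
    v->nbits = 0;
}
```

With this, every bit tests as zero right after bitvec_init, no matter what the allocator handed back, so the first insert into a fresh filter always sets bits instead of finding them already set.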
Less importantly, your jenkins.c file has a 32-bit-optimized version of Bob Jenkins' hash code. On his website, there are somewhat vague instructions for how to create a 64-bit version of this hash function that's slightly faster, and I did this also.
I'm happy to send patches of what I did this weekend, but I don't consider myself an expert C or Perl guy. :)
Thanks,
Justin
Built on:
Ubuntu Karmic (2.6.31)
Perl 5.10.0
gcc 4.4.1
Ran on:
CentOS 5 x86_64 (2.6.18)
perl 5.8.8