Skip Menu |

This queue is for tickets about the Cache-FastMmap CPAN distribution.

Report information
The Basics
Id: 52051
Status: resolved
Priority: 0/
Queue: Cache-FastMmap

People
Owner: Nobody in particular
Requestors: airwave [...] cpan.org
Cc: dd [...] airwave.com
AdminCc:

Bug Information
Severity: Critical
Broken in: 1.34
Fixed in: (no value)



CC: dd [...] airwave.com
Subject: Cache can return stale data if some keys are deleted

Message body is not shown because it is too large.

Subject: mmap_issue.pl
#!/usr/bin/perl -w use strict; use IO::File; use Time::HiRes qw(time sleep); use Cache::FastMmap; my $worker_count = $ARGV[0] || 1; my $key_count = int(165 / $worker_count); my $run_time = int(120 / $worker_count); my $page_count = 2; # smallest prime my $page_size = '128k'; # nice and big to avoid LRU kicking in my $scan_interval = 3 / $worker_count; my $log_file = "/tmp/mmap_driver.log"; open(LOG, ">$log_file") or die "open(>$log_file): $!"; autoflush LOG 1; sub logit { printf LOG "%s - " . shift() . "\n", time, @_ } my $cache_file = '/tmp/mmap.cache'; my %cache_args = ( share_file => $cache_file, page_size => $page_size, num_pages => $page_count, init_file => 0, raw_values => 0, expire_time => 0, unlink_on_exit => 0, ); # rebuild cache unlink $cache_file if -f $cache_file; Cache::FastMmap->new( %cache_args, init_file => 1, ); # spawn off $worker_count getter/putters my @worker_pids; @worker_pids = map { spawn_worker() } (1 .. $worker_count); close(LOG); sleep $run_time; kill 'INT', @worker_pids; waitpid $_, 0 for @worker_pids; open(LOG, "<$log_file") or die "open(<$log_file): $!"; while (<LOG>) { my ($type, $key_id) = /^\d+\.\d+ - (\*|\+|\@) \d+\.ap_(\d+)/; print if $key_id && $key_id >= int($key_count/2) && $type ne '+'; } close(LOG); printf "\n%d second run with %d workers; %d keys per worker (%d keys total).\n", $run_time, $worker_count, $key_count, $key_count * $worker_count; printf "Cache with %d pages, each sized at %s\n", $page_count, $page_size; printf " (key numbers at and above %d were never deleted)\n", int($key_count/2); print " (full log results are still available in $log_file)\n"; printf " (the scan for + and * issues was run every %s seconds)\n", $scan_interval; print <<EOLEGEND; LEGEND: keys are <pid>.ap_<key_id> Included in above output (but only for keys NEVER deleted): @ (an inserted value was immediately read and incorrect/stale) * (a previously-inserted value was read later and incorrect/stale) Included in $log_file but not printed above: I (a key was inserted with a particular hi-res timestamp) D (a key was deleted; included timestamp of deleted key) + (a key was found with more than one data entry) @ is the worst, but the * events also cause our application to break EOLEGEND my %last; sub spawn_worker { my $pid = fork_and_run(sub { my @keys = map { "$$.ap_$_" } (1..$key_count); my $cache = Cache::FastMmap->new(%cache_args); my $last_checked = 0; while (1) { stick_one_in($cache, \@keys, \%last); # 30% chance of deleting a key for each insert if (rand(10) < 3) { # Only delete from a subset of the key-space (first half) my $delkey = rand_key(\@keys, int($key_count/2)); my $deleted = $cache->get_and_remove($delkey); logit("D $delkey%s", $deleted ? " ($deleted->{time})" : ''); delete $last{$delkey}; } if (time - $last_checked > $scan_interval) { check_keys($cache, \%last); $last_checked = time; } } }); return $pid; } sub rand_key { $_[0][int(rand($_[1] || @{$_[0]}))] } sub stick_one_in { my ($cache, $keys, $lasts) = @_; my $key = rand_key($keys); my $sample = { pid => $$, time => time, }; $cache->set($key => $sample); logit("I $key (%s)", $sample->{time}); $lasts->{$key} = $sample; my $got = $cache->get($key); logit("\@ $key (got %s, wanted %s)", $got->{time} || '<undef>', $sample->{time} || '<undef>', ) unless $got->{time} eq $sample->{time}; } sub check_keys { my ($cache, $lasts) = @_; my %counts; foreach my $key ($cache->get_keys(0)) { next unless $lasts->{$key}; $counts{$key}++; } foreach my $key (keys %counts) { logit("+ $key") if $counts{$key} > 1; my $got = $cache->get($key); logit("* $key (got %s, wanted %s)", $got->{time} || '<undef>', $last{$key}{time} || '<undef>', ) unless $got->{time} eq $last{$key}{time}; } } sub fork_and_run { my ($code_ref) = @_; my $pid = fork; if ($pid) { # parent return $pid; } elsif (defined $pid) { # child $code_ref->(); exit 0; } else { # fork error die "fork: $!"; } }
Subject: stale_cache_data_fix.diff
Index: Cache-FastMmap/src/Cache-FastMmap-CImpl/mmap_cache.c =================================================================== --- Cache-FastMmap/src/Cache-FastMmap-CImpl/mmap_cache.c (revision 368) +++ Cache-FastMmap/src/Cache-FastMmap-CImpl/mmap_cache.c (revision 369) @@ -986,6 +986,7 @@ MU32 slots_left, * slots_end; /* Modulo hash_slot to find starting slot */ MU32 * slot_ptr = cache->p_base_slots + (hash_slot % cache->p_num_slots); + MU32 * first_deleted = (MU32 *)NULL; /* Total slots and pointer to end of slot data to do wrapping */ slots_left = cache->p_num_slots; @@ -1003,12 +1004,16 @@ /* data_offset == 0 means empty slot, and no more beyond */ /* data_offset == 1 means deleted slot, we can reuse if writing */ - if (data_offset == 0 || (data_offset == 1 && mode == 1)) { - + if (data_offset == 0) { /* Return pointer to last checked slot */ return slot_ptr; } - + if (data_offset == 1 && mode == 1 && NULL == slot_ptr) { + /* Save pointer to first usable slot; if we don't find the key later, + we'll fall back to returning this. + */ + first_deleted = slot_ptr; + } /* deleted slot, keep looking */ if (data_offset == 1) { @@ -1032,7 +1037,10 @@ ASSERT(slot_ptr >= cache->p_base_slots && slot_ptr < slots_end); } - return 0; + if (1 == mode && NULL != first_deleted) + return first_deleted; + else + return 0; } /*
Hi Show quoted text
> The problem is that when deleting some keys (even if it's a subset of > the total key-space), you can get a "stale" data element out of the > cache instead of the most-recently-inserted value for that key. This > can affect keys which were never deleted and only ever inserted. It > seems like a useful guarantee of Cache::FastMmap would be: assuming > that > an existing value hasn't been expunged, the value returned by a get is > always the most-recently-inserted value.
Sorry about the time getting back to this, I completely forgot about it in my cpan folder. Yeah this was a subtle and nasty bug, sorry you had to be the one to find it. I think your patch makes sense, so I've uploaded 1.35 with it incorporated. Show quoted text
> I don't have a unit test exhibiting the problem, but the attached > script > could easily be adapted for this purpose (our application now has a > unit
It's a little tricky to trigger, but I think I found a looping set of set/remove actions that did it, which I've added as 16.t. Without your patch it dies, with your patch it completes. Rob
Should be fixed in 1.35