Need help understanding read-repair (eleveldb backend)

Travis Turner travis at umbel.com
Wed Mar 14 18:17:31 EDT 2012


I have set up a test 3-node cluster with the eleveldb backend (I'll call the nodes node-A, node-B, & node-C).
I populated a single bucket with 52,000 objects, 250 of which contain a specific secondary index that I can query on.
Bucket properties are set to default:

{"props":{"name":"my_bucket","allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}


For queries, I use the Riak Python binding with Protocol Buffers.

Under normal conditions:
bucket.get_keys() returns 52,000 keys (as expected)
MapReduce.index(my_secondary_index) returns 250 keys (as expected)
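
Concretely, the two queries look roughly like this (node address, index field name, and match value are placeholders, and the exact MapReduce index call may differ slightly between client versions):

    import riak

    # connect to one node over protocol buffers
    client = riak.RiakClient(host='node-A', port=8087,
                             transport_class=riak.RiakPbcTransport)
    bucket = client.bucket('my_bucket')

    # full key listing
    keys = bucket.get_keys()

    # secondary index query used as MapReduce inputs
    query = riak.RiakMapReduce(client).index('my_bucket',
                                             'my_secondary_index_bin',
                                             'some_value')
    results = query.run()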


Next, I tested the failure of one node (node-C) by doing:
node-C> riak stop
node-A> riak-admin down node-C
node-A> riak-admin force-remove node-C

At this point, my test queries look like:
bucket.get_keys() returns ~34,600 keys
MapReduce.index(my_secondary_index) returns ~170 keys


I understand that I can force read-repair on all of the objects by reading every object in the bucket with an R less than N (sketched below).
My question is: how do I get a list of all keys in the bucket when get_keys() doesn't seem to return the full list once a node has gone away?
Ultimately, I would like to make sure my secondary-index query returns the correct result even if a node goes away.
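
For completeness, the read-repair pass I have in mind is just a low-R read of every key, something like the following (this assumes I can get the full key list, which is exactly the part that's failing):

    # re-read every object with R=1 so read-repair fires
    # in the background on any divergent replicas
    for key in bucket.get_keys():
        bucket.get(key, r=1)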

Travis

