multi-get (yet again)

Parnell Springmeyer ixmatus at gmail.com
Thu Aug 9 04:46:11 EDT 2012


Jeremy,

I was looking for something similar and first built an extra handler onto an internal Erlang Cowboy API server, using maelstrom (my own worker pool OTP application).

The client made a simple POST with a string of the {bucket, key} pairs; the server would concurrently GET them, combine the results, and send them back. This was very fast (thousands of keys fetched in milliseconds).
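
For anyone who wants to try the same brokered idea from the client side, here is a minimal Python sketch of the "GET concurrently, then combine" step. It is not my Erlang handler, just a thread-pool analogue; `multi_get`, `fetch`, and the in-memory `fake_fetch` stand-in are hypothetical names made up for illustration, and you would swap `fake_fetch` for whatever single-key GET your client provides.

```python
from concurrent.futures import ThreadPoolExecutor


def multi_get(fetch, pairs, max_workers=16):
    """Fetch every (bucket, key) pair concurrently and combine the results.

    `fetch` is a hypothetical callable taking (bucket, key) and returning the
    stored value, or None when the key does not exist.
    """
    pairs = list(pairs)
    if not pairs:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so values line up with pairs.
        values = pool.map(lambda pair: fetch(*pair), pairs)
        return dict(zip(pairs, values))


# Stand-in for a real Riak GET, backed by an in-memory dict (assumption).
_fake_store = {("users", "a"): "alice", ("users", "b"): "bob"}


def fake_fetch(bucket, key):
    return _fake_store.get((bucket, key))


found = multi_get(fake_fetch, [("users", "a"), ("users", "b"), ("users", "zzz")])
```

Keys that do not exist simply map to None in the combined result, which is enough for an existence check.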

Since that seemed gross, I then decided (based on some input from someone else on the list) to try a simple Map/Reduce phase that uses the Erlang functions instead of JavaScript (since those are going to be really fast and take advantage of Erlang's concurrency better than the JavaScript VMs do).

In Python, you can do this to run that type of M/R phase without writing any Erlang code:

import riak

client = riak.RiakClient()

# Add your KNOWN bucket and key pairs (you can do this in a loop)
query = client.add(bucket, key)
query.add(bucket, key)
query.add(bucket, key)
# etc… (as many as you like)

# Now tell the map and reduce phases to use the Erlang module "riak_kv_mapreduce"
# and its functions "map_object_value" and "reduce_set_union".
# Chain the phases onto `query` so the inputs added above are actually used.
results = query.map(["riak_kv_mapreduce", "map_object_value"]) \
               .reduce(["riak_kv_mapreduce", "reduce_set_union"]) \
               .run()

The above returns results faster for me than the brokered multi-get approach I used. I guarantee my brokered multi-get is faster than anything you can do with python + gevent; if that's the case, the M/R phase is definitely the route you want to go.

So IMHO, it is very fast as long as you know the buckets and keys you want to get.
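
If you are not using the Python client, the same Map/Reduce query can, as far as I understand Riak's HTTP MapReduce interface, be expressed as a JSON job posted to the /mapred endpoint. The sketch below only builds that job body; `build_mapred_job` is a hypothetical helper name and the bucket and key values are placeholders, so check the exact layout against the docs for your Riak version before relying on it.

```python
import json


def build_mapred_job(pairs):
    """Build a MapReduce job for known (bucket, key) pairs.

    Both phases use Erlang functions from the riak_kv_mapreduce module,
    so no JavaScript VM is involved.
    """
    inputs = [[bucket, key] for bucket, key in pairs]
    query = [
        {"map": {"language": "erlang",
                 "module": "riak_kv_mapreduce",
                 "function": "map_object_value"}},
        {"reduce": {"language": "erlang",
                    "module": "riak_kv_mapreduce",
                    "function": "reduce_set_union"}},
    ]
    return {"inputs": inputs, "query": query}


# Placeholder bucket/key pairs (assumption, for illustration only).
job = build_mapred_job([("users", "a"), ("users", "b")])
body = json.dumps(job)  # POST this with Content-Type: application/json
```

Because the inputs are an explicit list of bucket/key pairs, the job never has to scan a whole bucket.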

On Aug 9, 2012, at 12:11 AM, Jeremy Dunck wrote:

> I'm new to riak and need multi-get (that is, getting the value and/or
> existence of keys in a single network-trip latency).
> 
> I was wondering what the latency of the map-reduce approach is?
> http://lists.basho.com/pipermail/riak-users_lists.basho.com/2011-February/003229.html
> 
> Alternatively, has anyone tried scaling concurrent gets (perhaps with
> evented io) to do many concurrent requests and combining results on
> the client?
> 
> I am toying with a python+gevent multiget function.  If the stance is
> still that a multiget operation doesn't belong in core, I'm a little
> surprised that there doesn't seem to at least be a nice client-lib API
> func to do it.  It sure seems useful...
> 
> In my use-case, the immediate need is to know whether a db insert
> needs to be done.  We're handling too many keys to want to store in
> memory (so no redis, etc), and we don't want to go to the db more than
> we need to, so it seems riak would be good here.  But we're getting
> 1000s of potential insert keys and want to whittle down all those to a
> relative few db inserts.
> 
> So I was thinking riak key-per-id, and insert to the db iff the riak
> key doesn't exist, then add the riak key.  We'll get some race
> conditions on the insert, but that's OK in our case.
> 
> We do need low latency on the riak check, though, hence either
> multiplexing w/ eventing or map-reduce (if that latency is actually
> good).
> 
> Am I doing it wrong?
> 
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




