multi-get (yet again)

Eric Moritz eric at
Thu Aug 9 07:54:54 EDT 2012

I toyed with a pmap in Python a while back to try to speed up multiple
HTTP requests to our web services layer at work. You may want to try
that with gevent.

Here's the code I wrote, which is probably not production-ready.
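
The snippet itself is not included in this copy of the message. As a rough
stand-in (not the original code), a pmap for this use could look like the
sketch below. It swaps gevent for the standard library's thread pool so it
runs without extra dependencies; gevent's pool offers a similar map. The
names pmap, multi_get and fetch are hypothetical, and fetch stands for
whatever single-key GET your client provides.

```python
from concurrent.futures import ThreadPoolExecutor


def pmap(func, items, workers=10):
    """Apply func to every item concurrently; results keep input order."""
    items = list(items)
    if not items:
        return []
    # Never start more threads than there are items to process.
    with ThreadPoolExecutor(max_workers=min(workers, len(items))) as pool:
        return list(pool.map(func, items))


def multi_get(fetch, pairs, workers=10):
    """Fetch many (bucket, key) pairs at once and return {pair: value}.

    fetch is a hypothetical callable taking (bucket, key) and returning
    the stored value, or None when the key does not exist.
    """
    pairs = list(pairs)
    values = pmap(lambda pair: fetch(*pair), pairs, workers)
    return dict(zip(pairs, values))


if __name__ == "__main__":
    # In-memory stand-in for a real store, for illustration only.
    store = {("ids", "a"): "1", ("ids", "b"): "2"}
    wanted = [("ids", "a"), ("ids", "b"), ("ids", "c")]
    print(multi_get(lambda bucket, key: store.get((bucket, key)), wanted))
```

Keys that come back as None are the ones that would still need a db insert.
The total latency is roughly that of the slowest single GET per batch of
workers, rather than the sum of all GETs.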
On Aug 9, 2012 4:46 AM, "Parnell Springmeyer" <ixmatus at> wrote:

> Jeremy,
> I was looking for something similar and first built an extra handler onto
> an internal erlang cowboy API server that used maelstrom (my own worker
> pool OTP application).
> It was used to make a simple POST with a string of the {bucket, key} pairs
> and the server would concurrently GET and combine the results and send it
> back. This was very fast (thousands of keys GET in ms).
> Since that seemed gross, I then decided (based on some input from someone
> else on the list) to try using a simple Map/Reduce phase that did not use
> javascript but the erlang functions (since those are going to be really
> fast and take better advantage of Erlang's concurrency than the javascript
> VM's).
> In python, you can do this to run that type of M/R phase without knowing
> any Erlang code:
> import riak
> client = riak.RiakClient()
> # Add your KNOWN bucket and key pairs (you can do this in a loop)
> query = client.add(bucket, key)
> query.add(bucket, key)
> query.add(bucket, key)
> # etc… (as many as you like)
> # Now tell the map and reduce phases to use the Erlang module
> # "riak_kv_mapreduce" and its given functions,
> # "map_object_value" and "reduce_set_union".
> results = query.map(["riak_kv_mapreduce", "map_object_value"]) \
>                .reduce(["riak_kv_mapreduce", "reduce_set_union"]) \
>                .run()
> The above returns results faster for me than the brokered multi-get
> approach I used. (I guarantee my brokered multi-get is faster than anything
> you can do with python + gevent; if that's the case, the M/R phase is
> definitely the route you want to go.)
> So IMHO, it is very fast as long as you know the buckets and keys you want
> to get.
> On Aug 9, 2012, at 12:11 AM, Jeremy Dunck wrote:
> > I'm new to riak and need multi-get (that is, getting the value and/or
> > existence of keys in a single network-trip latency).
> >
> > I was wondering what the latency of the map-reduce approach is?
> >
> >
> > Alternatively, has anyone tried scaling concurrent gets (perhaps with
> > evented io) to do many concurrent requests and combining results on
> > the client?
> >
> > I am toying with a python+gevent multiget function.  If the stance is
> > still that a multiget operation doesn't belong in core, I'm a little
> > surprised that there doesn't seem to at least be a nice client-lib API
> > func to do it.  It sure seems useful...
> >
> > In my use-case, the immediate need is to know whether a db insert
> > needs to be done.  We're handling too many keys to want to store in
> > memory (so no redis, etc), and we don't want to go to the db more than
> > we need to, so it seems riak would be good here.  But we're getting
> > 1000s of potential insert keys and want to whittle down all those to a
> > relative few db inserts.
> >
> > So I was thinking riak key-per-id, and insert to the db iff the riak
> > key doesn't exist, then add the riak key.  We'll get some race
> > conditions on the insert, but that's OK in our case.
> >
> > We do need low latency on the riak check, though, hence either
> > multiplexing w/ eventing or map-reduce (if that latency is actually
> > good).
> >
> > Am I doing it wrong?
> >
> > _______________________________________________
> > riak-users mailing list
> > riak-users at
> >