Unexpected performance results with riak

Max nash12 at gmail.com
Sun May 15 10:55:46 EDT 2011

Over the last few days I have played a bit with riak. The initial setup
was easier than I thought. I now have a 3-node cluster, with all nodes
running on the same VM for the sake of testing.

I admit that the hardware settings of my virtual machine are very
modest (1 CPU, 512 MB RAM), but I am still quite surprised by the
slow performance of riak.

- Map Reduce

Playing a bit with map reduce, I had around 2000 objects in one bucket,
each about 1-2 KB in size as JSON. I used this map function:

function(value, keyData, arg) {
    // Parse the stored object's JSON body.
    var data = Riak.mapValuesJson(value)[0];

    // Keep only objects whose displayname contains "max".
    if (data.displayname.indexOf("max") !== -1) return [data];
    return [];
}

It took over 2 seconds just to perform the HTTP request and return its
result, not counting the time my client code needed to deserialize the
results from JSON. Removing 2 of the 3 nodes seemed to improve
performance slightly, to just below 2 seconds, but this still seems
really slow to me.
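
To separate the time spent in riak from the time spent in the client
library, the request can be timed directly against the HTTP interface.
The following is only a rough sketch: it assumes Node 18+ (for the
global fetch), and the function names buildMapredJob and timeMapred are
made up for illustration. The job format is the one the /mapred
endpoint accepts.

```javascript
// Build the JSON body for riak's HTTP MapReduce endpoint (POST /mapred).
// The bucket name passed in is a placeholder, not taken from the post.
function buildMapredJob(bucket, mapSource) {
  return {
    inputs: bucket, // a bare bucket name means "every key in the bucket"
    query: [{ map: { language: "javascript", source: mapSource, keep: true } }]
  };
}

// POST the job and report wall-clock time for the whole request.
// Assumes Node 18+ (global fetch) and a riak node reachable at baseUrl.
async function timeMapred(baseUrl, job) {
  const started = Date.now();
  const response = await fetch(baseUrl + "/mapred", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job)
  });
  const results = await response.json();
  return { elapsedMs: Date.now() - started, count: results.length };
}
```

Assuming the default HTTP port 8098 and a hypothetical bucket called
"users", it could be called as
timeMapred("http://127.0.0.1:8098", buildMapredJob("users", source)).then(console.log),
where source is the map function above as a string.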

Is this to be expected? The objects were not that large in bytes, and
2000 objects in one bucket isn't that many, either.
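
One way to check whether the filter itself is the expensive part is to
run the same map logic locally over 2000 generated objects. This is a
sketch under assumptions: the Riak object below is a simplified stand-in
for the built-in helper, the sample data is invented, and the typeof
guard is an addition that is not in the map function above.

```javascript
// Simplified stand-in for riak's built-in helper: parse each stored
// value's data as JSON. This is an assumption for local testing only.
const Riak = {
  mapValuesJson: (value) => value.values.map((v) => JSON.parse(v.data))
};

// The map logic from the post, plus a guard for objects that have no
// string displayname (the guard is an addition, not in the original).
function mapDisplayname(value, keyData, arg) {
  const data = Riak.mapValuesJson(value)[0];
  if (typeof data.displayname === "string" &&
      data.displayname.indexOf("max") !== -1) return [data];
  return [];
}

// Wrap a plain object in a minimal version of the shape a map phase
// receives (hypothetical sample data, most fields omitted).
const wrap = (obj) => ({ values: [{ data: JSON.stringify(obj) }] });

// Run the map function over n generated objects and time it.
function timeLocally(n) {
  const inputs = [];
  for (let i = 0; i < n; i++) {
    inputs.push(wrap({ displayname: i % 10 === 0 ? "max" + i : "user" + i }));
  }
  const started = Date.now();
  let matches = 0;
  for (const input of inputs) matches += mapDisplayname(input).length;
  return { matches, elapsedMs: Date.now() - started };
}
```

Calling timeLocally(2000) gives the time for parsing and filtering
alone; whatever remains of the 2 seconds would then be spent elsewhere
(fetching the objects, running the job in riak, or the HTTP round trip).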

- Insert

Batch inserting around 60,000 objects of the same size as above
took rather long and actually didn't really work.

My script, which inserted the objects into riak, died at around 40,000
and said it couldn't connect to the riak node anymore. In the riak
logs I found an error message indicating that the node had run out of
memory and died.
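
A bulk load like this can be throttled so that only a few writes are
in flight at any time. The sketch below shows the general idea only:
insertThrottled and putObject are hypothetical names, putObject stands
for whatever async function performs a single write, and nothing here
shows that throttling would have prevented the out-of-memory crash on a
512 MB node.

```javascript
// Insert items with at most `limit` writes in flight at a time.
// `putObject` is a hypothetical async function supplied by the caller
// (for example a wrapper around an HTTP PUT); it is not a riak API.
async function insertThrottled(items, limit, putObject) {
  let next = 0;   // index of the next item to claim
  let stored = 0; // number of completed writes

  async function worker() {
    while (next < items.length) {
      const item = items[next++]; // claim an item before awaiting
      await putObject(item);
      stored++;
    }
  }

  // Start up to `limit` workers that drain the shared list.
  const workers = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) workers.push(worker());
  await Promise.all(workers);
  return stored;
}
```

If one write fails, the returned promise rejects and the other workers
stop picking up new items only once the list is drained, so a real
script would probably want retries or at least logging of the failed
key.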

- Question

This is really my first shot at riak, so there is definitely a
chance that I screwed something up.

    Are there any settings I could tweak?

    Are the hardware settings too constrained?

    Maybe the PHP client library I used for interacting with riak is
    the limiting factor here?

    Running all nodes on the same physical machine is rather stupid,
    but if this is a problem, how can I better test the performance of
    riak?

    Is map reduce really that slow? I read about the performance hit
    that map reduce has on the riak mailing list, but if map reduce is
    slow, how are you supposed to perform "queries" for data needed
    nearly in real time? I know that riak is not as fast as redis.

It would really help me a lot if anyone with more experience in riak
could help me out with some of these questions.

I posted this as a question on Stack Overflow, but it was suggested
that I ask on this mailing list instead. I guess that's right, as I am
more likely to get an answer here. I am really eager to learn more
about riak and how I can use it correctly to get the best results
out of it.

Thank you very much for any help in advance!

