Unexpected performance results with riak

Sean Cribbs sean at basho.com
Sun May 15 13:55:47 EDT 2011

On May 15, 2011, at 10:55 AM, Max wrote:
> Is this to be expected? The objects were not that large in bytesize
> and 2000 objects in one bucket isn't that much, either.

Consider that the JSON encoding of a list of these 2000 objects has to happen on the server side as well; that list is going to be big in RAM before you even try to encode it.  Instead, use chunked=true in your request, which will return results to you as they are completed and returned to the coordinating node (in multipart/mixed format).
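As a rough illustration of a chunked request, here is a minimal Python sketch (Python is used only for illustration; the same idea applies from PHP). It assumes the query is a MapReduce job sent to the HTTP interface's /mapred resource on the default port 8098, and that each part of the multipart/mixed response carries a JSON body with "phase" and "data" fields. The function names, the host name, and the sample body below are hypothetical, so check them against what your node actually returns.

```python
import json
from email import message_from_bytes
from urllib.parse import urlencode


def mapred_request(host, bucket, port=8098):
    """Build the URL and JSON body for a chunked MapReduce request.

    Assumes the HTTP interface's /mapred resource; host and bucket are
    whatever your setup uses.
    """
    url = "http://%s:%d/mapred?%s" % (host, port, urlencode({"chunked": "true"}))
    job = {
        "inputs": bucket,
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson"}}],
    }
    return url, json.dumps(job)


def parse_chunks(content_type, body):
    """Yield (phase, data) for each part of a multipart/mixed body.

    content_type is the response's Content-Type header value (it carries
    the boundary); body is the raw response body as bytes. The part layout
    ({"phase": ..., "data": [...]}) is an assumption.
    """
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    for part in message.walk():
        if part.get_content_maintype() == "multipart":
            continue  # skip the container, keep only the leaf parts
        payload = part.get_payload(decode=True)
        if payload and payload.strip():
            chunk = json.loads(payload)
            yield chunk.get("phase"), chunk.get("data", [])


# Hypothetical two-part response body, used only to exercise the parser.
sample_body = (
    b"--XYZ\r\nContent-Type: application/json\r\n\r\n"
    b'{"phase":0,"data":[1,2]}\r\n'
    b"--XYZ\r\nContent-Type: application/json\r\n\r\n"
    b'{"phase":0,"data":[3]}\r\n'
    b"--XYZ--\r\n"
)
chunks = list(parse_chunks("multipart/mixed; boundary=XYZ", sample_body))
```

For brevity this parses a complete body in one go. To actually benefit from chunking, a client would read the response incrementally and handle each part as it arrives, rather than buffering everything first.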

> - Insert
> Batch inserting of around 60.000 objects in the same size as above
> took rather long and actually didn't really work.
> My script which inserted the objects in riak died at around 40.000 or
> so and said it couldn't connect to the riak node anymore. In the riak
> logs I found an error message which indicated that the node ran out of
> memory and died.

512MB is pretty small for any database that will have a significant working set. Give it some more RAM.

> - Question
> This is really my first shot at riak, so there is definitely the
> chance that I screwed something up.

I don't think you've screwed anything up; you just didn't have context for why things work the way they do.

>    Are there any settings I could tweak?
>    Are the hardware settings too constrained?
>    Maybe the PHP client library I used for interacting with riak is
> the limiting factor here?

Again, give Riak more RAM and 3 or more nodes and you'll get better performance.  We (purposefully) don't optimize for the 1-node use-case.  While Riak's default storage doesn't keep any data in RAM, you'll need memory for the node itself, for the storage engine's key-tables, and for copies of any data that is in-flight. For reference, a fresh, quiescent node takes only 25MB on my machine.

>     Running all nodes on the same physical machine is rather stupid,
> but if this is a problem - how can I better test the performance of
> riak?

If you're trying to test performance (beyond a micro-benchmark), by all means use multiple machines.  Ideally, use ones similar enough to your deployment environment that you'll have a good idea how things will behave in production.

>     Is map reduce really that slow? I read about the performance hit
> that map reduce has on the riak mailing list, but if Map Reduce is
> slow, how are you supposed to perform "queries" for data needed nearly
> in realtime? I know that riak is not as fast as redis.

Riak and Redis are apples and oranges (but go well together). Redis is fast for certain operations because it has data structures specifically designed for fast, single-server execution of those operations.  On the other hand, Riak is designed primarily for fault-tolerance in the case of network and hardware failure; it just happens that MapReduce fits nicely on top of that kind of system (and is generic, rather than specifically tuned). Yes, Riak's MapReduce is designed for low-latency queries, but it's also designed for "targeted" queries where you don't slurp the entire contents of a bucket from disk.  Structure your data so that you can know -- or deterministically guess -- what keys should be involved in your query. Then you won't be trying to fetch and filter all of the data all the time.
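To make the "deterministically guess the keys" idea concrete, here is a small Python sketch. It assumes a hypothetical key scheme of one object per day, named prefix_YYYY-MM-DD, and builds a MapReduce job whose inputs are an explicit list of [bucket, key] pairs rather than a whole bucket. The bucket name, key layout, and helper names are made up for illustration; pick whatever scheme matches your data.

```python
import json
from datetime import date, timedelta


def daily_keys(prefix, start, end):
    """Return one key per day in [start, end], e.g. 'pageviews_2011-05-15'.

    The prefix_YYYY-MM-DD layout is a hypothetical naming scheme.
    """
    days = (end - start).days
    return ["%s_%s" % (prefix, (start + timedelta(days=n)).isoformat())
            for n in range(days + 1)]


def targeted_job(bucket, keys):
    """Build a MapReduce job over an explicit list of [bucket, key] inputs."""
    return {
        "inputs": [[bucket, key] for key in keys],
        "query": [{"map": {"language": "javascript",
                           "name": "Riak.mapValuesJson"}}],
    }


keys = daily_keys("pageviews", date(2011, 5, 13), date(2011, 5, 15))
job_body = json.dumps(targeted_job("stats", keys))
```

Because the keys are computed on the client, the job only touches the three objects it names instead of listing and filtering every key in the bucket.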

Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
