Store whole database in memory

Nico Meyer nico.meyer at adition.com
Sun May 29 05:20:31 EDT 2011


Hi Michael,

Greg's advice is probably the best if you really always want to read 
back or update predefined groups of 1000 keys at once: it will increase 
the rate at which you can write and read by a factor of 1000 ;-).

But if that's not what you want to do (and we really don't know what 
your design goals are), I honestly think you are trying to drive a 
screw in with a hammer here. Maybe you should look for alternatives to 
Riak, since you are exploiting all of its weaknesses and not using most 
of its strengths.

Namely, storing very small values is a weak spot, as Greg mentioned. 
There is an overhead of at least around 400 bytes per entry at the 
moment. Even though there are plans to reduce this overhead, I would 
estimate it will never get below around 100 bytes if the thing is still 
Riak afterwards. Note that this overhead exists for all storage 
backends. So with the ets backend you will only be able to store about 
2-3 million entries per GB of RAM right now.
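To make that concrete, here is a quick back-of-envelope sketch in 
Python (the overhead figures are the rough estimates above, not 
measurements):

    # How many 12-byte entries fit in 1 GB of RAM, given the rough
    # per-entry overheads discussed above (estimates, not measurements).
    value_bytes = 12
    for overhead in (400, 100):
        per_entry = overhead + value_bytes
        millions = 1e9 / per_entry / 1e6
        print("%3d B overhead -> ~%.1f million entries per GB" % (overhead, millions))
    # -> 400 B overhead gives ~2.4 million entries per GB,
    #    100 B overhead gives ~8.9 million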

Which brings me to the part where you don't care about, or use, most of 
Riak's strengths. You don't seem to care about persistence of data, 
otherwise you wouldn't use a memory-only backend. (By the way, as Mike 
pointed out, with enough RAM bitcask is essentially a memory store, 
especially where write performance is concerned.)
You also don't care about eventual consistency, evidenced by the fact 
that you do bulk inserts (only?), and that 12 bytes wouldn't leave room 
for enough information to resolve conflicts. So you probably want 
last-write-wins behaviour (which can be set as a bucket property in 
Riak, but kind of defeats the purpose in my opinion).
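For what it's worth, since it is just a bucket property you can switch 
it on over the HTTP interface. A minimal sketch in Python; the host, 
port and bucket name are placeholders, and I'm going from memory on the 
property name (last_write_wins):

    import json
    import urllib.request

    # Set last_write_wins on a bucket via Riak's HTTP interface.
    # Host, port and bucket name below are placeholders.
    url = "http://127.0.0.1:8098/riak/mybucket"
    body = json.dumps({"props": {"last_write_wins": True}}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)  # expect 204 No Content on success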

But let's assume for a moment that Riak were the right tool for your job.
The limiting factor for writing your data is almost certainly not the 
disk. Writing 100,000 keys with a size of 12 bytes requires only about 
1MB/s, so even the crappiest disk should have no problem with that. But 
as I said, there is quite a large overhead for storing values in Riak, 
so in reality the required rate will be more like 50MB/s per node (3 
nodes, n=3 presumably). Still not a big deal, and this only becomes a 
limiting factor once the filesystem cache has used up all available RAM.
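The arithmetic behind those numbers, as a sketch (all inputs are the 
rough estimates from above):

    # Required write bandwidth: raw payload vs. with per-entry overhead.
    keys_per_sec = 100_000
    value_bytes = 12
    overhead = 400      # rough per-entry overhead, see above
    n_val = 3           # replicas per key
    nodes = 3

    raw = keys_per_sec * value_bytes
    per_node = keys_per_sec * (value_bytes + overhead) * n_val / nodes
    print("raw: %.1f MB/s, per node: %.0f MB/s" % (raw / 1e6, per_node / 1e6))
    # -> raw: 1.2 MB/s, per node: ~41 MB/s, i.e. on the order of 50 MB/s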

On the other hand, network latency is a problem at such high rates, even 
in a LAN. As far as my experience and some quick Google research tell 
me, the lowest round-trip time you can expect on standard Gigabit 
Ethernet is on the order of 0.1 ms, or 1/10,000 of a second. Each 
operation needs at least one round trip (one request packet, one 
response packet), so with one connection you can never go beyond 10,000 
writes per second. That assumes no processing time whatsoever, so a 
more realistic number is 2,000-5,000 ops/s. Therefore you need at least 
20-50 parallel connections or clients to achieve your target write 
rate. If you use the REST API, these numbers need to be doubled, since 
one additional round trip is already needed to set up the TCP connection.
In general, without a lot of tuning and maybe specialized hardware 
(multiple NICs or special low-latency NICs), any server will have a 
hard time handling 100,000 ops/s, regardless of the software that is used.
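The connection math, again as a sketch (the round-trip time and the 
realistic per-connection rates are the estimates above):

    # Parallel connections needed to hit the target write rate.
    rtt = 0.0001                # ~0.1 ms round trip on gigabit ethernet
    ceiling = 1 / rtt           # 10,000 ops/s per connection, zero processing time
    target = 100_000            # desired writes per second

    for per_conn in (2000, 5000):   # realistic per-connection rates
        print("%d ops/s per connection -> %d connections" % (per_conn, target // per_conn))
    # -> 50 connections at 2,000 ops/s, 20 connections at 5,000 ops/s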


Cheers,
Nico


On 28.05.2011 20:36, Greg Nelson wrote:
> Depending on the n_val you have set for that bucket, Riak will store the
> objects n times on n different nodes. There are two other parameters you
> should know about, r and w. When writing, Riak will wait for w of the n
> nodes to finish the write before returning. When reading, Riak will wait
> for r of the n nodes to respond before returning. These are the basics of
> how Riak does fault and partition tolerance, i.e. if one node is down
> your cluster still functions, and the r and w values define a sort of
> "majority vote" threshold to handle a split-brain problem.
>
> Anyway, for your purposes you could set w=1 and r=3 for faster writes at
> the expense of potentially slower reads. I've never tried this (or any
> of the backends besides bitcask) so I don't know what you should expect.
>
> As for bulk insert and preserving locality, I don't know of a way to do
> that with Riak except to batch your 1000 keys into a single object,
> identified by one key. As far as Riak is concerned, it's just a 12KB
> opaque object, which your application would need to always write and
> read all at once.
>
> If you don't batch like that, you should look for a discussion on this
> mailing list from last week regarding capacity planning and very small
> objects. There's a bit of overhead associated with each object that will
> be significant for objects as small as 12 bytes. You could skip over the
> parts about Bitcask overhead...
>
> On Saturday, May 28, 2011 at 9:59 AM, Michael McClain wrote:
>
>> Thank you, Mike and Greg, for the response.
>> I've just replied to the list.
>> In my use case, I need to be able to write 100,000 keys per second,
>> where each key is very small (12 bytes). And I always insert 1000 keys
>> at once, in a bulk insert. I would also like to preserve the locality
>> of the keys inserted at once (so that they always stay on the same
>> node). Do you know if that is possible?
>>
>> Thank you
>>
>> 2011/5/28 Mike Oxford <moxford at gmail.com>
>>> With enough RAM you could just have it keep the whole thing in
>>> disk-cache...
>>>
>>> -mox
>>>
>>>
>>> On Fri, May 27, 2011 at 11:11 PM, Greg Nelson <grourk at dropcam.com> wrote:
>>>> Michael,
>>>>
>>>> You might want to check out riak_kv_ets_backend,
>>>> riak_kv_gb_trees_backend, and riak_kv_cache_backend.
>>>>
>>>> http://wiki.basho.com/Configuration-Files.html
>>>>
>>>> -Greg
>>>>
>>>> On Friday, May 27, 2011 at 10:35 PM, Michael McClain wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Is it possible to store the whole database in memory?
>>>>> In a similar way as Redis does.
>>>>>
>>>>> I'm really interested in the distributed map reduce done by Riak
>>>>> ("bring processing to the data, instead of data to the processors"), but
>>>>> I need the faster writes/reads that a memory-only database could provide.
>>>>> In case you don't support memory-only storage (no disk touched /
>>>>> all keys and data fitting in memory on all nodes) yet, do you plan
>>>>> on implementing it?
>>>>>
>>>>> Thank you,
>>>>> Michael



