Nightly Prune

Chad Engler Chad.Engler at
Fri Feb 8 09:12:50 EST 2013

Thanks for the tips. I will look into putting the dates in the keys and filtering that way.




From: Christian Dahlqvist [mailto:christian at] 
Sent: Friday, February 08, 2013 8:59 AM
To: Stephan Kepser
Cc: Chad Engler; riak-users at
Subject: Re: Nightly Prune




I would strongly advise against running a portion of your nodes with the memory backend and using W=1 to speed up writes, as you run the risk of losing data, especially in failure scenarios where one of the nodes you rely on for writing to disk fails. Because Riak manages how replicas are spread across the cluster, some of your data could end up stored only in memory backends if a sufficiently large number of nodes are configured this way.


An alternative approach that may work in your case, at least if the date you want to base your pruning on is somewhere in the key (which is something I would recommend), would be to use key filters as input to a MapReduce job that deletes the data. Using key filters is generally more efficient than filtering in the map phase based on object data, as a key filter operates on the key alone and does not require the object to be read from disk. It also does not require secondary indexes, which makes it possible to use with Bitcask.
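As a rough sketch of what such a job could look like from a Node script: the helper below only builds the JSON body of a MapReduce request whose input is narrowed by key filters. It assumes, purely for illustration, that each key is a Unix timestamp in seconds written as a decimal string; if the timestamp is only one part of a longer key, a `tokenize` step would be needed first. The bucket name `events` and the Erlang module name `my_mapred_utils` are placeholders, not names from this thread.

```javascript
// Build the JSON body for a Riak MapReduce job whose input is restricted by
// key filters. Assumption (hypothetical key layout): every key is a Unix
// timestamp in seconds written as a decimal string, e.g. "1360300000".
function buildPruneJob(bucket, cutoffSeconds, mapModule, mapFunction) {
  return {
    inputs: {
      bucket: bucket,
      key_filters: [
        ["string_to_int"],            // turn the key into an integer
        ["less_than", cutoffSeconds]  // keep only keys older than the cutoff
      ]
    },
    query: [
      // Erlang map phase; module and function are supplied by the caller.
      { map: { language: "erlang", module: mapModule, function: mapFunction } }
    ]
  };
}

// Example: select everything older than 8 January 2013 (UTC).
const cutoff = Math.floor(Date.UTC(2013, 0, 8) / 1000);
// "my_mapred_utils" is a placeholder module name.
const job = buildPruneJob("events", cutoff, "my_mapred_utils", "map_delete");
console.log(JSON.stringify(job, null, 2));
```

The resulting object would be sent as the body of a POST to Riak's `/mapred` HTTP endpoint with a JSON content type. The sketch deliberately stops before the network call.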


I have created a small MapReduce utility library that, among other things, contains a map_delete function implemented in Erlang that, as the name suggests, deletes the records passed into it. This could be used together with the key filter to perform the delete. When using this, it is often a good idea to process only a subset of the key space each time in order to spread the processing out and not overwhelm the cluster with deletes.
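One way to process only a subset of the key space per run, sketched here under the same hypothetical assumption that keys are integer timestamps, is to cut the time range to be pruned into fixed-width slices and issue one job per slice using the `between` key filter. The function names below are illustrative and not part of any library.

```javascript
// Split the inclusive integer range [start, end] into consecutive,
// non-overlapping inclusive slices of at most `width` values each.
function sliceRange(start, end, width) {
  if (width <= 0) {
    throw new RangeError("width must be positive");
  }
  const slices = [];
  for (let from = start; from <= end; from += width) {
    slices.push([from, Math.min(from + width - 1, end)]);
  }
  return slices;
}

// Turn one slice into key filters. "between" treats the range as inclusive
// when no third argument is given.
function sliceToKeyFilters(slice) {
  const [from, to] = slice;
  return [["string_to_int"], ["between", from, to]];
}

// Example: four one-hour slices, each suitable for a separate delete job.
const hour = 3600;
const hourlySlices = sliceRange(0, 4 * hour - 1, hour);
console.log(hourlySlices.map(sliceToKeyFilters));
```

Running the slices one after another, with a pause in between if needed, keeps each delete job small instead of hitting the cluster with a single large one.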


Best regards,





On 8 Feb 2013, at 09:36, Stephan Kepser <stephan.kepser at> wrote:

Hi Chad,

Yes, you're right: the use of secondary indexes requires LevelDB as the backend. How much of a performance penalty this imposes on you I really don't know. I'd still consider using secondary indexes because it is an architecturally clean solution and hence simple to implement and maintain. If you experience performance issues, you still have the option to scale out, i.e., use more nodes.

There may be yet another option to speed up write operations, depending on your demands. You could use the memory backend for about a third of your nodes and set the write quorum to 1. This turns the memory-only nodes into something like a cache without any further need to administer this cache. On the other hand, I have to say I'm not sure you can enforce that at least one replica of each data item goes to a node with the LevelDB backend.



2013/2/7 Chad Engler <Chad.Engler at>

I'm writing the prune script in Node, and the dates are stored as int timestamps so that isn't an issue.


I was under the impression that secondary indexes only work with the LevelDB backend. We have a solid write throughput (maybe 10ish writes per second) and very few reads (aside from the prune). Would we see significant performance degradation by switching from Bitcask?




From: riak-users [mailto:riak-users-bounces at] On Behalf Of Stephan Kepser
Sent: Thursday, February 07, 2013 1:50 PM
To: riak-users at
Subject: Re: Nightly Prune


Hi Chad,

I recommend looking at secondary indexes. You can set up a secondary index with the relevant date from your entry. Note that there is no date data type, only string or integer. But you can easily convert a date into a string or an integer for your query purposes. Secondary indexes even provide you with a way to query for date ranges. And they are fast. So, I think they'd serve your purpose.
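To illustrate the date-to-integer conversion and a range query, here is a minimal Node sketch. It only builds the index header to send when storing an object and the HTTP path for an integer range query. The index name `created` and the bucket name `events` are made up for the example.

```javascript
// Convert a Date to whole seconds since the Unix epoch.
function toEpochSeconds(date) {
  return Math.floor(date.getTime() / 1000);
}

// Header to send when storing an object so that it is indexed by date.
// Integer indexes use the "_int" suffix.
function indexHeader(indexName, date) {
  const header = {};
  header["x-riak-index-" + indexName + "_int"] = String(toEpochSeconds(date));
  return header;
}

// Path of a secondary-index range query over [fromDate, toDate].
function rangeQueryPath(bucket, indexName, fromDate, toDate) {
  return "/buckets/" + encodeURIComponent(bucket) +
    "/index/" + indexName + "_int/" +
    toEpochSeconds(fromDate) + "/" + toEpochSeconds(toDate);
}

// Example with the hypothetical index "created" on bucket "events".
console.log(indexHeader("created", new Date(5000)));
console.log(rangeQueryPath("events", "created", new Date(0), new Date(86400000)));
```

A GET on that path returns the keys whose index value falls within the range, and a prune script could then delete those keys one by one.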



Dr. Stephan Kepser | Senior IT-Consultant

codecentric AG | Merscheider Straße 1 | 42699 Solingen | Deutschland
tel: +49 (0) 212.23362845 | fax: +49 (0) 212.23362879 | mobile: +49 (0) 151.52883635
