How to process batch of events in N seconds after latest

Greg Burd greg at basho.com
Sat May 19 03:42:31 EDT 2012


Max, 

This sounds a bit complex. What would need to happen if you didn't process an event (or batch of events) in time?

What about using time-based expiry for your events, which the Bitcask backend supports? You could use Multi-backend to set up a bucket whose contents expire after N seconds. When you write the last event in a batch, also write a key/value pair to that expiring bucket, with the list of keys in the batch as the value. Make the key meaningful enough that your program doesn't have to look it up; it should be able to derive it from other context.

see: http://wiki.basho.com/Bitcask.html

Automatic Expiration 
By default, Bitcask keeps all of your data around. If your data has limited time-value, or if for space reasons you need to purge data, you can set the expiry_secs option. If you needed to purge data automatically after 1 day, set the value to 86400.
Default is: -1 which disables automatic expiration


{bitcask, [
    ...,
    {expiry_secs, -1}, %% Don't expire items based on time
    ...
]}
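
A minimal sketch of how this could look in the riak_kv section of app.config, assuming Riak's multi-backend with two Bitcask instances. The backend names (bitcask_default, bitcask_expiry), the data_root paths and the 30-second expiry are placeholders of mine, not values from your setup; replace 30 with your N.

```erlang
%% Sketch only: names, paths and the expiry value are illustrative assumptions.
{riak_kv, [
    %% Route buckets to named backends instead of a single storage engine.
    {storage_backend, riak_kv_multi_backend},
    %% Buckets without an explicit backend property use this one.
    {multi_backend_default, <<"bitcask_default">>},
    {multi_backend, [
        %% Regular data, never expires.
        {<<"bitcask_default">>, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask_default"}
        ]},
        %% Batch markers: entries expire N seconds after they are written.
        {<<"bitcask_expiry">>, riak_kv_bitcask_backend, [
            {data_root, "/var/lib/riak/bitcask_expiry"},
            {expiry_secs, 30}  %% N, in seconds
        ]}
    ]}
]}
```

The bucket that holds the batch markers would then have its backend bucket property set to <<"bitcask_expiry">>; every other bucket keeps using the default backend.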



@gregburd
Developer Advocate, Basho Technologies | http://basho.com | @basho


On Tuesday, May 15, 2012 at 1:56 PM, Max Ivanov wrote:

> Hi,
> 
> what's the best approach to processing a batch of events N seconds after
> the latest event in a group happens? Events are grouped by key.
> 
> I am thinking about the following scheme:
> 
> 1) events are recorded in such a way that every write creates a new
> sibling, to avoid multiple read/write cycles per event
> 2) with every write a new secondary index is created with value =
> "sweep_at_$current_time + N"
> 3) every second a process queries Riak for secondary keys with values <=
> "sweep_at_$current_time"
> 4) for every item returned it queries all its siblings:
> - if there are siblings, then merge them into 1 record, calculate and
> write new secondary index "sweep_at_$latest_sibling_time + N". Go to
> next substep if newly calculated timeout value is <= current time.
> - if there are no siblings, process the record and remove the key from Riak
> 
> Therefore for every batch of N events on average (given that 99% of
> event batch timespans are less than N) there will be:
> N+1 writes, 2 secondary index seeks and 2 reads
> 
> Is this the correct approach for Riak? It could be improved further by
> carefully setting the secondary index at step 2 so that the merge of all
> siblings is immediately followed by processing of the event batch,
> but right now I am more interested in whether it fits nicely with Riak.
> 
> Thank you.