Schema Architecture, Map Reduce & Key Lists

Nico Meyer nico.meyer at adition.com
Fri Feb 11 06:12:47 EST 2011


Hi Jeremiah!

Actually, there should be no compaction at all if he only ever inserts
new keys, so Bitcask's expiry feature won't help in this case.
Compactions/merges happen only once keys have been updated or deleted.
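
If it helps to see why: merges are driven by dead-data thresholds in
the bitcask section of app.config. A sketch (the values shown are the
shipped defaults, as far as I remember):

    %% A data file only becomes a merge candidate once it accumulates
    %% dead entries, i.e. keys that have been updated or deleted.
    {bitcask, [
        {frag_merge_trigger, 60},              %% percent of dead keys
        {dead_bytes_merge_trigger, 536870912}  %% bytes of dead data
    ]}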

Cheers,
Nico

Am Donnerstag, den 10.02.2011, 09:52 -0800 schrieb Jeremiah Peschka:
> Riak 0.14 brings key filters - it's still going to take time to filter
> the keys, but it's an in-memory operation. Using 'smart keys' along
> the lines of UNIXTIMESTAMP:placement:campaign:customer, you can
> rapidly filter your keys on meaningful criteria and run MapReduce
> jobs over the results.
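> 
> Something like this (a sketch, untested; the bucket name, placement
> value, and localhost URL are assumptions) posts such a job to the
> /mapred HTTP endpoint from Python:
> 
>     import json
>     import urllib.request
> 
>     # Keep only keys whose 2nd ":"-separated token is the placement
>     # we want, then run the built-in Riak.mapValuesJson map function
>     # over the matching objects.
>     job = {
>         "inputs": {
>             "bucket": "clicks",
>             "key_filters": [
>                 ["tokenize", ":", 2],
>                 ["eq", "placement42"],
>             ],
>         },
>         "query": [
>             {"map": {"language": "javascript",
>                      "name": "Riak.mapValuesJson"}},
>         ],
>     }
> 
>     req = urllib.request.Request(
>         "http://127.0.0.1:8098/mapred",
>         json.dumps(job).encode(),
>         {"Content-Type": "application/json"},
>     )
>     print(urllib.request.urlopen(req).read().decode())
> 
> Riak still walks every key in the bucket to apply the filter, but it
> only fetches the objects that match.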
> 
> 
> Nothing says you can't also store the same data in multiple buckets in
> multiple formats to make querying easier.
> 
> 
> In response to number 2 - there's a way to set Riak to auto-expire
> data from a bucket. It'll only be removed when compactions occur, but
> if you're storing clickstream data that should happen often enough.
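> 
> If you go that route, it's one setting in the bitcask section of
> app.config. A sketch (the 30-day window is just an example, and the
> bucket has to live on the bitcask backend):
> 
>     %% Expire click objects 30 days after they are written. Note
>     %% that expired keys are only physically reclaimed when a merge
>     %% runs.
>     {bitcask, [
>         {expiry_secs, 2592000}  %% 30 * 24 * 3600
>     ]}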
> 
> -- 
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
> 
> 
> On Thursday, February 10, 2011 at 9:35 AM, Mat Ellis wrote:
> 
> > We are converting a MySQL-based schema to Riak using Ripple. We're
> > tracking a lot of clicks, and each click belongs to a cascade of
> > other objects:
> > 
> > 
> > click -> placement -> campaign -> customer
> > 
> > 
> > i.e. we do a lot of operations on these clicks grouped by placement
> > or sets of placements.
> > 
> > 
> > Reading
> > this http://lists.basho.com/pipermail/riak-users_lists.basho.com/2010-July/001591.html
> > gave me pause for thought. I was hoping the time needed to crunch
> > each day's data would be proportional to the volume of clicks on
> > that day, but it seems that it would be proportional to the total
> > number of clicks ever.
> > 
> > 
> > What's the best approach here? I can see a number of 'solutions',
> > each of them complicated:
> > 
> > 
> > (1) Maintain an index of clicks by day so that we can focus our
> > operations on a time-bounded set of clicks (sketched below, after
> > this list)
> > 
> > 
> > (2) Delete or archive clicks once they have been processed or after
> > a certain number of days
> > 
> > 
> > (3) Add many links to each placement, one per click (millions
> > potentially)
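> > 
> > For (1) I'm imagining something like the sketch below (untested;
> > the bucket names and localhost URL are made up): one object per day
> > in a "clicks_by_day" bucket whose value is a JSON list of click
> > keys, so a day's crunch only touches that day's clicks.
> > 
> >     import json
> >     import time
> >     import urllib.error
> >     import urllib.request
> > 
> >     RIAK = "http://127.0.0.1:8098"
> > 
> >     def day_key(ts):
> >         # One index object per UTC day, e.g. "2011-02-10"
> >         return time.strftime("%Y-%m-%d", time.gmtime(ts))
> > 
> >     def index_click(click_key, ts):
> >         # Read-modify-write of the day's key list. A real version
> >         # would need allow_mult and sibling merging (or one index
> >         # object per writer) to survive concurrent appends.
> >         url = "%s/riak/clicks_by_day/%s" % (RIAK, day_key(ts))
> >         try:
> >             keys = json.load(urllib.request.urlopen(url))
> >         except urllib.error.HTTPError:  # 404: first click today
> >             keys = []
> >         keys.append(click_key)
> >         req = urllib.request.Request(
> >             url,
> >             json.dumps(keys).encode(),
> >             {"Content-Type": "application/json"},
> >             method="PUT",
> >         )
> >         urllib.request.urlopen(req)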
> > 
> > 
> > On a related noob-note, what would be the best way of creating a set
> > of the clicks for a given placement? MapReduce or Riak Search or
> > some other method?
> > 
> > 
> > Thanks in advance.
> > 
> > 
> > M.