map-reduce Problem ?

Dan Reverri dan at basho.com
Mon Nov 15 17:40:19 EST 2010


The bucket/key pair is passed around in a 2-tuple:
https://github.com/basho/riak_kv/blob/0093af40f8ba97038e98dd04dfea70ef889ff213/src/riak_kv_put_fsm.erl#L84
https://github.com/basho/riak_kv/blob/0093af40f8ba97038e98dd04dfea70ef889ff213/src/riak_kv_put_fsm.erl#L116

Each backend can manage the bucket/key pair however it wants. For example, the
Bitcask backend uses term_to_binary/1 to convert the bucket/key pair to a
single key:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_bitcask_backend.erl#L98

When the backend lists keys in a bucket, it can extract the bucket name from
the key term:
https://github.com/basho/riak_kv/blob/master/src/riak_kv_bitcask_backend.erl#L123
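
To make the idea concrete, here is a rough sketch in Python rather than
Erlang. It is not the riak_kv code: encode_pair, decode_pair and list_keys
are hypothetical names, and the length-prefixed byte layout is only an
assumed stand-in for what term_to_binary/1 produces. It shows the general
shape of the approach: the pair is folded into one opaque key, and listing a
bucket means decoding every stored key and keeping the ones whose bucket
matches.

```python
import struct
from typing import Dict, Iterator, Tuple


def encode_pair(bucket: bytes, key: bytes) -> bytes:
    """Pack a (bucket, key) pair into one opaque storage key.

    Stand-in for Erlang's term_to_binary({Bucket, Key}); the byte layout
    here (4-byte big-endian bucket length, bucket, key) is an assumption.
    """
    return struct.pack(">I", len(bucket)) + bucket + key


def decode_pair(blob: bytes) -> Tuple[bytes, bytes]:
    """Recover (bucket, key) from a storage key made by encode_pair."""
    (bucket_len,) = struct.unpack_from(">I", blob, 0)
    return blob[4:4 + bucket_len], blob[4 + bucket_len:]


def list_keys(store: Dict[bytes, bytes], bucket: bytes) -> Iterator[bytes]:
    """Yield the keys in `bucket` by decoding every storage key.

    The store has no per-bucket index, so this visits all entries.
    """
    for blob in store:
        found_bucket, key = decode_pair(blob)
        if found_bucket == bucket:
            yield key


# A toy store holding entries from two buckets.
store: Dict[bytes, bytes] = {
    encode_pair(b"videos", b"v1"): b"...",
    encode_pair(b"videos", b"v2"): b"...",
    encode_pair(b"users", b"u1"): b"...",
}
print(sorted(list_keys(store, b"videos")))
```

Note that list_keys touches the "users" entry even though only "videos" was
asked for. That full scan is the cost of the store not being bucket aware.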

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
dan at basho.com


On Mon, Nov 15, 2010 at 2:11 PM, Alexander Sicular <siculars at gmail.com> wrote:

> So I get that riak is not bucket aware. When you pass a bucket as an
> input in an m/r, as riak sifts through all the keys, how does riak
> isolate bucket-specific keys? Are keys stored as /bucket/key internally
> and there is a string comparison on split(key,'/')? Or is there
> something else going on?
>
> Thank you.
>
>
>
> On 2010-11-15, Kevin Smith <ksmith at basho.com> wrote:
> > We are giving some thought to how to do that. The main issue with
> > bitcask's key listing performance is that bitcask is not bucket aware
> > and lacks the notion of secondary indices. Not being bucket aware means
> > bitcask has to examine all bucket/key pairs to find the ones related to
> > a given bucket. This isn't to say we won't address the problem, merely
> > to point out that there's some engineering work required to solve it
> > correctly.
> >
> > innostore is moderately bucket-aware right now, so I've forked it
> > (http://github.com/kevsmith/innostore) and added bucket-aware key
> > listing. Based on some very basic testing, I'm seeing a 2.5x speedup in
> > overall key listing performance compared to the official version. I'm
> > hoping the patch, or a modified form of it, will make the next release.
> > If you can handle inno being a bit slower than bitcask and slightly more
> > difficult to set up and tune, then this might be an option for you.
> >
> > I've done some basic vetting of the code, but I want to emphasize this
> > is a prototype only and hasn't received anything even close to the
> > normal amount of testing we put into a release. Please keep this in mind
> > if you decide to use my forked repo.
> >
> > --Kevin
> > On Nov 15, 2010, at 11:57 AM, Greg Steffensen wrote:
> >
> >> Along these lines, are there any ideas floating around about how to
> >> speed up the listing of keys in a bucket?  For the bitcask backend, it
> >> seems like an index of keys-by-bucket ought to be the kind of thing that
> >> could be stored in the hints files to speed this up without affecting
> >> performance for live reads and writes.
> >>
> >> Greg
> >>
> >> On Mon, Nov 15, 2010 at 11:46 AM, Sean Cribbs <sean at basho.com> wrote:
> >> This is possible with Riak's MapReduce but you will likely have
> >> increasing difficulty as your dataset grows, because of the impact of
> >> needing to list keys in a bucket and then eliminate data points you
> >> aren't interested in. In the longer term, there will be improvements to
> >> MapReduce such that if your keys are meaningful, you will be able to
> >> filter them more easily (without examining the data first).  You might
> >> find Kevin Smith's overview enlightening:
> >> http://www.slideshare.net/hemulen/riak-mapred-preso
> >>
> >> Sean Cribbs <sean at basho.com>
> >> Developer Advocate
> >> Basho Technologies, Inc.
> >> http://basho.com/
> >>
> >> On Nov 15, 2010, at 11:34 AM, Prometheus WillSurvive wrote:
> >>
> >>> Hi,
> >>>
> >>> We have a huge database (around 4 billion records, 30 TB) storing
> >>> video watch information, i.e. view count, comments, favorites, etc. I
> >>> want to produce a daily report of view counts for all videos. That
> >>> means I need to look at two days, today and yesterday, and subtract
> >>> yesterday's view count from today's to find the daily impressions. Our
> >>> Fat DB team does this with a few complex queries. I would like to ask
> >>> whether this is possible the Riak map-reduce way. I want to make a
> >>> demonstration to the team to show this.
> >>>
> >>> This is the scenario. We have similar data models for other things.
> >>> This could be a start.
> >>>
> >>> We have a farm of 30 HP DL380 servers with 32 GB of RAM each to test
> >>> this scenario.
> >>>
> >>> Can any member experienced with Riak map-reduce share some ideas on
> >>> this?
> >>>
> >>> Regards
> >>>
> >>> Prometheus
> >>> _______________________________________________
> >>> riak-users mailing list
> >>> riak-users at lists.basho.com
> >>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >>
> >>
> >
> >
>
> --
> Sent from my mobile device
>