map-reduce Problem?

Alexander Sicular siculars at gmail.com
Mon Nov 15 17:11:10 EST 2010


So I get that riak is not bucket aware. When you pass a bucket as an
input in an m/r, as riak sifts through all the keys, how does riak
isolate bucket-specific keys? Are keys stored as /bucket/key internally
and there is a string comparison on split(key,'/')? Or is there
something else going on?

Thank you.



On 2010-11-15, Kevin Smith <ksmith at basho.com> wrote:
> We are giving some thought to how to do that. The main issues wrt
> bitcask's key listing performance are that bitcask is not bucket aware and
> lacks the notion of secondary indices. Not being bucket aware means bitcask
> has to examine all bucket/key pairs to find the ones related to a given
> bucket. This isn't to say we won't address the problem but merely to point
> out there's some engineering work required to solve the problem correctly.
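
A minimal sketch of the difference described above, written in Python purely for illustration. The flat `keydir` dict, the `by_bucket` index, and both function names are assumptions made for this example, not bitcask or innostore internals; how the real backends compare bucket names is not stated in this thread.

```python
from collections import defaultdict

# Hypothetical flat store: every entry is keyed by a (bucket, key) pair,
# with no per-bucket structure (an assumption made for illustration).
keydir = {
    ("videos", "v1"): b"...",
    ("videos", "v2"): b"...",
    ("users", "u1"): b"...",
}


def list_keys_full_scan(store, bucket):
    """Examine every (bucket, key) pair and keep the ones that match."""
    return sorted(k for (b, k) in store if b == bucket)


def build_bucket_index(store):
    """Group keys by bucket once, so later listings skip unrelated pairs."""
    index = defaultdict(set)
    for b, k in store:
        index[b].add(k)
    return index


by_bucket = build_bucket_index(keydir)
print(list_keys_full_scan(keydir, "videos"))  # cost grows with ALL pairs
print(sorted(by_bucket["videos"]))            # cost grows with this bucket only
```

The full scan touches every pair in the store regardless of which bucket is requested, while the index pays that cost once and must then be kept up to date on every write.
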
>
> innostore is moderately bucket-aware right now so I've forked it
> (http://github.com/kevsmith/innostore) and added bucket-aware key listing.
> Based on some very basic testing I'm seeing a 2.5x speedup in overall key
> listing performance compared to the official version. I'm hoping the patch,
> or a modified form of it, will make the next release. If you can handle inno
> being a bit slower than bitcask and slightly more difficult to set up and
> tune then this might be an option for you.
>
> I've done some basic vetting of the code but I want to emphasize this is a
> prototype only and hasn't received anything even close to the normal amount
> of testing we put into a release. Please keep this in mind if you decide to
> use my forked repo.
>
> --Kevin
> On Nov 15, 2010, at 11:57 AM, Greg Steffensen wrote:
>
>> Along these lines, are there any ideas floating around about how to speed
>> up the listing of keys in a bucket?  For the bitcask backend, it seems
>> like an index of keys-by-bucket ought to be the kind of thing that could
>> be stored in the hints files to speed this up without affecting
>> performance for live reads and writes.
>>
>> Greg
>>
>> On Mon, Nov 15, 2010 at 11:46 AM, Sean Cribbs <sean at basho.com> wrote:
>> This is possible with Riak's MapReduce but you will likely have increasing
>> difficulty as your dataset grows, because of the impact of needing to list
>> keys in a bucket and then eliminate data points you aren't interested in.
>> In the longer term, there will be improvements to MapReduce such that if
>> your keys are meaningful, you will be able to filter them more easily
>> (without examining the data first).  You might find Kevin Smith's overview
>> enlightening: http://www.slideshare.net/hemulen/riak-mapred-preso
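
As a rough illustration of filtering on meaningful keys, assume a hypothetical key layout of `<video_id>:<YYYY-MM-DD>` (this layout and the helper names are assumptions, not something specified in this thread or a Riak API). The selection can then be made from the key alone, before any object is read:

```python
from datetime import date


def key_for(video_id, day):
    """Build a key under the assumed '<video_id>:<YYYY-MM-DD>' layout."""
    return f"{video_id}:{day.isoformat()}"


def keys_for_days(keys, days):
    """Keep only keys whose date suffix is one of the requested days."""
    wanted = {d.isoformat() for d in days}
    return [k for k in keys if k.rsplit(":", 1)[-1] in wanted]


all_keys = [
    key_for("v1", date(2010, 11, 13)),
    key_for("v1", date(2010, 11, 14)),
    key_for("v1", date(2010, 11, 15)),
    key_for("v2", date(2010, 11, 15)),
]

selected = keys_for_days(all_keys, [date(2010, 11, 14), date(2010, 11, 15)])
print(selected)
```

If the video ids are already known, the same layout would also allow the keys to be built directly with `key_for`, which avoids listing the bucket at all.
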
>>
>> Sean Cribbs <sean at basho.com>
>> Developer Advocate
>> Basho Technologies, Inc.
>> http://basho.com/
>>
>> On Nov 15, 2010, at 11:34 AM, Prometheus WillSurvive wrote:
>>
>>> Hi,
>>>
>>> We have a huge database (around 4 billion records, 30 TB) storing
>>> video watch information, i.e. view count, comments, favorites, etc. I want
>>> to produce a daily report of view counts for all videos. That means I need
>>> to look at two days, today and yesterday, and subtract yesterday's view
>>> count from today's view count to find the daily impressions. Our Fat DB
>>> team does this with a few complex queries. I would like to ask whether this
>>> is possible the Riak map-reduce way. I want to give a demonstration to
>>> the team to show this.
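
The daily-impression calculation described above can be sketched as a map step and a reduce step. This is plain Python for illustration, not Riak's MapReduce API, and the record fields (`video`, `day`, `views`) and sample numbers are hypothetical:

```python
from collections import defaultdict

# Hypothetical records: one cumulative view count per video per day.
records = [
    {"video": "v1", "day": "2010-11-14", "views": 100},
    {"video": "v1", "day": "2010-11-15", "views": 130},
    {"video": "v2", "day": "2010-11-14", "views": 40},
    {"video": "v2", "day": "2010-11-15", "views": 45},
]


def map_phase(record, today, yesterday):
    """Emit (video, +views) for today and (video, -views) for yesterday."""
    if record["day"] == today:
        return [(record["video"], record["views"])]
    if record["day"] == yesterday:
        return [(record["video"], -record["views"])]
    return []  # records from other days contribute nothing


def reduce_phase(pairs):
    """Sum the signed counts per video; each total is the daily delta."""
    totals = defaultdict(int)
    for video, signed_views in pairs:
        totals[video] += signed_views
    return dict(totals)


def daily_impressions(all_records, today, yesterday):
    emitted = []
    for record in all_records:
        emitted.extend(map_phase(record, today, yesterday))
    return reduce_phase(emitted)


report = daily_impressions(records, "2010-11-15", "2010-11-14")
print(report)
```

Emitting yesterday's count with a negative sign turns the subtraction into a plain sum, so the result does not depend on the order in which records arrive.
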
>>>
>>> This is the scenario. We have similar data models for other things. This
>>> could be a start.
>>>
>>> We have a farm of 30 HP DL380 servers with 32 GB RAM each to test this
>>> scenario.
>>>
>>> Could any member experienced with Riak map-reduce share some ideas on
>>> this?
>>>
>>> Regards
>>>
>>> Prometheus
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
