Distributed reduce phases via pre-reduce [Was "Secondary Indexes - Feedback?"]
gtillman at mezeo.com
Fri Nov 18 11:36:00 EST 2011
Bryan thanks very much for the info. I am going to investigate this over the weekend. What I am trying to achieve is the ability to sort a potentially very large dataset in the most efficient way possible.
On Nov 18, 2011, at 10:12 , Bryan Fink wrote:
> On Thu, Nov 17, 2011 at 9:45 AM, Gordon Tillman <gtillman at mezeo.com> wrote:
>> I'm really interested in being able to implement distributed
>> reduce phases (specifically to do a partial sort) and then have that output
>> handle by a final reduce phase that could perform an efficient merge sort
>> and stream results back to the client. That would be really cool!
> Hi, Gordon. I just caught up on the 2i thread, and noticed your
> comment at the end. As of 1.0 and RiakPipe-based MapReduce, you are
> able to do something like this with the "pre-reduce" tuning
> With pre-reduce enabled, your reduce function is run in two stages.
> The first stage processes the outputs of the previous map stage in
> parallel, on the vnode where each output was produced. The second
> stage is reduce as you know it, processing all of the results of that
> pre-reduce stage in one place.
> For example, if you had these vnodes producing these outputs:
> A: 1,2,3
> B: 4,5,6
> C: 7,8,9
> enabling pre-reduce would cause three parallel reduces:
> x = reduce(1,2,3)
> y = reduce(4,5,6)
> z = reduce(7,8,9)
> followed by a final all-together reduce of those results:
> So, it's the same function evaluated in both stages, but if you've
> written it to play well with re-reduce, it should "just work".
> Hope that helps,
More information about the riak-users