Simpler, many m/r phases, or complex, fewer m/r phases?

Samuel Elliott sam at lenary.co.uk
Thu Mar 15 05:00:37 EDT 2012


Have you taken a closer look at riak_pipe? You might be able to write
something custom that reads each value only once but still splits the work
into many smaller pieces. (riak_kv's M/R implementation has been built on
top of riak_pipe since 1.1, IIRC.)

https://github.com/basho/riak_pipe
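
To make the trade-off concrete, here is a rough sketch in plain Python. It is
not Riak or riak_pipe code: `CountingStore`, `one_phase_per_check` and
`single_combined_phase` are hypothetical names, and the model simply assumes
(per Alexander's point quoted below) that every map pass re-reads the full
value.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Check = Callable[[Record], bool]


class CountingStore:
    """Toy stand-in for a key/value store that counts full-value reads."""

    def __init__(self, data: Dict[str, Record]) -> None:
        self._data = data
        self.reads = 0

    def fetch(self, key: str) -> Record:
        # Assumption: each fetch models one full value read from disk.
        self.reads += 1
        return self._data[key]


def one_phase_per_check(
    store: CountingStore, keys: List[str], checks: List[Check]
) -> List[str]:
    """One pass per check; surviving keys are re-read in every later pass."""
    surviving = list(keys)
    for check in checks:
        surviving = [k for k in surviving if check(store.fetch(k))]
    return surviving


def single_combined_phase(
    store: CountingStore, keys: List[str], checks: List[Check]
) -> List[str]:
    """Read each value once and apply every check to it in the same pass."""
    surviving = []
    for key in keys:
        value = store.fetch(key)
        if all(check(value) for check in checks):
            surviving.append(key)
    return surviving
```

With three objects and two checks, where one object fails the first check,
the per-check version does 5 reads and the combined version does 3; both
return the same keys. The gap grows with the number of checks, which is the
10-functions case from the original question.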

Sam

On Thu, Mar 15, 2012 at 2:25 AM, Jonathan Langevin <
jlangevin at loomlearning.com> wrote:

> OK, that's what I was afraid of (full value read from disk).
> This is for a client lib for Riak, so I'll reconfigure it to combine the
> logic down to single map/reduce phases. Luckily the logic in this case is
> mostly simple, while custom m/r functionality would still be sent per phase
> as specified by the application. The custom m/r phases are likely to involve
> more complex logic, so that pretty much works itself out.
>
> Thanks for the feedback!
>
> Jonathan Langevin
> Manager, Information Technology
> Loom Inc.
> Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com -
> www.loomlearning.com - Skype: intel352
>
>
> On Wed, Mar 14, 2012 at 10:02 PM, Alexander Sicular <siculars at gmail.com> wrote:
>
>> I would probably say complex/fewer M/R phases, but I guess it would depend
>> on the compute complexity of your functions (and whether they can take
>> advantage of parallelism/more compute cores). My reasoning is that every
>> time you Map, you are reading the full value from disk: more Maps = more
>> disk I/O. Not to mention the Erlang-to-JS overhead if you are running JS
>> functions.
>>
>> Please report your findings!
>>
>> Best,
>>
>> -Alexander Sicular
>>
>> @siculars
>>
>> On Mar 14, 2012, at 6:16 PM, Jonathan Langevin wrote:
>>
>> What is better for performance in Riak?
>> More phases with simpler logic, or fewer phases with more complex logic?
>>
>> For instance, if I want to check 10 different fields of the result
>> objects, using 10 different functions, should I combine that all down into
>> 1-2 m/r phases, or run as 10 different m/r phases?
>>
>> I would think more phases would let the workload be distributed across
>> various nodes more easily, but fewer phases would mean that the values
>> wouldn't have to be processed as many times...
>>
>>  _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


-- 
Samuel Elliott
sam at lenary.co.uk
http://lenary.co.uk/
+44 (0)7891 993 664