The suitability of MapReduce

Jeremiah Peschka jeremiah.peschka at gmail.com
Tue Apr 9 09:10:55 EDT 2013


In this case, the argument would be that MR involves a list keys operation.
With a sufficiently large data set, this will take a long time. You could
potentially make it faster by bucketing updates into hour or minute time
boxes (e.g. statuses-20130509-0600), but that's just my best guess.
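
To make the time-box idea a bit more concrete, here is a minimal Python
sketch of the naming scheme only. It does not talk to Riak at all; the
function names, the "statuses" prefix, and the box widths are illustrative
assumptions, and whether this layout actually speeds anything up is still
a guess.

```python
from datetime import datetime, timedelta


def floor_to_box(ts, box_minutes):
    """Round ts down to the start of its time box.

    Assumes box_minutes divides a day evenly (e.g. 1, 5, 15, 60).
    """
    minutes_into_day = ts.hour * 60 + ts.minute
    start = minutes_into_day - (minutes_into_day % box_minutes)
    return ts.replace(hour=start // 60, minute=start % 60,
                      second=0, microsecond=0)


def time_boxed_bucket(prefix, ts, box_minutes=60):
    """Hypothetical naming scheme: '<prefix>-YYYYMMDD-HHMM' of the box start."""
    return prefix + "-" + floor_to_box(ts, box_minutes).strftime("%Y%m%d-%H%M")


def buckets_for_window(prefix, end, window_minutes, box_minutes=60):
    """Bucket names covering the window_minutes leading up to end, oldest first."""
    cursor = floor_to_box(end - timedelta(minutes=window_minutes), box_minutes)
    names = []
    while cursor <= end:
        names.append(time_boxed_bucket(prefix, cursor, box_minutes))
        cursor += timedelta(minutes=box_minutes)
    return names


if __name__ == "__main__":
    now = datetime(2013, 5, 9, 6, 42)
    print(time_boxed_bucket("statuses", now))
    print(buckets_for_window("statuses", now, window_minutes=120))
```

A writer would store each update in the bucket named for its timestamp, and
a reader would only need the handful of bucket names covering the window it
cares about, rather than one ever-growing bucket.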


On Tuesday, April 9, 2013, Antonio Rohman Fernandez wrote:

> But then I wonder how to do the following task, as I assumed MR
> would be the right thing to do:
>
> - Imagine Facebook's "news feed", which every so often recompiles the
> statuses, photos, comments, likes, etc. of all your contacts.
>
> Shouldn't this be done by MR? And if so, shouldn't the user be able to
> execute it on demand if they want to refresh the news feed (or at least
> have it refreshed in the background every X minutes), and then be able to
> GET the refreshed compiled data?
>
> Merci,
> Rohman
>
> On 09.04.2013 01:26, Matt Black wrote:
>
> I think a short and explicit discussion of using sequential GETs would
> be good to add to the docs in [1]. It'll be helpful to put the alternate
> option in the reader's head so they can evaluate as they're going through
> the article.
>
> Cheers
> Matt
>
>
> On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
>
>> I want to follow up on the recent "Map phase timeout" thread [2]. In part
>> out of curiosity and in part as a documentation clean up... Should the
>> documentation at [1] be changed? Specifically, the docs say MR should be
>> used:
>>
>>    - *When you know the set of objects you want to MapReduce over (the
>>    bucket-key pairs) *(emphasis added)
>>    - When you want to return actual objects or pieces of the object –
>>    not just the keys, as do Search & Secondary Indexes
>>    - When you need utmost flexibility in querying your data. MapReduce
>>    gives you full access to your object and lets you pick it apart any way you
>>    want.
>>
>> It seems to me that a lot of discussions around MR in Riak come down to
>> "You're close but this isn't the best use case of MapReduce in Riak." Would
>> it be better, for the purposes of a general discussion, to say that
>> MapReduce is the appropriate paradigm when you want to:
>>
>>    - manipulate a large amount of data inside the Riak cluster in bulk -
>>    e.g. read all of my sales orders and, where the version is 1, perform
>>    the changes necessary to update the order format to version 2.
>>    - burn a lot of I/O and make your admin sad
>>    - move data from one bucket to another
>>    - re-write an entire bucket so all data is indexed for 2i, search, etc.
>>    - do anything where the query can be resumed with no knowledge of the
>>    state at the time the last run of the query failed.
>>
>> Are there other use cases when MR is the better approach?
>>
>> [1]:
>> http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
>> [2]:
>> http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results
>>
>>   ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
> --
> *Antonio Rohman Fernandez*
> CEO, Founder & Lead Engineer
> rohman at mahalostudio.com
> <http://mahalostudio.com>
> *Projects*
> MaruBatsu.es <http://marubatsu.es>
> PupCloud.com <http://pupcloud.com>
> Wedding Album <http://wedding.mahalostudio.com>
>


-- 
---
Jeremiah Peschka - Founder, Brent Ozar Unlimited
MCITP: SQL Server 2008, MVP
Cloudera Certified Developer for Apache Hadoop