The suitability of MapReduce

Dmitri Zagidulin dzagidulin at basho.com
Tue Apr 9 09:57:28 EDT 2013


(er, forgot to reply to the list instead of user)

Antonio,

Though the exact answer would depend on the implementation details, a
Facebook-style "news feed" would best be implemented on Riak Search, not MR.
Take a look at this video:
http://vimeo.com/album/2258285/video/52417831 (Building a Social
Application on Riak) to see how the Clipboard engineers
implemented their social app via Search. (The other link to look at is
http://blog.clipboard.com/2012/03/18/0-Milking-Performance-From-Riak-Search,
which they mention in the video.)

In general, MapReduce is useful for cases where you have a *subset* of
known keys (as a result of another query, like 2i or Search, or even a
database external to Riak), and you want some sort of operation on those
keys (aggregation, counting, transformation).
But for actually *getting* those keys (to fill a social feed, for example),
MR is a poor fit.
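
To make the "known keys" case concrete, here is a rough sketch of what such
a job could look like. It only builds the JSON body that Riak's HTTP
MapReduce endpoint (/mapred) accepts and does not send anything. The bucket
name, the keys, and the build_mapreduce_job helper are made up for
illustration, and it assumes each object stores a bare JSON number, so that
the built-in Riak.mapValuesJson and Riak.reduceSum functions can sum them.

```python
import json


def build_mapreduce_job(bucket, keys):
    """Build a MapReduce job body over an explicit list of bucket/key pairs.

    Hypothetical helper: it only assembles the JSON structure, it does not
    talk to a Riak node.
    """
    return {
        # Explicit bucket/key pairs, e.g. the result of a 2i or Search query.
        "inputs": [[bucket, key] for key in keys],
        "query": [
            # Map phase: parse each object's value as JSON.
            {"map": {"language": "javascript", "name": "Riak.mapValuesJson"}},
            # Reduce phase: sum the numbers produced by the map phase.
            {"reduce": {"language": "javascript", "name": "Riak.reduceSum"}},
        ],
    }


# Assumed example data: keys we already obtained from some other query.
known_keys = ["order-1001", "order-1002", "order-1003"]
job = build_mapreduce_job("order_totals", known_keys)
print(json.dumps(job, indent=2))
```

The point is that "inputs" is a short, explicit list. Finding which keys
belong in that list is the part better left to Search or 2i.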

Dmitri

On Tue, Apr 9, 2013 at 5:42 AM, Antonio Rohman Fernandez <
rohman at mahalostudio.com> wrote:

> But... then... I wonder how to do the following task, as I assumed MR
> would be the right thing to do:
>
> - Imagine Facebook's "news feed", which every so often recompiles the
> statuses, photos, comments, likes, etc... of all your contacts.
>
> Shouldn't this be done by MR? And if so... shouldn't the user be able to
> execute it on demand if they want to refresh the news feed (or at least
> have it refreshed in the background every X minutes), and then be able to
> GET the refreshed, compiled data?
>
> Merci,
> Rohman
>
> On 09.04.2013 01:26, Matt Black wrote:
>
>  I think a short and explicit discussion of using sequential GETs would
> be good to add to the docs in [1]. It'll be helpful to put the alternate
> option in the reader's head so they can evaluate as they're going through
> the article.
>
> Cheers
> Matt
>
>
> On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
>
>> I want to follow up on the recent "Map phase timeout" thread [2]. In part
>> out of curiosity and in part as a documentation clean up... Should the
>> documentation at [1] be changed? Specifically, the docs say MR should be
>> used:
>>
>>    - *When you know the set of objects you want to MapReduce over (the
>>    bucket-key pairs) *(emphasis added)
>>    - When you want to return actual objects or pieces of the object –
>>    not just the keys, as do Search & Secondary Indexes
>>    - When you need utmost flexibility in querying your data. MapReduce
>>    gives you full access to your object and lets you pick it apart any way you
>>    want.
>>
>> It seems to me that a lot of discussions around MR in Riak come down to
>> "You're close but this isn't the best use case of MapReduce in Riak." Would
>> it be better, for the purposes of a general discussion, to say that
>> MapReduce is the appropriate paradigm when you want to:
>>
>>    - manipulate a large amount of data inside the Riak cluster in bulk -
>>    e.g. read all of my sales orders and where the version is 1, perform the
>>    changes necessary to update the order format to version 2.
>>    - burn a lot of I/O and make your admin sad
>>    - move data from one bucket to another
>>    - re-write an entire bucket so all data is indexed for 2i, search, etc
>>    - Anything where the query can be resumed with no knowledge of state
>>    at the time the last run of the query failed.
>>
>> Are there other use cases when MR is the better approach?
>>
>> [1]:
>> http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
>> [2]:
>> http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results
>>
>>   ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
> --
> *Antonio Rohman Fernandez* <http://mahalostudio.com>
> CEO, Founder & Lead Engineer
> rohman at mahalostudio.com
> *Projects*
> MaruBatsu.es <http://marubatsu.es>
> PupCloud.com <http://pupcloud.com>
> Wedding Album <http://wedding.mahalostudio.com>
>