The suitability of MapReduce

Dmitri Zagidulin dzagidulin at
Tue Apr 9 09:57:28 EDT 2013

(er, forgot to reply to the list instead of user)


Though the exact answer would depend on the implementation details, a
Facebook-style "newsfeed" would be best implemented on Riak Search, not MR.
Take a look at this video: a Social
Application on Riak) to see how the Clipboard engineers
implemented their social app via Search. (The other link to look at is,
which they mention in the video).

In general, MapReduce is useful when you already have a *subset* of
known keys (from another query such as 2i or Search, or even from a
database external to Riak) and you want to run some operation over those
keys (aggregation, counting, transformation).
But for actually *getting* those keys (to fill a social feed, for example),
MR is a poor fit.
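
A minimal sketch of that split, in plain Python. The in-memory dicts are
hypothetical stand-ins for a Riak bucket and for a 2i/Search index, and every
name below is made up for illustration; no Riak client API is used. The keys
come from the index lookup, and only then does a map/reduce pass run over
that known subset:

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-in for a Riak bucket: key -> stored object.
STATUSES = {
    "s1": {"author": "alice", "kind": "photo"},
    "s2": {"author": "bob", "kind": "status"},
    "s3": {"author": "alice", "kind": "status"},
    "s4": {"author": "carol", "kind": "status"},
}

# Hypothetical stand-in for a 2i/Search index: author -> keys.
AUTHOR_INDEX = {
    "alice": ["s1", "s3"],
    "bob": ["s2"],
    "carol": ["s4"],
}


def keys_for_authors(authors):
    """Step 1: obtain the keys from an index query, not from MapReduce."""
    return [key for author in authors for key in AUTHOR_INDEX.get(author, [])]


def map_phase(key):
    """Step 2a: emit one partial count for the object stored under `key`."""
    return Counter({STATUSES[key]["kind"]: 1})


def reduce_phase(left, right):
    """Step 2b: merge two partial counts."""
    return left + right


def count_kinds(authors):
    """Aggregate over the known key subset only."""
    keys = keys_for_authors(authors)
    return dict(reduce(reduce_phase, map(map_phase, keys), Counter()))


if __name__ == "__main__":
    print(count_kinds(["alice", "bob"]))
```

The map/reduce step never scans the whole bucket; it only touches the keys
the index lookup handed it.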


On Tue, Apr 9, 2013 at 5:42 AM, Antonio Rohman Fernandez <
rohman at> wrote:

> But then I wonder how to do the following task, as I assumed MR
> would be the right tool:
> - Imagine Facebook's "news feed", which every so often recompiles the
> statuses, photos, comments, likes, etc. of all your contacts.
> Shouldn't this be done by MR? And if so, shouldn't the user be able to
> run it on demand when they want to refresh the news feed (or at least
> have it refreshed in the background every X minutes), and then GET the
> refreshed, compiled data?
> Merci,
> Rohman
> On 09.04.2013 01:26, Matt Black wrote:
> I think a short and explicit discussion of using sequential GETs would
> be good to add to the docs in [1]. It'll be helpful to put the alternate
> option in the reader's head so they can evaluate as they're going through
> the article.
> Cheers
> Matt
> On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.peschka at> wrote:
>> I want to follow up on the recent "Map phase timeout" thread [2]. In part
>> out of curiosity and in part as a documentation clean up... Should the
>> documentation at [1] be changed? Specifically, the docs say MR should be
>> used:
>>    - *When you know the set of objects you want to MapReduce over (the
>>    bucket-key pairs)* (emphasis added)
>>    - When you want to return actual objects or pieces of the object –
>>    not just the keys, as do Search & Secondary Indexes
>>    - When you need utmost flexibility in querying your data. MapReduce
>>    gives you full access to your object and lets you pick it apart any way you
>>    want.
>> It seems to me that a lot of discussions around MR in Riak come down to
>> "You're close but this isn't the best use case of MapReduce in Riak." Would
>> it be better, for the purposes of a general discussion, to say that
>> MapReduce is the appropriate paradigm when you want to:
>>    - manipulate a large amount of data inside the Riak cluster in bulk -
>>    e.g. read all of my sales orders and where the version is 1, perform the
>>    changes necessary to update the order format to version 2.
>>    - burn a lot of I/O and make your admin sad
>>    - move data from one bucket to another
>>    - re-write an entire bucket so all data is indexed for 2i, search, etc
>>    - Anything where the query can be resumed with no knowledge of state
>>    at the time the last run of the query failed.
>> Are there other use cases when MR is the better approach?
>> [1]:
>> [2]:
>>   ---
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>> _______________________________________________
>> riak-users mailing list
>> riak-users at
> --
> Antonio Rohman Fernandez
> CEO, Founder & Lead Engineer