The suitability of MapReduce

Antonio Rohman Fernandez rohman at mahalostudio.com
Tue Apr 9 05:42:06 EDT 2013


 

But then I wonder how to do the following task, as I assumed MR would be
the right tool for it:

- Imagine Facebook's "news feed", which every so often recompiles the
  statuses, photos, comments, likes, etc. of all your contacts.

Shouldn't this be done by MR? And if so, shouldn't the user be able to
run it on demand when they want to refresh the news feed (or at least
have it refreshed in the background every X minutes), and then GET the
refreshed, compiled data?
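
To make the question concrete, here is a rough Python sketch of the two options for the feed. It is only an illustration under loud assumptions: a plain dict stands in for the key/value store, the "activity" bucket, the contact names and the helper functions are all hypothetical, and nothing here is a real Riak client call.

```python
# Hypothetical in-memory stand-in for a key/value store: (bucket, key) -> object.
# Nothing in this sketch is a real Riak client call.
STORE = {
    ("activity", "alice"): [{"ts": 3, "text": "alice posted a photo"}],
    ("activity", "bob"): [
        {"ts": 1, "text": "bob liked a status"},
        {"ts": 5, "text": "bob commented"},
    ],
    ("activity", "carol"): [{"ts": 4, "text": "carol updated her status"}],
}


def get(bucket, key):
    """Single GET; returns None when the key is missing."""
    return STORE.get((bucket, key))


def feed_with_gets(contacts, limit=10):
    """Build the feed with one GET per contact, merging on the client."""
    items = []
    for contact in contacts:
        activity = get("activity", contact)
        if activity:
            items.extend(activity)
    return sorted(items, key=lambda item: item["ts"], reverse=True)[:limit]


def feed_with_mapreduce(contacts, limit=10):
    """Same result, phrased as map and reduce phases over known bucket/key pairs."""
    inputs = [("activity", contact) for contact in contacts]
    # Map phase: each input pair emits the list of items stored under it.
    mapped = [STORE.get(pair, []) for pair in inputs]
    # Reduce phase: flatten, sort newest first, truncate.
    flat = [item for chunk in mapped for item in chunk]
    return sorted(flat, key=lambda item: item["ts"], reverse=True)[:limit]
```

In both versions the set of bucket/key pairs is known up front (one key per contact), so the sketch says nothing about which option performs better on a real cluster; it only shows that the same feed can be expressed either way.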


Merci,
Rohman 

On 09.04.2013 01:26, Matt Black wrote: 

> I think a short and explicit discussion of using sequential GETs would be
> good to add to the docs in [1]. It'll be helpful to put the alternate
> option in the reader's head so they can evaluate it as they're going
> through the article.
> 
> Cheers 
> Matt
> 
> On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
> 
>> I want to follow up on the recent "Map phase timeout" thread [2], in part
>> out of curiosity and in part as a documentation cleanup. Should the
>> documentation at [1] be changed? Specifically, the docs say MR should be
>> used:
>> 
>> * WHEN YOU KNOW THE SET OF OBJECTS YOU WANT TO MAPREDUCE OVER (THE
>>   BUCKET-KEY PAIRS) (emphasis added)
>> * When you want to return actual objects or pieces of the object - not
>>   just the keys, as do Search & Secondary Indexes
>> * When you need utmost flexibility in querying your data. MapReduce gives
>>   you full access to your object and lets you pick it apart any way you
>>   want.
>> 
>> It seems to me that a lot of discussions around MR in Riak come down to
>> "You're close, but this isn't the best use case for MapReduce in Riak."
>> Would it be better, for the purposes of a general discussion, to say that
>> MapReduce is the appropriate paradigm when you want to:
>> 
>> * manipulate a large amount of data inside the Riak cluster in bulk,
>>   e.g. read all of my sales orders and, where the version is 1, perform
>>   the changes necessary to update the order format to version 2
>> * burn a lot of I/O and make your admin sad
>> * move data from one bucket to another
>> * re-write an entire bucket so all data is indexed for 2i, search, etc.
>> * do anything where the query can be resumed with no knowledge of state
>>   at the time the last run of the query failed
>> 
>> Are there other use cases when MR is the better approach?
>>
>> [1]: http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
>> [2]: http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results
>> 
>> --- 
>> Jeremiah Peschka - Founder, Brent Ozar Unlimited
>> MCITP: SQL Server 2008, MVP
>> Cloudera Certified Developer for Apache Hadoop
>> _______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com [3]

-- 

 		 [4]

ANTONIO ROHMAN FERNANDEZ
CEO, Founder & Lead Engineer
rohman at mahalostudio.com

PROJECTS
MaruBatsu.es [5]
PupCloud.com [6]
Wedding Album [7]

 

Links:
------
[1] http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
[2] http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results
[3] http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
[4] http://mahalostudio.com
[5] http://marubatsu.es
[6] http://pupcloud.com
[7] http://wedding.mahalostudio.com