The suitability of MapReduce

Guido Medina guido.medina at temetra.com
Tue Apr 9 05:59:23 EDT 2013


Rohman,

   It is more complicated than that. Most big data systems use more than 
one DB engine (Facebook, for example, uses something like 5 different 
engines). We are not as big as Facebook, but we still use a relational 
SQL database, a text search engine and Riak. You will have to balance 
each tool's weaknesses with a different tool, and use each tool for what 
it does best. In the case of Riak:

  * JSON storage where you know your keys (and where it is easy for you
    to fetch those keys concurrently)
  * If you need to "reduce", let's say find 100 keys out of a million
    and then programmatically narrow those 100 down to 25, you can
    enable 2i (secondary indexes).
  * If you need sophisticated search, you could hook into Yokozuna,
    which uses Solr (we use Solr separately).
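The 2i bullet above can be sketched without a live cluster. The following is a minimal, hypothetical in-memory stand-in, not the Riak client API: `OBJECTS` plays the role of a bucket, `AGE_INDEX` plays an integer secondary index, and `index_range` mimics a 2i range query, which returns keys only. The field names and the filter are illustrative assumptions.

```python
import bisect

# Hypothetical stand-in for a Riak bucket: key -> JSON-like object.
OBJECTS = {f"user{i}": {"age": i % 90, "active": i % 4 == 0} for i in range(1000)}

# Hypothetical stand-in for an integer secondary index on "age":
# a sorted list of (index_value, key) pairs that a range query can scan.
AGE_INDEX = sorted((obj["age"], key) for key, obj in OBJECTS.items())


def index_range(index, low, high):
    """Return keys whose index value lies in [low, high], like a 2i range query.

    As with 2i, only keys come back; objects must be fetched separately.
    """
    start = bisect.bisect_left(index, (low, ""))
    keys = []
    for value, key in index[start:]:
        if value > high:
            break
        keys.append(key)
    return keys


def narrow(keys, predicate):
    """Fetch each candidate object (one GET per key) and filter client-side."""
    return [k for k in keys if predicate(OBJECTS[k])]


candidates = index_range(AGE_INDEX, 30, 31)            # the index narrows 1000 keys
matches = narrow(candidates, lambda o: o["active"])    # programmatic final filter
print(len(candidates), len(matches))
```

The design point is the split of work: the index cheaply narrows the key space, and the more specific filter runs in the client on a set small enough to fetch.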

I would say there is no ideal solution: you use each tool for what it 
does best and counter its weaknesses with something else.

Hope that helps,

Guido.

On 09/04/13 10:42, Antonio Rohman Fernandez wrote:
>
> But... then... I wonder how to do the following task, as I assumed MR 
> would be the right thing to do:
>
> - Imagine Facebook's "news feed", which every so often recompiles the 
> statuses, photos, comments, likes, etc. of all your contacts.
>
> Shouldn't this be done by MR? And if so, shouldn't the user be able 
> to execute it on demand when they want to refresh the news feed (or at 
> least have it refreshed in the background every X minutes), and then 
> GET the refreshed, compiled data?
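One alternative to recompiling a whole feed with MR, consistent with the "JSON storage where you know your keys" use of Riak suggested in the reply above, is to keep each contact's recent activity under a known key and merge those short lists at read time (or in a periodic background job). A minimal sketch, assuming a hypothetical `timelines` mapping from contact to a list of (timestamp, item) pairs already sorted newest first; `build_feed` is an illustrative name, and `heapq.merge` does the k-way merge without re-sorting everything.

```python
import heapq
from itertools import islice

# Hypothetical per-contact timelines, each stored under a known key and
# kept sorted newest first as (timestamp, item) pairs.
timelines = {
    "alice": [(1365490000, "alice: photo"), (1365480000, "alice: status")],
    "bob": [(1365495000, "bob: like"), (1365470000, "bob: comment")],
    "carol": [(1365485000, "carol: status")],
}


def build_feed(contacts, timelines, limit):
    """Merge the contacts' timelines into one feed, newest first.

    Each timeline would be one GET on a known key; heapq.merge then does a
    lazy k-way merge, so only the first `limit` items are consumed.
    """
    streams = [timelines.get(c, []) for c in contacts]
    merged = heapq.merge(*streams, key=lambda entry: entry[0], reverse=True)
    return [item for _, item in islice(merged, limit)]


print(build_feed(["alice", "bob", "carol"], timelines, limit=3))
```

Because each contact's list is short and already ordered, the cost grows with the number of contacts and the page size rather than with the total amount of stored activity.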
>
> Thanks,
> Rohman
>
> On 09.04.2013 01:26, Matt Black wrote:
>
>> I think a short and explicit discussion of using sequential GETs 
>> would be good to add to the docs in [1]. It'll be helpful to put the 
>> alternate option in the reader's head so they can evaluate as they're 
>> going through the article.
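For readers weighing the alternative Matt mentions, a sketch of the client-side pattern: when the bucket-key pairs are already known, issue one GET per key, either sequentially or concurrently, and do the "map" work in the client. `STORE` and `fetch` are hypothetical stand-ins for a real client's GET call; with a real client you would swap in that call and its own error handling.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the key/value store, keyed by (bucket, key).
STORE = {("orders", f"o{i}"): {"total": i * 10} for i in range(50)}


def fetch(bucket, key):
    """Hypothetical single GET; returns None when the key does not exist."""
    return STORE.get((bucket, key))


def sequential_gets(pairs):
    """One GET after another: simple, but per-key latency adds up."""
    return [fetch(b, k) for b, k in pairs]


def concurrent_gets(pairs, workers=8):
    """The same GETs issued from a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda pair: fetch(*pair), pairs))


pairs = [("orders", f"o{i}") for i in range(0, 50, 5)] + [("orders", "missing")]
objects = concurrent_gets(pairs)
# Client-side "map": extract the field of interest, skipping missing keys.
totals = [obj["total"] for obj in objects if obj is not None]
print(sum(totals))
```

Which variant to prefer depends on the number of keys and the load on the cluster; the concurrent version trades more simultaneous requests for lower wall-clock time.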
>> Cheers
>> Matt
>>
>>
>> On 9 April 2013 02:02, Jeremiah Peschka <jeremiah.peschka at gmail.com> wrote:
>>
>>     I want to follow up on the recent "Map phase timeout" thread [2],
>>     partly out of curiosity and partly as a documentation cleanup.
>>     Should the documentation at [1] be changed? Specifically,
>>     the docs say MR should be used:
>>
>>       * *When you know the set of objects you want to MapReduce over
>>         (the bucket-key pairs)* (emphasis added)
>>       * When you want to return actual objects or pieces of the
>>         object -- not just the keys, as do Search & Secondary Indexes
>>       * When you need utmost flexibility in querying your data.
>>         MapReduce gives you full access to your object and lets you
>>         pick it apart any way you want.
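The first criterion above, knowing the bucket-key pairs up front, is what keeps a job bounded. The following is a local simulation of the shape of such a job, not Riak's API: `DATA`, `map_phase` and `reduce_phase` are illustrative names. Riak's documentation notes that a reduce function may be run more than once, over a mix of its own earlier output and fresh map output, so the sketch uses a reduce whose output has the same shape as its input.

```python
from collections import Counter

# Hypothetical objects addressed by explicit (bucket, key) inputs.
DATA = {
    ("orders", "a"): {"status": "shipped"},
    ("orders", "b"): {"status": "pending"},
    ("orders", "c"): {"status": "shipped"},
    ("orders", "d"): {"status": "cancelled"},
}


def map_phase(obj):
    """Map: emit a list of partial results for one object."""
    return [{obj["status"]: 1}]


def reduce_phase(values):
    """Reduce: merge partial count dicts into a single-element list.

    The output has the same shape as each input item, so it can be fed back
    in (re-reduced) together with new map output without changing the answer.
    """
    total = Counter()
    for partial in values:
        total.update(partial)
    return [dict(total)]


inputs = [("orders", "a"), ("orders", "b"), ("orders", "c"), ("orders", "d")]
mapped = [v for pair in inputs for v in map_phase(DATA[pair])]

one_pass = reduce_phase(mapped)
# Re-reduce: reduce part of the input first, then reduce that result with the rest.
two_pass = reduce_phase(reduce_phase(mapped[:2]) + mapped[2:])
print(one_pass == two_pass, one_pass)
```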
>>
>>     It seems to me that a lot of discussions around MR in Riak come
>>     down to "You're close but this isn't the best use case of
>>     MapReduce in Riak." Would it be better, for the purposes of a
>>     general discussion, to say that MapReduce is the appropriate
>>     paradigm when you want to:
>>
>>       * manipulate a large amount of data inside the Riak cluster in
>>         bulk, e.g. read all of my sales orders and, where the version
>>         is 1, perform the changes necessary to update the order
>>         format to version 2.
>>       * burn a lot of I/O and make your admin sad
>>       * move data from one bucket to another
>>       * rewrite an entire bucket so all data is indexed for 2i,
>>         search, etc.
>>       * Anything where the query can be resumed with no knowledge of
>>         state at the time the last run of the query failed.
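The first and last use cases in this list fit together: a bulk migration can be re-run from scratch after a failure, with no knowledge of where it stopped, if the per-object transformation is idempotent. A minimal sketch, assuming a hypothetical order format in which version 1 stores `amount_cents` as an integer and version 2 stores `total` as a decimal string; these field names and `migrate_order`/`run_migration` are invented for illustration, and a plain dict stands in for the bucket.

```python
def migrate_order(order):
    """Upgrade one order from version 1 to version 2 (hypothetical formats).

    v1: {"version": 1, "amount_cents": int}
    v2: {"version": 2, "total": "D.CC"}
    Orders that are not version 1 are returned unchanged, which makes
    re-running the whole job after a failure harmless.
    """
    if order.get("version") != 1:
        return order
    cents = order["amount_cents"]
    upgraded = {k: v for k, v in order.items() if k != "amount_cents"}
    upgraded["total"] = f"{cents // 100}.{cents % 100:02d}"
    upgraded["version"] = 2
    return upgraded


def run_migration(bucket):
    """Apply the migration to every object; return how many were changed."""
    changed = 0
    for key, order in list(bucket.items()):
        new = migrate_order(order)
        if new is not order:
            bucket[key] = new
            changed += 1
    return changed


orders = {
    "o1": {"version": 1, "amount_cents": 1999},
    "o2": {"version": 2, "total": "5.00"},
    "o3": {"version": 1, "amount_cents": 250},
}
first = run_migration(orders)
second = run_migration(orders)  # a rerun finds nothing left to do
print(first, second, orders["o1"])
```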
>>
>>     Are there other use cases when MR is the better approach?
>>     [1]:
>>     http://docs.basho.com/riak/latest/tutorials/querying/MapReduce/#When-to-Use-MapReduce
>>     [2]:
>>     http://riak.markmail.org/search/?q=#query:+page:1+mid:4o27v64qf55ejzwc+state:results
>>
>>     ---
>>     Jeremiah Peschka - Founder, Brent Ozar Unlimited
>>     MCITP: SQL Server 2008, MVP
>>     Cloudera Certified Developer for Apache Hadoop
>>
>>     _______________________________________________
>>     riak-users mailing list
>>     riak-users at lists.basho.com <mailto:riak-users at lists.basho.com>
>>     http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>>
>
> -- 
> *Antonio Rohman Fernandez*
> CEO, Founder & Lead Engineer
> <http://mahalostudio.com>
> rohman at mahalostudio.com
> *Projects*
> MaruBatsu.es <http://marubatsu.es>
> PupCloud.com <http://pupcloud.com>
> Wedding Album <http://wedding.mahalostudio.com>
>
>


