Getting a value: get vs map

Sean Cribbs sean at basho.com
Fri Jul 29 13:46:36 EDT 2011


A few things that should be mentioned as well:

1) MapReduce amounts to N=1, or reading only one replica. If you have
divergent replicas (siblings, e.g.) on different nodes, they might not
appear in your MapReduce results.
2) MapReduce does not invoke read-repair, so divergent replicas will not
converge.
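
As a rough sketch of what this means for client code: a plain get goes
through the normal read path, so the client can check the returned object
for siblings explicitly. This assumes Conn is an already-open
riakc_pb_socket connection and that Bucket and Key are binaries; the module
name get_sketch and the helper fetch_values are made up for illustration
and are not part of the client library.

```erlang
-module(get_sketch).
-export([fetch_values/3]).

%% Hypothetical helper (not part of riak-erlang-client).
%% Assumes Conn is an open riakc_pb_socket connection.
fetch_values(Conn, Bucket, Key) ->
    %% A regular get, unlike a MapReduce job, goes through the normal
    %% read path, so divergent replicas can show up as siblings here.
    case riakc_pb_socket:get(Conn, Bucket, Key) of
        {ok, Object} ->
            case riakc_obj:value_count(Object) of
                %% Exactly one value: no siblings to resolve.
                1 -> {ok, riakc_obj:get_value(Object)};
                %% Several values: hand all siblings back to the caller.
                _ -> {siblings, riakc_obj:get_values(Object)}
            end;
        {error, _} = Error ->
            %% Includes {error, notfound} when the key does not exist.
            Error
    end.
```

How siblings are then resolved is up to the application; the sketch only
surfaces them instead of assuming a single value.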

On Fri, Jul 29, 2011 at 1:30 PM, Justin Sheehy <justin at basho.com> wrote:

> Jeremiah,
>
> You were essentially correct. A "targeted" MR does not have to search
> for the data, and does not slow down with database size. It is a
> bucket-sweeping MR that currently has that behavior.
>
> -Justin
>
>
>
> On Fri, Jul 29, 2011 at 10:27 AM, Jeremiah Peschka
> <jeremiah.peschka at gmail.com> wrote:
> > I would have suspected that an MR job where you supply a Bucket, Key pair
> > would be just as fast as a Get request. Shows what I know.
> > ---
> > Jeremiah Peschka
> > Founder, Brent Ozar PLF, LLC
> >
> > On Jul 29, 2011, at 1:37 AM, Antonio Rohman Fernandez wrote:
> >
> >> MapReduce (or simply a Map) gets really slow when the database holds a
> >> significant amount of data (or is distributed over several servers). Get,
> >> by contrast, is always faster because Riak doesn't have to search for the
> >> key (you tell Riak exactly where to GET the data in your URL).
> >>
> >> Rohman
> >>
> >> On Thu, 28 Jul 2011 23:43:06 +0400, mss at mawhrin.net wrote:
> >>
> >>> Hi,
> >>>
> >>> (I looked at various places for the information, however I could not
> >>> find anything that would answer the question.  It's not completely
> >>> ruled out that not all places were checked though :))
> >>>
> >>> I use PB erlang interface to access the database.  Given a bucket name
> >>> and a key, the value can easily be extracted using:
> >>>
> >>>     {ok, Object} = riakc_pb_socket:get(Conn, Bucket, Key),
> >>>     Value = riakc_obj:get_value(Object)
> >>>
> >>> Alternatively, a mapred (actually, just map) request could be issued:
> >>>
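> >>>     %% mapred returns {ok, [{PhaseIndex, Results}]}; Results is a list.
> >>>     {ok, [{_, [Value]}]} = riakc_pb_socket:mapred(Conn, [
> >>>         {Bucket, Key}
> >>>     ], [
> >>>         %% The built-in map functions live in riak_kv_mapreduce.
> >>>         {map, {modfun, riak_kv_mapreduce, map_object_value}, none, true}
> >>>     ])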
> >>>
> >>> I would expect that the result is the same while in the second case,
> >>> the amount of data transferred to the client is smaller (which might
> >>> be good for certain situations).
> >>>
> >>> So the [open] question is: are there any reasons for using the first
> >>> approach over the second?
> >>>
> >>> --
> >>> Misha
> >>>
> >> --
> >>
> >> Antonio Rohman Fernandez
> >> CEO, Founder & Lead Engineer
> >> rohman at mahalostudio.com
> >> Projects: MaruBatsu.es, PupCloud.com, Wedding Album
> >> _______________________________________________
> >> riak-users mailing list
> >> riak-users at lists.basho.com
> >> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



-- 
Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/