MapReduce filtering question

Sean Cribbs sean at
Sat Nov 20 08:38:34 EST 2010

On Nov 19, 2010, at 11:54 PM, Parker Thompson wrote:

> Thanks, a few questions inline...

Responses inline.

> On Fri, Nov 19, 2010 at 2:43 PM, Sean Cribbs <sean at> wrote:
> class Riak::Alternative
>  include Ripple::Document
>  many :visitors, :class_name => "Riak::Visitor"
>  property :alternative_id, Integer, :presence => true
>  key_on :alternative_id
> end
> If I expect to write large numbers of visitor->alternative links, is it performant to write them all as links on one object, as opposed to creating many experience docs, each with a link? Naïvely, I would assume this might distribute write load less evenly, or degrade as the size of the Link data grows. Does this matter?

You take the hit either at write time or at query time. Personally, I think this approach is easier to understand and manage, but the right choice may also depend on what other queries you want to run. For example, if your other queries focus mostly on individual visitors, it might make sense to keep a list of alternative_ids in the Visitor class instead of having something (an experience or an alternative) link to it. The key takeaway I'm trying to lead you to is that data types that are effectively "joins" are less efficient and less useful in Riak.
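A minimal sketch of that write-time vs. query-time trade-off, assuming plain Ruby Hashes stand in for the two buckets. No Ripple or Riak client calls are made, and the method names are hypothetical. Linking visitors from the alternative makes "who saw this alternative?" a single fetch but grows one object on every write; embedding alternative_ids in each visitor spreads writes across many small objects but turns that same query into a scan.

```ruby
require "set"

# Hypothetical in-memory stand-ins for Riak buckets: Hashes keyed by object key.

# Layout A: each alternative object carries links to its visitors.
# Every new visit rewrites (and grows) the same alternative object.
def record_visit_linked(alternatives, alternative_id, visitor_id)
  (alternatives[alternative_id] ||= Set.new) << visitor_id
end

# Layout B: each visitor object embeds the alternative_ids it has seen.
# Writes are spread across many small visitor objects.
def record_visit_embedded(visitors, visitor_id, alternative_id)
  (visitors[visitor_id] ||= Set.new) << alternative_id
end

# Query "which visitors saw alternative X?" under Layout A: one key lookup.
def visitors_for_linked(alternatives, alternative_id)
  alternatives.fetch(alternative_id, Set.new)
end

# Same query under Layout B: every visitor must be examined
# (in Riak this would typically mean a MapReduce over the whole bucket).
def visitors_for_embedded(visitors, alternative_id)
  visitors.select { |_visitor_id, alt_ids| alt_ids.include?(alternative_id) }.keys.to_set
end

# Usage: record the same visits under both layouts.
alternatives = {}
visitors = {}
[[1, "v1"], [1, "v2"], [2, "v2"]].each do |alt, vis|
  record_visit_linked(alternatives, alt, vis)
  record_visit_embedded(visitors, vis, alt)
end
# Both calls return a Set containing "v1" and "v2".
visitors_for_linked(alternatives, 1)
visitors_for_embedded(visitors, 1)
```

Both layouts answer the question identically; what differs is which side pays: Layout A concentrates writes on one growing object, Layout B pays at query time.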

> ########
> def visitors_who_shared
>    add("riak_alternatives", ar_id.to_s).
>    link(:bucket => 'riak_visitors').
>    map(link_to_events_forward_visitor).
>    map(map_share_events_to_visitor).
>    reduce(["riak_kv_mapreduce", "reduce_set_union"]).
>    map(map_identity, :keep => true).
>    run
> end
> Ah, I was looking for a set_union.  Is there a full list of these functions hiding somewhere?

Our community manager has some better documentation in the oven, but there are a number of them in these two files:



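To make the role of the set-union reduce in the `visitors_who_shared` chain concrete, here is a pure-Ruby model of what such a phase does. It is a sketch, not Riak's implementation: the Ruby methods `reduce_set_union` and `reduce_in_batches` below are hypothetical stand-ins for the Erlang built-in, and the batching only imitates the fact that Riak may call a reduce function more than once, feeding earlier reduce output back in with new map results (re-reduce). Set union is safe under that treatment because it is associative, commutative and idempotent.

```ruby
# Hypothetical Ruby stand-in for a set-union reduce: drop duplicate inputs.
def reduce_set_union(values)
  values.uniq
end

# Imitates re-reduce: each call sees the previous reduce output plus a new
# batch of map output, and must still converge to the same answer.
def reduce_in_batches(values, batch_size)
  values.each_slice(batch_size).reduce([]) do |acc, batch|
    reduce_set_union(acc + batch)
  end
end

# Map output as it might arrive from the earlier phases (duplicates included).
map_output = ["v1", "v2", "v1", "v3", "v2"]
one_pass = reduce_set_union(map_output)
batched  = reduce_in_batches(map_output, 2)
# Compare sorted, since result order is not something to rely on.
one_pass.sort == batched.sort # true
```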
Sean Cribbs <sean at>
Developer Advocate
Basho Technologies, Inc.


More information about the riak-users mailing list