riak_core application design questions

Bryan Fink bryan at basho.com
Tue Oct 2 12:51:15 EDT 2012

On Wed, Sep 26, 2012 at 4:38 PM, Chris Hicks
<silent_vendetta at hotmail.com> wrote:
> However, sending
> the command off to, say, 3 nodes to process means that each of my nodes will
> then ping Riak (if I need 2 responses per query suddenly that means I hit
> Riak nodes a total of 6 times), process, and then all try to save the data
> back to Riak (again possibly hitting 6+ nodes to save the data). Is that
> wasteful?

Hi, Chris. Yes, there is waste here. But, as many implementers in this
space have pointed out[1], redundant work can make distributed queries
faster (as long as you have the excess processing power available).

[1] http://www.bailis.org/blog/doing-redundant-work-to-speed-up-distributed-queries/

In fact, this is part of the reason that Riak hits N nodes for each
read/write: it's a bet that R of them will reply faster than the rest,
and thus you don't have to wait on the slowest.

> Would it better, if it is even possible, to have one node doing the
> computation with another node holding a copy of the command as a fail-over?
> Once the 'primary' computational node finishes it's work and the data is
> saved the secondary one could be informed and just drop the command since it
> isn't necessary anymore. I'm still trying to grok all the implications of
> the way riak_core does things so any advice would be greatly appreciated.

This is still "wasteful" in the sense that you have to tell a node
some information that, in the happy-path case, it's never going to
care about. The hard part here will be deciding at what point the
"first" node failed to complete the computation, and failover to the
next. Waiting to little time means you may have the same waste as your
earlier setup, but waiting too much time means your latency will

However, this is still a common setup. Some dynamo implementations
even do a similar thing where they first ask the node they expect to
reply the fastest, and then only ask other nodes if that first one
does not reply fast enough. It's all in the accuracy of the modeling
and prediction at that point, as to whether this makes things faster
or slower.

Riak Pipe, Riak's internal distributed processing system for it's
scatter-gather query system, takes this approach as well. It sends an
input to the first node of a preflist, and then falls back to others
if needed. (Though its current use case also gets to assume that the
node that started the query stays up for the entire duration of the
query, so there is a point that can tell when an input has been lost.)


More information about the riak-users mailing list