AAE problems

Ryan Zezeski rzezeski at basho.com
Thu Jun 19 10:10:03 EDT 2014

On Tue, Jun 17, 2014 at 12:46 PM, István <leccine at gmail.com> wrote:
> The entire dataset is idempotent and immutable, so there is not even a
> slightest chance that we are ending up with different values on
> different nodes for the same key in the same bucket. It seems that
> anti-entropy still finds problems:
> /var/log/riak/console.log.4:2014-06-11 06:11:41.756 [info]
> <0.6776.6003>@riak_kv_exchange_fsm:key_exchange:206 Repaired 1 keys
> during active anti-entropy exchange of
> {536645132457440915277915524513010171279912730624,3} between
> {548063113999088594326381812268606132370974703616,'riak at'}
> and {559481095540736273374848100024202093462036676608,'riak at'}

AAE exchange uses snapshots of the trees.  The snapshots on each node are
taken concurrently.  If your cluster is servicing writes while these
snapshots are taken, there is a chance the snapshot on one node contains
keys X, Y, and Z while the snapshot on the other node has only seen keys X
and Y.
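To make the race concrete, here is a small Python sketch (purely
illustrative, not Riak code) where a sorted-key digest stands in for a
Merkle tree root.  Key Z is still in flight to node B when the snapshots
are taken, so the exchange sees a difference even though both nodes would
converge moments later:

```python
import hashlib

def tree_hash(keys):
    """Hash a sorted set of keys, standing in for a Merkle tree root."""
    h = hashlib.sha256()
    for k in sorted(keys):
        h.update(k.encode())
    return h.hexdigest()

# Both nodes will eventually hold X, Y, Z, but the snapshots are taken
# concurrently while key Z is still in flight to node B.
snapshot_a = {"X", "Y", "Z"}
snapshot_b = {"X", "Y"}          # Z arrives just after B's snapshot

# The exchange compares roots; a mismatch triggers a "repair" even
# though nothing is actually lost once Z lands on node B.
spurious_diff = tree_hash(snapshot_a) != tree_hash(snapshot_b)
missing_on_b = snapshot_a - snapshot_b
```

This is why occasional "Repaired N keys" log lines on a healthy,
write-heavy cluster are expected.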

> My question would be:
> Is there any reason to let AAE running if we don't mutate the data in
> place?


Immutable data provides nice semantics for your application but does
_nothing_ to save you from the whims of the stack your application runs on.
Operating systems, file systems, and hardware all have subtle ways to
corrupt your data, both on disk and in memory.  Immutable data also doesn't
help in the more practical case where the network decides to drop packets
and a write only makes it to some of the nodes.
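The bit-rot case can be sketched in a few lines of Python (an
illustration of the general checksum-on-read idea, not Riak's actual
storage format): the value never changes at the application level, yet a
single flipped byte underneath still has to be caught by something like
AAE's hashing.

```python
import hashlib

def store(value: bytes):
    """Keep a value alongside its checksum, as an AAE-style tree would."""
    return {"value": value, "hash": hashlib.sha256(value).hexdigest()}

def is_corrupt(record):
    """Re-hash on read; immutability can't prevent bit rot underneath."""
    return hashlib.sha256(record["value"]).hexdigest() != record["hash"]

record = store(b"immutable payload")
record["value"] = b"immutable paylaod"   # simulate a flipped byte on disk
```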

> Is there any way knowing what is causing the difference according to
> AAE between two nodes?

There is, but it requires attaching to Riak and running some diagnostic
commands _when_ a repair takes place.  I'm not sure it will give you much
insight, though.  It will report one of three things: 1) remote missing,
2) local missing, or 3) hashes differ.

> I was thinking about how this could potentially
> happen and I am wondering if the Java client pb interface supports R
> and W values, so I could make sure that a write goes in with W=(the
> number of nodes we have).

I doubt this will help with the concurrency problem discussed above, but it
will give your application a stronger guarantee about how many copies made
it to the nodes.  If you want to make sure they are durable, I would use DW
if the Java client exposes it [1].
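The distinction between W and DW can be sketched like this (a toy model of
the quorum bookkeeping, with a hypothetical `write_succeeds` helper; Riak's
real write coordinator lives in riak_kv and differs in detail):

```python
def write_succeeds(replica_acks, w, dw):
    """
    replica_acks: one (received, durable) flag pair per replica.
    The write succeeds only if at least W replicas acknowledged receipt
    and at least DW of those also confirmed a durable (fsynced) write.
    """
    received = sum(1 for r, _ in replica_acks if r)
    durable = sum(1 for r, d in replica_acks if r and d)
    return received >= w and durable >= dw

# n=3 replicas: two acked in memory, one of those also fsynced to disk.
acks = [(True, True), (True, False), (False, False)]
```

So W=n guards against a write landing on too few nodes, while DW
additionally guards against acknowledged-but-not-yet-persisted copies.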

[1]: See the "optional query parameters" for difference between W, DW, and

