'not found' after join

Nico Meyer nico.meyer at adition.com
Tue May 3 05:30:06 EDT 2011

Hi everyone,

I just want to note that I observed similar behaviour with a somewhat
larger cluster of 10 or so nodes. I first noticed that handoff activity
after a node join (or leave, for that matter) involved a lot more
partitions than I would have expected. By comparing the old and the new
ring file, I found out that more than 80 percent of partitions had to be
moved to another node.
My naive expectation was that joining a node to a cluster of size X
would result in roughly ring_creation_size/(X+1) partitions being handed
off, which would also be the minimum if one expects a balanced cluster.
Furthermore it would in theory be possible to move partitions in such a
way that at least one partition from each preflist stays on the same
node. Maybe for X>N it should even be possible to guarantee this for a
basic quorum of each preflist, eliminating the notfound problem
completely, but I am not sure about that.
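
The comparison described above can be sketched roughly as follows. This
is only an illustration, not Riak's actual claim algorithm or ring file
format: a ring is assumed to be a plain list with one owner name per
partition, the round-robin assignment is a toy stand-in, and all
function names are hypothetical.

```python
def moved_fraction(old_owners, new_owners):
    """Fraction of partitions whose owner differs between two rings."""
    if len(old_owners) != len(new_owners):
        raise ValueError("rings must have the same number of partitions")
    moved = sum(1 for a, b in zip(old_owners, new_owners) if a != b)
    return moved / len(old_owners)


def minimal_handoffs(ring_size, old_node_count):
    """Naive lower bound: the joining node's share of a balanced ring."""
    return ring_size // (old_node_count + 1)


def preflists_fully_moved(old_owners, new_owners, n_val):
    """Count preflists in which every partition changed owner.

    A preflist is modelled here (an assumption) as n_val consecutive
    partitions on the ring, wrapping around at the end.
    """
    size = len(old_owners)
    changed = [a != b for a, b in zip(old_owners, new_owners)]
    return sum(
        1
        for start in range(size)
        if all(changed[(start + k) % size] for k in range(n_val))
    )


def round_robin(ring_size, nodes):
    """Toy assignment used only for this example: partition i -> node i mod len."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


if __name__ == "__main__":
    old = round_robin(2048, ["n1", "n2", "n3"])
    new = round_robin(2048, ["n1", "n2", "n3", "n4"])
    print("moved fraction:", moved_fraction(old, new))
    print("naive minimum:", minimal_handoffs(2048, 3))
    print("preflists with no stable replica:",
          preflists_fully_moved(old, new, 3))
```

Under this toy round-robin assumption, going from 3 to 4 nodes with
ring_creation_size = 2048 moves about 75 percent of the partitions,
against a naive minimum of 512 (25 percent). Preflists counted by the
last function are the ones that could return a notfound until handoff
or read repair catches up.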

I may be able to provide some ring files to analyze this behaviour if
someone from basho is interested.

Cheers, Nico

On Monday, 02.05.2011, at 23:14 -0400, Ryan Zezeski wrote:
> Greg,
> Your expectations are fair, just because you added a node doesn't mean
> Riak should return notfounds.  Unfortunately, we aren't quite there
> yet.  This is a side effect of how Riak currently implements handoff
> in that it immediately updates/gossips the ring causing
> many partitions to handoff immediately.  If a request comes in that
> relies on these partitions then it will get a notfound and perform
> read repair.  Your situation is multiplied by the fact that you are
> going from 3 nodes to 4.  More vnode shuffling occurs because of the
> small cluster size.
> We're well aware of this and have it on our radar for improvement in a
> future release.
> All this said, your data will be eventually consistent.  That is, all
> your data will eventually be handed off and things will work as
> normal.  It's only during the handoff that you _may_ encounter
> notfounds.  In this case it would be best to add a new node to your
> cluster at times of lowest load, and, if you can spare additional
> hardware, starting with a few more nodes is an even easier option.
> -Ryan
> On Mon, May 2, 2011 at 9:48 PM, Greg Nelson <grourk at dropcam.com>
> wrote:
>         Hello riak users! 
>         I have a 4 node cluster that started out as 3 nodes.
>          ring_creation_size = 2048, target_n_val is default (4), and
>         all buckets have n_val = 3.
>         When I joined the 4th node, for a few minutes some GETs were
>         returning 'not found' for data that was already in riak.
>          Eventually the data was returned, due to read repair I would
>         assume.  Is this expected?  It seems that 'not found' and read
>         repairs should only happen when something goes wrong, like a
>         node goes down.  Not when adding a node to the cluster, which
>         is supposed to be part of normal operation!
>         Any help or insight is appreciated!
>         Greg
>         _______________________________________________
>         riak-users mailing list
>         riak-users at lists.basho.com
>         http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
