Inconsistent data when a node goes down

Dan Reverri dan at basho.com
Mon Feb 28 16:38:01 EST 2011


Hi Luca,

For any request Riak will ask 3 vnodes for an object's value (assuming N=3).
If a majority of vnodes return "not found", Riak will return "not found" to
the client. When a node is taken down, other nodes can act as fallbacks for
it. A fallback node that takes over for a primary node is not immediately
aware of the objects the primary node is responsible for, so it can return
"not found" for some objects. Read repair will eventually populate the
fallback node with any missing data:
https://wiki.basho.com/Riak-Glossary.html#Read-Repair
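To make the fallback scenario concrete, here is a minimal Python sketch of a toy model, not Riak's actual code. The names Vnode and read_with_repair are hypothetical. A read simply queries all three vnodes and applies the majority "not found" rule described above. Read repair is simplified to copying the first value found onto the vnodes that replied "not found"; real Riak resolves versions with vector clocks. The example pairs one primary holding the object with two empty fallbacks: the first read returns "not found", and the repaired fallbacks let the second read succeed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Vnode:
    """Toy vnode (hypothetical model): an in-memory key/value store."""
    name: str
    data: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)  # None stands for "not found"


def read_with_repair(vnodes: List[Vnode], key: str) -> Optional[str]:
    """Ask every vnode in the preference list for `key`.

    Returns None ("not found") if a majority replied "not found".
    Either way, any value that was found is copied onto the vnodes
    that lacked it (a simplified read repair).
    """
    replies = [v.get(key) for v in vnodes]
    notfound = sum(reply is None for reply in replies)
    values = [reply for reply in replies if reply is not None]

    # Simplified read repair: push a known value to the replicas missing it.
    if values:
        for vnode, reply in zip(vnodes, replies):
            if reply is None:
                vnode.data[key] = values[0]

    if notfound > len(vnodes) // 2:
        return None  # majority said "not found"
    return values[0] if values else None


# One primary that holds the object, two fallbacks that start out empty.
primary = Vnode("primary", {"k": "v1"})
fb1, fb2 = Vnode("fallback-1"), Vnode("fallback-2")
preflist = [primary, fb1, fb2]

first = read_with_repair(preflist, "k")   # None: 2 of 3 replied "not found"
second = read_with_repair(preflist, "k")  # "v1": the fallbacks were repaired
```

This mirrors the sequence in Luca's output: a brief window of missing objects right after the shutdown, then consistent reads once repair has run.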

In your situation, the majority of vnodes responding for the "not found"
objects are fallback vnodes that are not yet aware of the objects for which
they are responsible. After your script reads the "not found" objects, the
read repair mechanism populates the fallback vnodes, so subsequent reads on
the fallback vnodes return the object. Also note that this behavior is
observed even with R=1 because of a mechanism known as basic quorum. Basic
quorum assumes that if a majority of vnodes return "not found", the object
most likely does not exist, so Riak does not wait for the third vnode to
respond.
Bug 992 has been opened to investigate improving this behavior:
https://issues.basho.com/show_bug.cgi?id=992
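Basic quorum can be sketched the same way. The hypothetical coordinate_get function below is a simplified model under stated assumptions, not Riak's actual get coordinator. It consumes vnode replies in arrival order, with None standing for "not found". It answers as soon as either R values have arrived or a majority of the N vnodes have said "not found". With N=3 and R=1, two empty fallbacks replying first are enough to end the read before the primary that holds the object is heard from.

```python
from typing import Iterable, Iterator, List, Optional


def coordinate_get(replies: Iterable[Optional[str]], n: int = 3, r: int = 1) -> Optional[str]:
    """Toy GET coordinator (hypothetical): decide as replies arrive.

    None means "not found". Returns a value once r replies carried one;
    returns None as soon as a majority of the n vnodes said "not found"
    (basic quorum), without consuming the remaining replies.
    """
    majority = n // 2 + 1
    found = notfound = 0
    for reply in replies:
        if reply is None:
            notfound += 1
            if notfound >= majority:
                return None  # basic quorum: stop waiting, report "not found"
        else:
            found += 1
            if found >= r:
                return reply  # R satisfied
    return None  # fewer than r values arrived: treated as a failed read here


seen: List[Optional[str]] = []


def arrive(replies: List[Optional[str]]) -> Iterator[Optional[str]]:
    """Yield replies one at a time, recording each one the coordinator reads."""
    for reply in replies:
        seen.append(reply)
        yield reply


# Two empty fallbacks answer first; the primary holding "v1" answers last.
result = coordinate_get(arrive([None, None, "v1"]), n=3, r=1)
print(result, seen)  # None [None, None]: the third reply was never read

# If the primary happens to answer first, R=1 is met immediately.
print(coordinate_get(["v1", None, None], n=3, r=1))  # v1
```

In this model the outcome depends only on which vnodes answer first, which is why R=1 alone does not prevent the "not found" results.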

Let me know if that makes sense.

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
dan at basho.com


On Mon, Feb 28, 2011 at 12:48 PM, Alexander Sicular <siculars at gmail.com> wrote:

> I think it has to do with how the vnodes are distributed across your
> physical nodes. You really need a minimum of three physical nodes (or
> virtual machines) to deploy and/or do any failure testing.
>
> -Alexander
>
> On Mon, Feb 28, 2011 at 13:29, Luca Spiller <luca at stackednotion.com> wrote:
> > Hi all,
> >
> > I've come across some issues while testing how our system handles
> > failures, for example a machine failing. One of the (slightly scary)
> > issues I have come across is that for a short while after a Riak node
> > goes down, data that is read from another node isn't always consistent.
> > I have written a small test script to demonstrate this issue:
> >
> > https://gist.github.com/847749
> > Halfway through I switch off a node; here are the results:
> >
> > Deleted 0
> > Wrote 100 454551
> > 1298916758: 100 454551
> > 1298916759: 100 454551
> > 1298916760: 100 454551
> > 1298916761: 100 454551
> > 1298916762: 100 454551
> > 1298916762: 100 454551
> > 1298916763: 100 454551
> > 1298916764: 100 454551  (Shutdown around here)
> > 1298916765: 100 454551
> > 1298916766: 99 460532
> > 1298916767: 91 412241
> > 1298916768: 100 454551
> > 1298916769: 100 454551
> > 1298916770: 100 454551
> > 1298916771: 100 454551
> > 1298916772: 100 454551
> > 1298916773: 100 454551
> > 1298916774: 100 454551
> > 1298916775: 100 454551
> > 1298916776: 100 454551
> > 1298916777: 100 454551
> > ^C1298916777: 100 454551
> > Deleted 100
> >
> > Slightly scarier is that it sometimes appears to read old (deleted) data:
> >
> > Deleted 0
> > Wrote 100 495792
> > 1298916784: 100 495792
> > 1298916785: 100 495792
> > 1298916786: 100 495792
> > 1298916786: 100 495792  (Shutdown around here)
> > 1298916787: 100 495792
> > 1298916788: 100 487322
> > 1298916789: 100 495792
> > 1298916790: 100 495792
> > 1298916791: 100 495792
> > 1298916792: 100 495792
> > 1298916793: 100 495792
> > 1298916794: 100 495792
> > 1298916795: 100 495792
> > ^C1298916796: 100 495792
> > 1298916797: 100 495792
> > Deleted 100
> >
> > This is using the Ripple library (0.8.3) talking directly to the local
> > node; however, I believe the same problem happens when using the Erlang
> > PBC library. The problem seems to be exacerbated when larger amounts of
> > data are stored in Riak, and the eventual consistency then takes longer
> > to occur.
> > I am quite puzzled as to why this is happening. I could kind of
> > understand if data went missing, but the eventual consistency is what
> > puzzles me: I only have two nodes, so why does the data eventually sort
> > itself out?
> > Secondly, why does this still happen even with W and DW set to 3
> > (I originally had the script using the default values, but thought I
> > would try this)?
> > Both nodes are running Riak 0.14.0; here are the relevant configs:
> > https://gist.github.com/847756
> > Apologies if I am just doing something stupid; it has been a rather
> > long day :)
> > Regards,
> > Luca Spiller
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> >
> >
>