Inconsistent data when a node goes down

Alexander Sicular siculars at gmail.com
Mon Feb 28 15:48:10 EST 2011


I think it has to do with how the vnodes are partitioned across your
physical nodes. You really need a minimum of three physical nodes (or
virtual machines) to deploy or to do any failure testing.
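
To make the vnode point concrete, here is a rough sketch in Python. It is
not Riak's actual claim algorithm: the round-robin partition assignment, the
modulo key hashing, the node names, and the helper functions are all
assumptions made for illustration. It assumes a ring of 64 partitions and
N=3, with each key's replicas going to three consecutive partitions, and
counts how often those three partitions sit on three different machines.

```python
import hashlib

RING_SIZE = 64  # number of partitions (vnodes) in the ring; assumed value
N_VAL = 3       # replicas kept per key; assumed value


def build_ring(nodes, ring_size=RING_SIZE):
    """Assign partitions to physical nodes round-robin (a simplification)."""
    return [nodes[i % len(nodes)] for i in range(ring_size)]


def partition_for(bucket, key, ring_size=RING_SIZE):
    """Map a bucket/key pair to a starting partition via SHA-1 (simplified)."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


def owners_at(ring, start, n_val=N_VAL):
    """Physical owners of the n_val consecutive partitions from `start`."""
    return [ring[(start + i) % len(ring)] for i in range(n_val)]


def fully_spread(ring, n_val=N_VAL):
    """Count starting partitions whose replicas land on n_val distinct machines."""
    return sum(
        1
        for start in range(len(ring))
        if len(set(owners_at(ring, start, n_val))) == n_val
    )


if __name__ == "__main__":
    for nodes in (["node_a", "node_b"], ["node_a", "node_b", "node_c"]):
        ring = build_ring(nodes)
        start = partition_for("test", "key1")
        print(f"{len(nodes)} machines: replicas of test/key1 on",
              owners_at(ring, start))
        print(f"  partitions with {N_VAL} distinct machines:",
              fully_spread(ring), "of", len(ring))
```

Under these assumptions the two-machine ring never spreads a key's three
replicas over three machines (0 of 64), so one machine always holds two of
them and takes both down with it. The three-machine ring manages it for 62
of 64 starting partitions; the two misses are where this simplified
round-robin assignment wraps around the end of the ring.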

-Alexander

On Mon, Feb 28, 2011 at 13:29, Luca Spiller <luca at stackednotion.com> wrote:
> Hi all,
>
> I've come across some issues while testing how our system behaves under
> failures, for example a machine failing. One of the (slightly scary)
> issues is that, for a short while after a Riak node goes down, data read
> from another node isn't always consistent. I have written a small test
> script to demonstrate this issue:
>
> https://gist.github.com/847749
> Halfway through I switch off a node; here are the results:
>
> Deleted 0
> Wrote 100 454551
> 1298916758: 100 454551
> 1298916759: 100 454551
> 1298916760: 100 454551
> 1298916761: 100 454551
> 1298916762: 100 454551
> 1298916762: 100 454551
> 1298916763: 100 454551
> 1298916764: 100 454551  (Shutdown around here)
> 1298916765: 100 454551
> 1298916766: 99 460532
> 1298916767: 91 412241
> 1298916768: 100 454551
> 1298916769: 100 454551
> 1298916770: 100 454551
> 1298916771: 100 454551
> 1298916772: 100 454551
> 1298916773: 100 454551
> 1298916774: 100 454551
> 1298916775: 100 454551
> 1298916776: 100 454551
> 1298916777: 100 454551
> ^C1298916777: 100 454551
> Deleted 100
>
> Slightly more scary is that it appears to sometimes read old (deleted) data:
>
> Deleted 0
> Wrote 100 495792
> 1298916784: 100 495792
> 1298916785: 100 495792
> 1298916786: 100 495792
> 1298916786: 100 495792  (Shutdown around here)
> 1298916787: 100 495792
> 1298916788: 100 487322
> 1298916789: 100 495792
> 1298916790: 100 495792
> 1298916791: 100 495792
> 1298916792: 100 495792
> 1298916793: 100 495792
> 1298916794: 100 495792
> 1298916795: 100 495792
> ^C1298916796: 100 495792
> 1298916797: 100 495792
> Deleted 100
>
> This is using the Ripple library (0.8.3) talking directly to the local node,
> however I believe the same problem is happening when using the Erlang PBC
> library. This problem seems to be exacerbated when there are larger amounts
> of data being stored in Riak, and the eventual consistency takes longer to
> occur.
>
> I am quite puzzled as to why this is happening. I could kind of understand
> if data went missing, but the eventual consistency is what puzzles me: I
> only have two nodes, so why does the data eventually sort itself out?
> Secondly, why does this still happen even with W and DW set to 3
> (I originally had the script using the default values, but thought I would
> try this)?
>
> Both of the nodes are running Riak 0.14.0; here are the relevant configs:
> https://gist.github.com/847756
> Apologies if I am just doing something stupid, it has been a rather long day
> :)
> Regards,
> Luca Spiller
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>



