Problem with Vector Clocks - inconsistencies encountered in cluster with shifted real local clocks

Russell Brown russell.brown at me.com
Thu Oct 1 08:18:58 EDT 2015


I need more time to examine the diagram, but this all looks as expected so far.

If a client sends no context, then its write will be a sibling of whatever is stored at the coordinator. As you rightly point out, Riak treats an incoming clock that is less than a local clock as a sibling.
If the coordinator is configured not to store siblings, then the sibling value with the highest timestamp is stored. I recommend you run Riak with either allow_mult=true or LWW=true; allow_mult=false, in my view, should not be the default.
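To make the two cases above concrete, here is a minimal Python sketch. It is not Riak's Erlang code: descends and coordinate_put are hypothetical names, clocks are modelled as plain dicts from actor to counter, and each value is an assumed (data, timestamp) pair.

```python
def descends(a, b):
    """Return True if vector clock `a` has seen every event recorded in `b`."""
    return all(a.get(actor, 0) >= counter for actor, counter in b.items())


def coordinate_put(local_clock, local_values, new_clock, new_value, allow_mult):
    """Return the values kept after a put; each value is a (data, timestamp) pair."""
    if descends(new_clock, local_clock):
        # The incoming clock has seen everything stored, so the write replaces it.
        return [new_value]
    # Otherwise the write is concurrent with what is stored: treat it as a sibling.
    siblings = local_values + [new_value]
    if allow_mult:
        return siblings
    # Siblings not allowed: the highest timestamp wins, the rest are dropped.
    return [max(siblings, key=lambda value: value[1])]


stored_clock = {"node_a": 1}
stored = [("v1", 100)]

# A put without context carries an empty clock, which does not descend the stored clock.
print(descends({}, stored_clock))                                             # False
print(coordinate_put(stored_clock, stored, {}, ("v2", 90), allow_mult=True))  # both kept
print(coordinate_put(stored_clock, stored, {}, ("v2", 90), allow_mult=False)) # only v1 kept
```

In the last call the new write "v2" is silently dropped because its timestamp is lower, even though it arrived later.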
If two Riak nodes do the above and then replicate their values, the single value with the highest timestamp is stored. Isn’t this what you are seeing? If you depend on time to pick the latest value and the nodes’ clocks are out of sync, this is the price.
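A small Python sketch of that price, under assumed numbers: two coordinators stamp writes with their own local clocks, one of which runs 500 ms ahead. Write, stamp and resolve_lww are hypothetical names used only for illustration, and the times are invented.

```python
from dataclasses import dataclass


@dataclass
class Write:
    value: str
    timestamp_ms: int


def stamp(value, true_time_ms, clock_offset_ms):
    """Timestamp a write with the coordinator's local (possibly skewed) clock."""
    return Write(value, true_time_ms + clock_offset_ms)


def resolve_lww(a, b):
    """Keep the write with the highest timestamp, as when siblings are not stored."""
    return a if a.timestamp_ms >= b.timestamp_ms else b


# The first write reaches a coordinator whose clock runs 500 ms fast.
first = stamp("first", true_time_ms=1000, clock_offset_ms=500)
# The second write happens 200 ms later, at a coordinator with an accurate clock.
second = stamp("second", true_time_ms=1200, clock_offset_ms=0)

winner = resolve_lww(first, second)
print(winner.value)  # prints "first": the later write is lost
```

The result is deterministic given the timestamps, but it does not match the real-time order of the writes.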

Is this what you are seeing? Are you seeing results you didn’t expect, or non-deterministic results? Or both?

Regards

Russell

On 1 Oct 2015, at 12:58, Zuzana Zatrochova <zatrochova at gmail.com> wrote:

> Hi,
> 
> 
> 
> We are researching the client-centric consistency features of the Riak database. We encountered a problem with the vector clock implementation: vector clocks do not seem to work locally on a machine as expected. We would like you to confirm whether this behavior is intended. First I will describe the environment of our experiments, and then I will present the problem.
> 
> 
> 
> Environment:
> 
> 
> 	• Our environment consists of six virtual machines
> 		• five machines in the Riak cluster, each representing a single Riak node with a Riak database
> 		• one machine with a Java application that simulates multiple clients communicating with the Riak database
> 	• The machines are VMs virtualized with VMware, and their clocks are slightly shifted relative to each other (by no more than 1 second)
> 	• We ran experiments with riak-1.4.8 and riak-2.1.1. In riak-1.4.8, app.config contains vnode_vclocks = true (the default setting present when downloaded). In riak-2.1.1 we could not find a vnode vclocks setting in riak.conf, in the advanced configuration, or in the documentation, so we assumed that it also defaults to true and can no longer be changed.
> 	• In each experiment, 500 clients concurrently send requests to random nodes in the cluster. There are 20000 requests per minute operating on only 20 different keys, so the load on a single key is about 16 requests per second (read:write ratio = 50:50).
> 	• For the issue described here we used the quorums R = 1, W = 3; R = 2, W = 2; and R = 3, W = 1
> 	• All Riak settings are default apart from the IP settings and quorum settings. We added interceptors from the riak_test module that don’t change the code and are implemented only for logging purposes (information about the states of nodes). error.log is empty.
> 
> Problem:
> 
> 
> 	• It seems that Riak does not use vector clocks locally, only on a global scale. When a data object is created on the client side and sent to the Riak database, it does not have any vector clock assigned. More precisely, riak_object:vclock(UpdObj) returns [], while riak_object:vclock(LocalObj) returns the local vector clock of the local object. Therefore the function vclock:descends(NewObject, LocalObject) (in 2.1.1, with similar behavior in 1.4.8) returns false in all my experiments with different quorums, because an empty vector clock cannot descend a non-empty one. This leads to a merge of contents, that is, to the creation of siblings or, when siblings are not allowed (our configuration), to resolving the value by timestamp rather than by vector clock.
> 	• In our experiments, when the clocks on the VMs differ by up to 500 milliseconds, the situation shown in the attached picture issue.png arises. Because two objects with the same key are sent to two different coordinators whose clocks are shifted, the later object is assigned an earlier timestamp than the object that was sent before it. As a result of the vector clock implementation in Riak, the later object is lost in the merge of contents, because the object with the later timestamp (wrong because of the local clock shift) is evaluated as the latest.
> 
> The question:
> 
> 
> 
> Is this the intended behavior of Riak? The problem is that even when the quorum is set to prefer consistency and there are no partitions in the cluster, there are still requests that are inconsistent from the client's perspective, where consistency means that any read must return the value of the latest finished write or of a later unfinished write. (We did not use the strong_consistency feature of riak-2.1.1.)
> 
> 
> 
> Thank you,
> 
> Zuzana
> 
> <issue.png>