Achieving 100% consistency

Jeremiah Peschka jeremiah.peschka at gmail.com
Mon Aug 29 09:35:15 EDT 2011


When you write the data you can set the w and dw values to the number of replicas (n) configured for the bucket. That will ensure that every replica has the same data before the write is acknowledged.
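
As a rough illustration of why w = n helps, here is a minimal Python sketch of the usual quorum overlap rule (a read is guaranteed to touch at least one replica that took the latest write when r + w > n). The function name is made up for this example, and it assumes all n replicas are reachable; it does not model node failures or anything Riak does internally.

```python
def quorums_overlap(n, r, w):
    """True when any r replicas read must include one of the w replicas written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


n = 3
print(quorums_overlap(n, r=1, w=n))  # w = n: even a one-replica read overlaps
print(quorums_overlap(n, r=2, w=2))  # quorum reads plus quorum writes
print(quorums_overlap(n, r=1, w=1))  # a stale read is possible
```

With w set to n, the rule holds for any r, which is the point of the suggestion above.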

Riak should be using a quorum read as well. Using a simple example of a three node cluster...

You write a chunk of data about a user, we'll call him 'lukas'. From your code, you perform PUT('lukas', some_value). That hits Node1 of your cluster. Node1 says "Ah, here is some data. I must share it with my friends!" Node1 immediately tells Node2 and Node3 about the interesting things that 'lukas' is doing. Node3 is already doing something else involving MapReduce, so it isn't able to respond immediately. Node2, however, is doing nothing and it immediately responds. Since we're using the defaults and (n/2) + 1 nodes (a quorum, here 2 of 3) have successfully responded to the write request, Node1 sends a message back to your client saying "OK! We got the data. KTHXBAI!"
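
The write path above can be sketched in a few lines of Python. This is only a toy model: the node names come from the example, while the delay figures and both function names are assumptions made up for illustration, not Riak code.

```python
def default_quorum(n):
    """Majority of n replicas, i.e. (n/2) + 1 with integer division."""
    return n // 2 + 1


def coordinate_write(ack_delays_ms, w):
    """Return (nodes that had acked, time of reply) once w acks have arrived.

    ack_delays_ms maps a node name to how long it takes to acknowledge.
    """
    if w > len(ack_delays_ms):
        raise ValueError("w cannot exceed the number of replicas")
    ordered = sorted(ack_delays_ms.items(), key=lambda item: item[1])
    first_w = ordered[:w]
    return [name for name, _ in first_w], first_w[-1][1]


# Hypothetical delays: Node3 is busy with MapReduce and answers late.
delays = {"Node1": 0.0, "Node2": 2.0, "Node3": 150.0}
w = default_quorum(len(delays))
acked, reply_at = coordinate_write(delays, w)
print(w, acked, reply_at)
```

The client gets its "OK" as soon as Node1 and Node2 have answered; Node3 catches up later, which is why a read that lands on Node3 can see an older copy.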

Assuming that your client does some kind of load balancing, you try to read some data through GET('lukas'). Your request goes to Node3. Node3, for whatever reason, doesn't have the right data. It has an older version of 'lukas'. But, Node3 won't send the data back just yet with the default read/write consistency values. Node3 asks everyone else for the data. Node1 sends back its copy of 'lukas' first. Node3 takes one look at the information and says, "Something's not right here, Node1 thinks 'lukas' looks like this and I think 'lukas' looks like that. I'd better wait for Node2." Node2 sends its copy of the data over. Node3 looks at Node2's data and says "Node1 and Node2 agree. Clearly I've missed something." At this point, Node3 will correct its local data in a read repair operation and send the data back to the client.
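
Here is a small Python sketch of the read repair idea described above. It is a simplification under stated assumptions: each replica is a plain dict, and a bare integer version stands in for the vector clocks Riak actually uses to decide which copy is newer. The function name is hypothetical.

```python
def read_with_repair(replicas, key):
    """Return (value, repaired_nodes) and bring stale replicas up to date.

    replicas maps node name -> {key: (version, value)}.
    """
    # Ask every replica; a missing key counts as version 0.
    replies = {node: store.get(key, (0, None)) for node, store in replicas.items()}
    newest = max(replies.values(), key=lambda item: item[0])
    if newest[0] == 0:
        raise KeyError(key)

    repaired = []
    for node, (version, _) in replies.items():
        if version < newest[0]:
            replicas[node][key] = newest  # the read repair step
            repaired.append(node)
    return newest[1], repaired


replicas = {
    "Node1": {"lukas": (2, "new")},
    "Node2": {"lukas": (2, "new")},
    "Node3": {"lukas": (1, "old")},  # Node3 missed the last write
}
value, repaired = read_with_repair(replicas, "lukas")
print(value, repaired)
```

After the call, Node3 holds the same copy as the other two nodes and the client receives the newer value.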

In short, despite the vagaries of eventual consistency, consistent data eventually emerges from the system. 

There are a few other things to keep in mind.

1) When you issue a write to Riak, you can use the boolean property 'returnbody' ('return_body' in the Protocol Buffers API) to force Riak to give you the most current version of the record being saved to disk. Everything will come back - data, vector clocks, etags, everything.
2) Clients (and proxies) may cache data in memory in an attempt to avoid server load. If you want up-to-the-minute data, you need to make sure that your cache is used with some kind of mechanism that keeps the cache up to date as writes occur (some kind of write-through/write-behind/cache-aside cache).
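
To make the caching point concrete, here is a minimal write-through cache sketch in Python. The class name is made up, and the backing store is just a dict standing in for whatever client object talks to Riak; it only stays fresh if every write in your application goes through it.

```python
class WriteThroughCache:
    """Keeps an in-memory copy in step with the backing store on every write."""

    def __init__(self, store):
        self._store = store  # hypothetical stand-in for a Riak client
        self._cache = {}

    def put(self, key, value):
        self._store[key] = value  # write to the backing store first
        self._cache[key] = value  # then refresh the cached copy

    def get(self, key):
        if key not in self._cache:  # cache miss: load from the store
            self._cache[key] = self._store[key]
        return self._cache[key]


backing = {}
cache = WriteThroughCache(backing)
cache.put("Mueller", {"id": [1, 5]})
cache.put("Mueller", {"id": [1, 5, 8]})
print(cache.get("Mueller"))
```

A read right after a write returns the value that was just stored, without waiting on the server.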
---
Jeremiah Peschka - Founder, Brent Ozar PLF, LLC
Microsoft SQL Server MVP

On Aug 29, 2011, at 12:20 AM, Lukas Schulze wrote:

> Hi,
> 
> thank you for your answers.
> I know that Riak is designed for running on distributed servers.
> But what about adding lots of data where every tuple depends on another one?
> I thought that having only 1 node and disabling replication could solve my problem of always getting the latest data from Riak.
> 
> Is there another way to achieve 100% consistency in a Riak database after a very short time?
> 
> Best regards
> Lukas
> 
> 
> 
> On Sat, Aug 27, 2011 at 5:43 PM, Ian Plosker <ian at basho.com> wrote:
> Jonathan,
> 
> Excuse me, that last message should have been addressed to you.
> 
> Ian Plosker
> Developer Advocate
> Basho Technologies
> 
> 
> On Aug 27, 2011, at 11:39 AM, Ian Plosker wrote:
> 
>> Lukas,
>> 
>> Yes, even for dev you'd be best advised to develop and test your application with the same or similar number of nodes and n, r, and w settings as you would in production. It's good practice to develop applications in a dev/test environment that mirrors the production environment as much as is reasonable/feasible. You can run a single node cluster, but note that this isn't a configuration you'll see in production.
>> 
>> Ian Plosker
>> Developer Advocate
>> Basho Technologies
>> 
>> 
>> 
>> On Aug 27, 2011, at 5:33 AM, Jonathan Langevin wrote:
>> 
>>> Even for development purposes only? Otherwise it seems data would be written n times to the same machine, which is needless in a dev environment with low storage specs...
>>> 
>>> 
>>> Jonathan Langevin
>>> Systems Administrator
>>> Loom Inc.
>>> Wilmington, NC: (910) 241-0433 - jlangevin at loomlearning.com - www.loomlearning.com - Skype: intel352
>>> 
>>> 
>>> 
>>> On Fri, Aug 26, 2011 at 5:01 PM, Ian Plosker <ian at basho.com> wrote:
>>> Lukas,
>>> 
>>> Also, we don't advise that you run single node clusters. Riak is designed to be used in clusters of at least 3 nodes. You can run a multi-node cluster on a single development machine by downloading the Riak source, and running "make devrel". Take a look at the Riak Fast Track (http://wiki.basho.com/The-Riak-Fast-Track.html) for more details.
>>> 
>>> Ian Plosker
>>> Developer Advocate
>>> Basho Technologies
>>> 
>>> On Aug 26, 2011, at 3:17 PM, Lukas Schulze wrote:
>>> 
>>>> I'm doing some simple tests with Riak and tried to build something like an index.
>>>> Therefore I created new buckets for some attributes like "name", "street" and "city".
>>>> One entry in the index-bucket "name" is for example "Mueller" and the value contains all user ids, formatted as a JSON string: "{id:[1,5,8,13,2,7]}"
>>>> The java objects are saved as JSON strings in a separate bucket "users", the keys in this bucket are the user-ids, the values are the JSON strings.
>>>> 
>>>> When I add 200 users via Java and the RiakPBC client, on every loop iteration I fetch the index, add the new user id and store it again in Riak.
>>>> But Java is too fast, so I receive an old version of the bucket.
>>>> 
>>>> Because I've only one node I set the n-value to 1, r = 1, w = 1 and dw = 1.
>>>> But I have to wait nearly 2 seconds to be reasonably sure of getting the correct response. (the computer isn't a high-end machine ;-) )
>>>> 
>>>> Is it possible to be sure that the data will be saved permanently and I can continue adding users?
>>>> Are there any caching methods I can configure?
>>>> Can I set the default n-value to 1 so that every newly created bucket will have this value?
>>>> Does Riak have any kind of indexes or is it possible to implement it a better way?
>>>> 
>>>> In my first version I saved all users in one bucket and iterated over all of them to find the correct one. But every single request from the Java Service to Riak took nearly 200ms. For a huge number of entries (10,000) this isn't practicable. Therefore I tried to implement my own indexes.
>>>> 
>>>> The main focus of my question is getting rid of the inconsistent reads.
>>>> 
>>>> Thank you.
>>>> 
>>>> Best Regards
>>>> Lukas
>>>> _______________________________________________
>>>> riak-users mailing list
>>>> riak-users at lists.basho.com
>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




