Easier to use Java Client

Kresten Krab Thorup krab at trifork.com
Tue Mar 29 05:02:50 EDT 2011


One thing, which is often missed by newcomers to Riak [I'm not saying you missed it], is the importance of managing client IDs, and passing the right vector clocks back to the server. 

 { Basho'ers ... please correct me if I'm wrong }

Kresten



So, Rule#1 (which has two clauses), which you can always revert to:

1.a / every client needs a clientID, which is distinct for that client.  Be sure to always pass it along in all calls (in Java that is done by calling setClientID on the RiakClient; at the HTTP level it is done by passing the X-Riak-ClientId HTTP header).

1.b / when you send an update (HTTP PUT or DELETE), always pass along the X-Riak-Vclock header from a corresponding GET.  If you don't do this, your PUT is likely to go to /dev/null, because Riak thinks that it is a replay of an old request.

Until you're really familiar with how Riak works, you should always do these two, or you will be severely burned when you realize that it doesn't behave as expected.  Believe me, I've been there.
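At the HTTP level, Rule#1 comes down to two request headers on every write.  Here is a minimal sketch of where they go; it only prepares the request and never sends it.  The host, port, /riak/<bucket>/<key> path, bucket and key names, and the header values are placeholders I made up for illustration, and the vector clock header is spelled X-Riak-Vclock as in Riak's HTTP API.

```java
import java.net.HttpURLConnection;
import java.net.URI;

public class RiakPutHeaders {
    // Prepares (but does not send) a PUT that carries both parts of Rule#1.
    public static HttpURLConnection preparePut(String url, String clientId, String vclock)
            throws Exception {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/plain");
        conn.setRequestProperty("X-Riak-ClientId", clientId); // 1.a: who is writing
        conn.setRequestProperty("X-Riak-Vclock", vclock);     // 1.b: what we last saw
        return conn;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL and values; the vclock must really come from a prior GET.
        HttpURLConnection put = preparePut(
                "http://localhost:8098/riak/mybucket/mykey", "AAECAw==", "vclock-from-get");
        System.out.println(put.getRequestMethod() + " " + put.getURL());
        System.out.println("X-Riak-ClientId: " + put.getRequestProperty("X-Riak-ClientId"));
        System.out.println("X-Riak-Vclock: " + put.getRequestProperty("X-Riak-Vclock"));
    }
}
```

In a real client the vclock value is whatever the preceding GET returned in its own X-Riak-Vclock response header, passed back unchanged.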


1.a / Choosing a good client ID
========================

If you don't choose a client ID, Riak will do it for you ... BUT ... it will choose a new one for EVERY REQUEST.  This has many issues, so Riak should really require YOU to come up with one instead; perhaps it will do so at some point in the future.

Riak has some special optimizations if your client ID is the Base64-encoding of a byte array of length 4.  So, a good, default way to choose a client id is thus:

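	// Shared source of randomness; SecureRandom is safe to use from many threads.
	static SecureRandom rnd = new SecureRandom();
	
	// One client ID per thread, generated lazily on first use.
	static ThreadLocal<String> CLIENT_ID = new ThreadLocal<String>() {
		@Override
		protected String initialValue() {
			return randomClientID();
		}
	};
	
	public static String getClientID() {
		return CLIENT_ID.get();
	}
	
	// Base64 encoding of 4 random bytes, the form Riak has optimizations for.
	static private String randomClientID() {
		byte[] bytes = new byte[4];
		rnd.nextBytes(bytes);
		return Base64.encode(bytes);
	}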

This makes it so that each thread in your application is assigned a new random ClientID, which is often useful if your client is multi-threaded.
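For anyone who wants to paste and run this, here is a self-contained variant of the same idea.  It is only a sketch: the class name ClientIds is made up, and it assumes java.util.Base64 (Java 8 or later) in place of whatever Base64 helper you have on your classpath.

```java
import java.security.SecureRandom;
import java.util.Base64;

public class ClientIds {
    private static final SecureRandom RND = new SecureRandom();

    // One random client ID per thread, created lazily on first use.
    private static final ThreadLocal<String> CLIENT_ID =
            ThreadLocal.withInitial(ClientIds::randomClientID);

    public static String getClientID() {
        return CLIENT_ID.get();
    }

    // Base64 encoding of a 4-byte array, i.e. an 8-character string.
    private static String randomClientID() {
        byte[] bytes = new byte[4];
        RND.nextBytes(bytes);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main:   " + getClientID());
        Thread worker = new Thread(() -> System.out.println("worker: " + getClientID()));
        worker.start();
        worker.join();
    }
}
```

Within one thread getClientID() keeps returning the same value; the worker thread gets its own.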

The above code is *a lot* better than the default of having the server side choose a new client id for every request.

If you have some kind of logically unique, non-concurrent client concept in your system, that may be even better.  It could e.g. be the IMEI of your mobile phone, if your Riak client app is running on a phone; or it could be a userid, if you are sure that each user only accesses the system from one place at a time.


1.b / Passing the VectorClock
=======================

Secondly, you need to make sure that you pass the vector clock. 

You should think of the vector clock as an opaque "optimistic concurrency token", that you receive when you do a GET, and have to pass in when you do a PUT ... and then you get a new "optimistic concurrency token", that you have to use henceforth.

Depending on the configuration of your buckets, using an old vector clock will simply cause the PUT request to be ignored (if allow_mult=false), or cause siblings to be created (if allow_mult=true).  This is where Riak is often "not what you expect", but there is a good reason for this behavior.

IT IS ABSOLUTELY PARAMOUNT TO UNDERSTAND THIS.
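To make the "optimistic concurrency token" idea concrete, here is a toy, in-memory model of the behavior described above.  It is NOT how Riak implements vector clocks: the ToyBucket class is invented for illustration, and a plain counter stands in for the opaque vector clock.  It only shows what happens to a PUT that carries a stale token under each allow_mult setting.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyBucket {
    private final boolean allowMult;
    private long token = 0;                          // stands in for the vector clock
    private List<String> values = new ArrayList<>(); // more than one entry = siblings

    public ToyBucket(boolean allowMult) {
        this.allowMult = allowMult;
    }

    // What a GET would hand back alongside the value(s).
    public long currentToken() { return token; }
    public List<String> values() { return values; }

    // Returns the token the caller has to use for its next PUT.
    public long put(long tokenFromGet, String value) {
        if (tokenFromGet == token) {          // caller saw the latest state
            values = new ArrayList<>(List.of(value));
        } else if (allowMult) {               // stale token: keep both as siblings
            values.add(value);
        } else {                              // stale token: the write is dropped
            return token;
        }
        return ++token;
    }

    public static void main(String[] args) {
        ToyBucket bucket = new ToyBucket(false);
        long t0 = bucket.currentToken();
        long t1 = bucket.put(t0, "first");
        bucket.put(t0, "stale write");        // reuses the old token, so it is ignored
        System.out.println(bucket.values());  // prints [first]
        bucket.put(t1, "second");             // uses the fresh token, so it is applied
        System.out.println(bucket.values());  // prints [second]
    }
}
```

Constructing the bucket with allowMult=true and repeating the stale write leaves both values in the list, which is the sibling case.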



The above two things (1.a and 1.b) are difficult for newcomers to understand, and a bit tricky to get right, so IMHO a new Java client should, by default, make these mistakes hard to make.

- So, it should choose a good client ID for you if you don't.
- And it should make it so that you can't do UPDATE/PUT without having first GOT'en the riak object.  

The last part is especially tricky.  Perhaps we should have the API look like this to help that ....

  interface RiakObject {
     ...
  }

  interface UpdateableRiakObject extends RiakObject { ... }
  interface CreateableRiakObject extends RiakObject { ... }

  RiakClient {
      UpdateableRiakObject update(UpdateableRiakObject o) throws NotModified
      { ... send PUT ... }

      UpdateableRiakObject create(CreateableRiakObject o) throws AlreadyThere
      { ... send PUT ... }

      UpdateableRiakObject get(bucket, key);

      CreateableRiakObject fresh(bucket, key);
  }

I.e. do NOT EXPOSE constructors for the implementors of RiakObject.  The only way to get an UpdateableRiakObject is to call RiakClient.get, or as the result of calling update/create; you can't just allocate one.  Also calling update/create should "invalidate" the original object so that it cannot accidentally be used again.
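A rough, runnable sketch of that idea, backed by an in-memory map instead of a Riak server.  All names here (LinearClient, Updateable) are hypothetical, the create/fresh pair is collapsed into a single create(key, value) to keep it short, and a version counter stands in for the vector clock.

```java
import java.util.HashMap;
import java.util.Map;

public class LinearClient {
    /** Handle for a stored object; only LinearClient can construct one. */
    public static final class Updateable {
        private final String key;
        private final long version;    // stands in for the vector clock
        private boolean spent = false; // set once the handle has been written back
        public String value;

        private Updateable(String key, long version, String value) {
            this.key = key;
            this.version = version;
            this.value = value;
        }
    }

    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> versions = new HashMap<>();

    /** Stores a key that must not exist yet. */
    public Updateable create(String key, String value) {
        if (values.containsKey(key)) {
            throw new IllegalStateException("already there: " + key);
        }
        return store(key, 1L, value);
    }

    /** Fetches the current value, or null if the key is absent. */
    public Updateable get(String key) {
        if (!values.containsKey(key)) {
            return null;
        }
        return new Updateable(key, versions.get(key), values.get(key));
    }

    /** Writes o.value back; o is invalidated and a fresh handle is returned. */
    public Updateable update(Updateable o) {
        if (o.spent) {
            throw new IllegalStateException("handle already used");
        }
        if (versions.get(o.key) != o.version) {
            throw new IllegalStateException("stale handle");
        }
        o.spent = true;
        return store(o.key, o.version + 1, o.value);
    }

    private Updateable store(String key, long version, String value) {
        values.put(key, value);
        versions.put(key, version);
        return new Updateable(key, version, value);
    }

    public static void main(String[] args) {
        LinearClient client = new LinearClient();
        Updateable first = client.create("k", "v1");
        first.value = "v2";
        Updateable second = client.update(first); // 'first' is now invalidated
        try {
            client.update(first);                 // reusing the old handle fails fast
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(client.get("k").value);
    }
}
```

Because the constructor is private, calling code cannot fabricate a handle without going through get or create, and a handle that has been passed to update is refused the second time.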

I really think we need to have a way to enforce the linear nature of these things.  Otherwise people get fooled.



Kresten



