cluster health check using riak-java-client

David Byron dbyron at dbyron.com
Thu Dec 3 01:25:08 EST 2015


I'm implementing a health check for a service of mine that uses riak. 
I've seen this code from 
https://github.com/basho/riak-java-client/issues/456:

RiakCluster cluster = clientInstance.getRiakCluster();
List<RiakNode> nodes = cluster.getNodes();
for (RiakNode node : nodes)
{
   State state = node.getNodeState();
}

and it's great.  From what I can tell, it depends on some background 
processing that keeps track of the state of the nodes.  I did a quick 
test though, and if I run 'riak stop' from the command line and then 
this loop with no intervening operations, the nodes report RUNNING. 
Even after some time passes (more than three minutes), still RUNNING.

However, if I do run an intervening operation (some actual query of 
data, for example) that fails, the nodes then report HEALTH_CHECKING. 
Then, after 'riak start', the nodes report RUNNING again.  I suppose 
that's good.

So, I'm trying to decide how to implement the health check.  The above 
loop doesn't seem to be enough, but do I really need to do something like:

final RiakFuture<Void, Void> future = cluster.execute(new PingOperation());

try {
   future.await();
   future.get();
   // good: the ping completed without an exception
} catch (ExecutionException | InterruptedException e) {
   // bad: the ping failed, or the wait was interrupted
}

Maybe it's sufficient to only do this if all the nodes report RUNNING? 
I suppose there's always a small window in time where a node could 
report bad, but via a ping I'd learn it was up...so I'm torn.  Any 
suggestions on whether pinging every time is correct, or whether there's 
something more efficient (and still safe)?
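
To make the trade-off concrete, here is a minimal sketch of the "ping only when every node reports RUNNING" idea. It deliberately uses stand-in types so it runs without the Riak client: the NodeState enum, the isHealthy helper, and the timeout parameter are all hypothetical names for illustration, not part of riak-java-client. In a real check, the states would presumably come from node.getNodeState() and the ping callable would wrap cluster.execute(new PingOperation()) followed by future.get().

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class HealthCheckSketch {
    // Stand-in for the client's node state (assumption: only the two
    // values mentioned in this message are modeled here).
    enum NodeState { RUNNING, HEALTH_CHECKING }

    // Returns true only if every node reports RUNNING AND the ping
    // completes within timeoutMillis. The state check is a cheap
    // short-circuit; the ping is what actually confirms health, since
    // a reported RUNNING state can be stale.
    static boolean isHealthy(List<NodeState> states,
                             Callable<Void> ping,
                             long timeoutMillis) {
        if (states.isEmpty()) {
            return false; // no nodes known: treat as unhealthy
        }
        for (NodeState state : states) {
            if (state != NodeState.RUNNING) {
                return false; // skip the ping, a node is already suspect
            }
        }
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> result = executor.submit(ping);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } catch (ExecutionException | TimeoutException e) {
            return false; // ping failed or took too long
        } finally {
            executor.shutdownNow(); // cancel a ping that is still running
        }
    }

    public static void main(String[] args) {
        Callable<Void> okPing = () -> null; // hypothetical ping that succeeds
        List<NodeState> allRunning =
                Arrays.asList(NodeState.RUNNING, NodeState.RUNNING);
        System.out.println("healthy: " + isHealthy(allRunning, okPing, 1000));
    }
}
```

The short-circuit saves a round trip when a node is already flagged, at the cost of reporting unhealthy during the window where a node has recovered but its state has not been updated yet. Dropping the state loop and pinging every time avoids that window.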

Thanks for your help.

-DB




More information about the riak-users mailing list