Riak replication and quorum
mathias at basho.com
Thu May 26 05:15:52 EDT 2011
I wrote my replies inline.
Developer Advocate, Basho Technologies
On Friday, 13 May 2011 at 20:05, Peter Fales wrote:
> Thanks to you and Ben for clarifying how that works. Since that was
> so helpful, I'll ask a followup question, and also a question on
> a mostly un-related topic...
> 1) When I've removed a couple of nodes and the remaining nodes pick up
> the slack, is there any way for me to look under the hood and see that?
> I'm using wget to fetch the '.../stats' URL from one of the remaining
> live nodes, and under ring_ownership it still lists the original 4
> nodes, each one owning 1/4 of the total partitions. That's part of
> the reason why I didn't think the data ownership had been moved.
Ring ownership is only affected by nodes explicitly entering and leaving the cluster. Unless you explicitly tell the cluster to remove a node, or explicitly tell that node to leave the cluster, ownership stays the same even if one or more nodes fail. Responsibility for the data, on the other hand, moves implicitly when a node fails: the coordinating node looks at the preference list and simply picks the next node(s) to take over for the failed one(s).
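To make the distinction concrete, here is a minimal sketch in Python of a preference list with fallbacks. It is a simplified model, not Riak's actual implementation: the round-robin ring, the SHA-1 key hashing, and the names build_ring and preference_list are all assumptions made for illustration.

```python
import hashlib


def build_ring(nodes, num_partitions=64):
    """Assign partitions to nodes round-robin (hypothetical layout).

    This list plays the role of ring ownership: it only changes when a
    node explicitly joins or leaves, never because a node is down.
    """
    return [nodes[i % len(nodes)] for i in range(num_partitions)]


def preference_list(ring, key, n_val=3, down=frozenset()):
    """Return the nodes that would serve `key`, using fallbacks for down nodes."""
    # Hash the key onto a starting partition (assumed hashing scheme).
    start = int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16) % len(ring)
    # Walk the ring once, starting at the key's partition.
    walk = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    primaries = walk[:n_val]
    # Live nodes further along the ring are the fallback candidates.
    live_after = (node for node in walk[n_val:] if node not in down)
    result = []
    for node in primaries:
        if node in down:
            # Replace a failed primary with the next live node on the ring.
            node = next(live_after, None)
        if node is not None:
            result.append(node)
    return result
```

Calling preference_list(ring, "some-key", down={"node2"}) returns only live nodes, while ring itself still lists node2 as an owner. That is the same effect as ring_ownership in the stats still showing all four nodes after a failure.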
The only way to find out if a handoff is currently happening between any two nodes is to look at the logs. They'll indicate beginning and end of a transfer. The cluster state and therefore the stats don't take re-partitioning or handoff into account yet.
> 2) My test involves sending a large number of read/write requests to the
> cluster from multiple client connections and timing how long each request
> takes. I find that the vast majority of the requests are processed
> quickly (a few milliseconds to 10s of milliseconds). However, every once
> in a while, the server seems to "hang" for a while. When that happens
> the response can take several hundred milliseconds or even several
> seconds. Is this something that is known and/or expected? There
> doesn't seem to be any pattern to how often it happens -- typically
> I'll see it a "few" times during a 10-minute test run. Sometimes
> it will go for several minutes without a problem. I haven't ruled
> out a problem with my test client, but it's a fairly simple-minded C++
> program using the protocol buffers interface, so I don't think there
> is too much that can go wrong on that end.
The easiest way to find out whether something is stalling is to look at the stats, specifically the percentiles for the get and put FSMs, which are responsible for taking care of reads and writes. Look for the JSON keys node_get_fsm_time_* and node_put_fsm_time_*. If anything jumps out there during or shortly after your benchmark run, something on the Riak or EC2 end is probably waiting for something else.
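If you'd rather not eyeball the whole JSON document, a small script can pull out just those keys. This is only a sketch in Python: the helper names fetch_stats and fsm_latency_stats are made up for illustration, and the URL you pass in is whatever '.../stats' URL you are already fetching with wget.

```python
import json
from urllib.request import urlopen

# Key prefixes named above for the get and put FSM timings.
FSM_PREFIXES = ("node_get_fsm_time_", "node_put_fsm_time_")


def fetch_stats(url, timeout=5.0):
    """Fetch the stats document from `url` and parse it as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fsm_latency_stats(stats):
    """Keep only the get/put FSM timing entries, sorted by key."""
    return {
        key: value
        for key, value in sorted(stats.items())
        if key.startswith(FSM_PREFIXES)
    }


# Example usage (stats_url is a placeholder for your node's stats URL):
#   for key, value in fsm_latency_stats(fetch_stats(stats_url)).items():
#       print(key, value)
```

Running this before, during, and after a benchmark run and comparing the output should show whether the high percentiles jump at the same moments your client sees the slow responses.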
Are you using EBS in any way for storing Riak's data? If so, what kind of setup do you have, single volume or RAID?