configurable prefix for consistent hashing?

Nate Lawson nate at root.org
Wed Nov 9 19:07:28 EST 2011


On Nov 9, 2011, at 3:49 PM, Nate Lawson wrote:

> On Nov 9, 2011, at 3:33 PM, Elias Levy wrote:
> 
>> On Wed, Nov 9, 2011 at 3:29 PM, Phil Stanhope <stanhope at gmail.com> wrote:
>> Tread carefully here ... by forcing locality ... you will sacrifice high availability by algorithmically creating a bias and a single point of failure in the cluster. 
>> 
>> You don't have to lose high availability, your data is still being replicated, but you can create hot spots.  Know your data.
> 
> Correct. Partitioning based on SHA-1(DocumentID) is the same situation as doing it based on SHA-1(entire_key), which is how Riak currently works. Even if "entire_key" and "DocumentID" are both just simple counters, it is the same situation.
> 
> We would only need to worry if the pair BucketName + DocumentID was not unique (say, skewed towards 0 or something). In that case, we'd need to analyze the distribution of DocumentID values to be sure the partition is balanced.


Sorry to reply to myself, but I wanted to add more detail.

You have multiple ways you could generate partitions: bucket, key-prefix, key, or even key+value. The question is really, "how many items do I need before the law of large numbers gets me enough balancing?" The answer depends on the data, as Elias mentioned.
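
To get a feel for the "how many items" question, here is a rough Python sketch. It is not Riak's code: the ring size of 64, the key format, and the helper names partition_of and imbalance are all made up for illustration, and the ring is simplified to equal slices of the 160-bit SHA-1 space. It reports how far the fullest partition sits above the average as the item count grows.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 64  # assumed ring size, for illustration only


def partition_of(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a string to a partition by slicing the 160-bit SHA-1 space evenly."""
    digest = hashlib.sha1(partition_key.encode("utf-8")).digest()
    value = int.from_bytes(digest, "big")
    # Scale the 160-bit hash value down to an index in [0, num_partitions).
    return (value * num_partitions) >> 160


def imbalance(partition_keys, num_partitions=NUM_PARTITIONS):
    """Ratio of the fullest partition's item count to the average count."""
    keys = list(partition_keys)
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    mean = len(keys) / num_partitions
    return max(counts.values()) / mean


# Hypothetical keys: one bucket, sequential document IDs.
for n in (100, 10_000, 100_000):
    keys = [f"mybucket/doc{i}" for i in range(n)]
    print(n, round(imbalance(keys), 3))
```

A ratio near 1.0 means the partitions hold roughly equal item counts. This only counts items; it says nothing about value size or access frequency.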

Obviously, partitioning based only on bucket would be bad if you wrote mostly to one bucket. But more subtly, you could write equally to all buckets but store the largest or most frequently-accessed values in only one bucket.
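
A small sketch of that subtler case, under made-up assumptions: four hypothetical buckets with the same number of keys each, where values in one bucket ("images") are 100x larger. The bucket names, sizes, 8-partition ring, and helper functions are mine, not anything from Riak. It compares bytes stored per partition when hashing only the bucket versus hashing bucket plus key.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # assumed ring size, for illustration only


def partition_of(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a string to a partition by slicing the 160-bit SHA-1 space evenly."""
    digest = hashlib.sha1(partition_key.encode("utf-8")).digest()
    return (int.from_bytes(digest, "big") * num_partitions) >> 160


def bytes_per_partition(items, key_fn):
    """Sum stored bytes per partition; key_fn picks what gets hashed."""
    load = defaultdict(int)
    for bucket, key, size in items:
        load[partition_of(key_fn(bucket, key))] += size
    return dict(load)


def max_share(load):
    """Fraction of all stored bytes held by the most loaded partition."""
    return max(load.values()) / sum(load.values())


# Hypothetical workload: equal key counts per bucket, unequal value sizes.
items = [
    (bucket, f"doc{i}", 100_000 if bucket == "images" else 1_000)
    for bucket in ("users", "sessions", "logs", "images")
    for i in range(1000)
]

by_bucket = bytes_per_partition(items, lambda bucket, key: bucket)
by_bucket_and_key = bytes_per_partition(items, lambda bucket, key: f"{bucket}/{key}")

print("bucket only:  ", round(max_share(by_bucket), 3))
print("bucket + key: ", round(max_share(by_bucket_and_key), 3))
```

With bucket-only hashing, whichever partition owns "images" ends up holding nearly all the bytes even though writes are spread evenly across buckets by count.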

Even Riak's current partitioning scheme could be imbalanced if you only stored large values in keys whose SHA-1 has a certain prefix. That's admittedly extremely unlikely, which is why Riak chose this scheme. But it could happen.

Anyway, overriding the default partitioning function is something that should always be an advanced-only feature and "know your data" first ...

-Nate




