Riak Secondary Index Limits

Bryan bryan at go-factory.net
Thu Aug 28 22:18:17 EDT 2014


Hi Everyone,

Apologies, as this has probably been asked before. Unfortunately, I have not been able to find a clear answer by searching the list archives, and the Basho wiki docs seem to be missing this information. I have read the secondary index docs.

I would like to better understand how secondary indexes perform when the indexed values are unevenly distributed. For example, let's say I have a bucket with 1 million objects, each carrying a secondary index entry. Now let's say the indexed values are skewed: one value is not selective while the others are, such that 60% of the objects share a single indexed value, while the remaining 40% are well distributed across the other values.

For example, I have a record (i.e. object) where the index field is 'foobar_bin'. The 1 million objects in the bucket carry 100 unique 'foobar' values. One value repeats for 60% of the records (600K), and the remaining 99 values are spread evenly over the other 400K records, about 0.4% (roughly 4K records) each.
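
To put rough numbers on that example, here is a back-of-the-envelope sketch in Python. It only does the arithmetic; nothing here talks to Riak, and the function name is made up for illustration. It assumes the 99 remaining values split the other 400K objects evenly.

```python
# Figures from the example above; the helper name is hypothetical.
TOTAL_OBJECTS = 1_000_000
UNIQUE_VALUES = 100
HOT_SHARE = 0.60  # fraction of objects sharing the single non-selective value


def index_result_sizes(total, unique_values, hot_share):
    """Return (keys matched by the hot value, keys matched by each other value)."""
    hot = round(total * hot_share)
    # Assumption: the remaining objects are spread evenly over the other values.
    per_other = (total - hot) // (unique_values - 1)
    return hot, per_other


hot, per_other = index_result_sizes(TOTAL_OBJECTS, UNIQUE_VALUES, HOT_SHARE)
print(f"hot value matches    {hot:,} keys")
print(f"other values match  ~{per_other:,} keys each")
print(f"hot result set is   ~{hot // per_other}x larger")
```

So an exact-match query on the non-selective value would have to return on the order of a hundred times more keys than a query on any of the other values, which is the case I am most worried about.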

How will secondary indexes perform in this case, and is this an appropriate use of them? Finally, what I have read does not make clear what happens when an indexed value with such low selectivity is updated.

We have fewer than 512 partitions and are using the Erlang client.

Thanks in advance - any insights will be much appreciated!


Cheers,
Bryan

----

Bryan Hughes
Go Factory
http://www.go-factory.net