Riak Secondary Index Limits
bryan at go-factory.net
Thu Aug 28 22:18:17 EDT 2014
Apologies if this has been asked before. I have not been able to find a clear answer in the mailing list archives, and the Basho wiki docs do not seem to cover it. I have read the secondary index docs.
I would like to better understand how secondary indexes perform when the indexed values have low selectivity. For example, let's say I have a bucket with 1 million objects and I create a secondary index on it. Now let's say the indexed values are unevenly distributed: one value is not selective while the others are, such that 60% of the objects share a single indexed value and the remaining 40% are well distributed.
For example, I have a record (i.e. object) where the indexed field is 'foobar_bin'. The bucket holds 1 million objects with 100 unique 'foobar' values. One of the values repeats for 60% of the records (600K), and the other 99 values are spread evenly over the remaining 40%, which is about 0.4% (roughly 4K objects) each.
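To make the skew concrete, here is a rough back-of-the-envelope sketch in Python. It only does the arithmetic for the distribution described above and does not talk to Riak at all; the function and variable names are purely illustrative, and the even split across the other 99 values is an assumption.

```python
# Rough sizing of how many objects an exact-match index query would
# return for each kind of value. Pure arithmetic, no Riak calls.

TOTAL_OBJECTS = 1_000_000   # objects in the bucket
UNIQUE_VALUES = 100         # distinct 'foobar' values
HOT_FRACTION = 0.60         # share of objects holding the one common value


def index_match_counts(total, unique_values, hot_fraction):
    """Return (hot_count, per_value_count).

    hot_count is the number of objects sharing the single common value;
    per_value_count is the average number of objects for each of the
    remaining values, assuming they are spread evenly.
    """
    hot = round(total * hot_fraction)
    remaining = total - hot
    per_value = remaining / (unique_values - 1)
    return hot, per_value


hot, per_value = index_match_counts(TOTAL_OBJECTS, UNIQUE_VALUES, HOT_FRACTION)

print(f"common value: {hot:,} objects ({hot / TOTAL_OBJECTS:.1%})")
print(f"other values: ~{per_value:,.0f} objects each "
      f"({per_value / TOTAL_OBJECTS:.2%})")
```

So an exact-match query on the common value would have to return on the order of 600K keys, while a query on any other value returns only a few thousand.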
How will secondary indexes perform in this case, and is this an appropriate use of them? Finally, what I have read does not make clear what happens when an indexed value is updated and that value has such low selectivity.
We have fewer than 512 partitions and are using the Erlang client.
Thanks in advance - any insights will be much appreciated!