I2 queries fail when few nodes are down
mkessler at basho.com
Thu Jan 5 06:22:13 EST 2017
On 4 January 2017 at 23:22, Tomi Takussaari <tomi.takussaari at gmail.com> wrote:
> Hello Riak-users
> We have a 9-node Riak cluster that we use to store user accounts.
> Some of the crucial data fields of the user account are indexed using I2,
> so that we can do secondary index queries based on them.
> Today we tested how our cluster performs when a few nodes go down, and
> the results were not very good.
> If more than 2 nodes go down, all I2 queries start failing, returning
> HTTP 500 with an "insufficient vnodes available" error. Once the nodes
> are up again, things start working again.
> Normal object CRUD operations worked fine.
> Is this expected behaviour?
> The funny thing is that we have another cluster, with the same
> configuration but 6 nodes, for a different environment, and it
> experiences the same problems when more than 2 nodes go down, so it does
> not seem to have anything to do with the percentage of nodes that are down.
> Our ring size is 256, and the current Riak version is 2.2.
> Both clusters were first created years ago with Riak 1.4, if memory
> serves, and I believe we tested this same thing back then, and I2 queries
> did not stop working this easily then.
> Any help would be appreciated!
For a cluster that uses the default replication factor (`n_val`) of 3, the
behaviour you observed is expected. Secondary index queries run against a
covering set of VNodes that together hold 1 replica of every KV object.
With `n_val=3`, a covering set can only be guaranteed if no more than 2
nodes (`n_val - 1`) are offline at any given time. The limit depends on
`n_val`, not on cluster size, which is why your 6-node and 9-node clusters
behave the same way. This behaviour has not changed since the 1.4 release.
As our documentation states:
"Riak stores 3 replicas of all objects by default, although this can be
changed using bucket types, which manage buckets’ replication properties.
The system is capable of generating a full set of results from one third of
the system’s partitions as long as it chooses the right set of partitions.
The query is sent to each partition, the index data is read, and a list of
keys is generated and then sent back to the requesting node."
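To make the covering-set argument concrete, here is a minimal Python sketch
under simplifying assumptions: partitions are assigned to nodes round-robin
(the hypothetical `owner` function; Riak's real claim algorithm is more
elaborate), and each key range is replicated on `n_val` consecutive
partitions. `covering_set` and `always_coverable` are illustrative helpers,
not Riak APIs, and the greedy selection is not Riak's own coverage-plan code.

```python
from itertools import combinations

RING_SIZE = 256  # ring size from the question
N_VAL = 3        # Riak's default replication factor


def owner(partition: int, num_nodes: int) -> int:
    # Hypothetical round-robin claim: partition i lives on node i % num_nodes.
    # Real Riak uses a more elaborate claim algorithm.
    return partition % num_nodes


def covering_set(num_nodes, down=frozenset(), ring_size=RING_SIZE, n_val=N_VAL):
    """Greedily pick partitions on up nodes so that every preflist (the n_val
    consecutive partitions holding one key range) is represented once.

    Returns a sorted list of partition indices, or None when some preflist has
    every replica on a down node (Riak reports "insufficient vnodes available").
    """
    covered = set()  # preflist start positions already represented
    chosen = []
    for start in range(ring_size):
        if start in covered:
            continue
        # Try the partition furthest along this preflist first: it also
        # represents the following preflists, which keeps the plan small.
        for k in reversed(range(n_val)):
            p = (start + k) % ring_size
            if owner(p, num_nodes) not in down:
                break
        else:
            return None  # all n_val replicas of this key range are offline
        chosen.append(p)
        # Partition p belongs to the preflists starting at p, p-1, ..., p-(n_val-1).
        covered.update((p - j) % ring_size for j in range(n_val))
    return sorted(chosen)


def always_coverable(num_nodes, num_down):
    """True if every possible set of `num_down` failed nodes still leaves a covering set."""
    return all(
        covering_set(num_nodes, frozenset(d)) is not None
        for d in combinations(range(num_nodes), num_down)
    )


if __name__ == "__main__":
    print(len(covering_set(9)))                   # → 86 partitions queried, all nodes up
    print(covering_set(9, frozenset({0, 1, 2})))  # → None: three ring-adjacent nodes down
    for nodes in (6, 9):
        print(nodes, [always_coverable(nodes, d) for d in (1, 2, 3)])
```

In this model a query touches 86 of the 256 partitions (roughly one third),
and the 6-node and 9-node rings both print `[True, True, False]`: any 2
failures are survivable, but some combinations of 3 are not. Which nodes fail
matters; with nodes 0, 3 and 6 down, the 9-node model still finds a covering
set.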
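Other operations keep working because Riak can spin up fallback partitions
(VNodes on one of the remaining nodes) that accept and temporarily store
data while the node that should own the data is down. Secondary index
queries cannot rely on fallbacks in the same way: a fallback VNode only
holds what was written to it during the outage, so the covering set has to
be built from primary VNodes.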
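The write-side asymmetry can be sketched with the same simplified ring.
`write_targets` below is a hypothetical, simplified model of choosing
fallbacks: each primary on a down node is replaced by the next up node
further round the ring that is not already in the list. It assumes the same
round-robin ownership and is not Riak's actual implementation.

```python
RING_SIZE = 256
N_VAL = 3


def owner(partition: int, num_nodes: int) -> int:
    # Hypothetical round-robin claim, as in the simplified ring model.
    return partition % num_nodes


def write_targets(start, num_nodes, down, ring_size=RING_SIZE, n_val=N_VAL):
    """Return (partition, node, is_fallback) for each replica of a write whose
    preflist begins at `start`. Primaries on down nodes are swapped for
    fallback nodes found by walking further round the ring."""
    primaries = [(start + k) % ring_size for k in range(n_val)]
    in_use = {owner(p, num_nodes) for p in primaries} - set(down)
    # Candidate fallback nodes: up nodes after the preflist, in ring order.
    candidates = []
    for k in range(n_val, ring_size):
        node = owner((start + k) % ring_size, num_nodes)
        if node not in down and node not in in_use and node not in candidates:
            candidates.append(node)
    fallbacks = iter(candidates)
    targets = []
    for p in primaries:
        node = owner(p, num_nodes)
        if node not in down:
            targets.append((p, node, False))  # primary vnode is available
        else:
            fb = next(fallbacks, None)
            if fb is not None:
                # Fallback vnode for partition p on another node; it only
                # holds writes received while the primary is down.
                targets.append((p, fb, True))
    return targets


if __name__ == "__main__":
    print(write_targets(0, 9, set()))      # → [(0, 0, False), (1, 1, False), (2, 2, False)]
    print(write_targets(0, 9, {0, 1, 2}))  # → [(0, 3, True), (1, 4, True), (2, 5, True)]
```

With nodes 0, 1 and 2 down, the key range that made the covering set
impossible still gets three write targets, all of them fallbacks.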
Client Services Engineer
Basho Technologies Limited
Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431