2i Index Durability/Resilience

Rune Skou Larsen rsl at trifork.com
Tue Oct 8 04:10:14 EDT 2013


Another thing 2i lacks is the ability to set the R value. Just as with a 
GET on a specific key, there is a tradeoff between consistency and 
performance in choosing the set of vnodes to query.

The 2i query implementation in Riak chooses performance over consistency 
by querying only the minimum set of vnodes (1/n of them), which is the 
equivalent of R=1. In other words, 2i queries have no entropy tolerance. 
If an object is missing from a given vnode (not yet repaired by 
anti-entropy), you will get inconsistent results in approximately 1/n of 
your 2i queries for it. Riak chooses the 1/n vnode set somewhat randomly, 
so running the same 2i query multiple times can return different sets of 
keys.
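
One way to observe this from the client side is to run the same query a 
few times and compare the key sets. The sketch below is only an 
illustration: run_query is a hypothetical caller-supplied function that 
performs one 2i query and returns the matching keys; it is not part of 
any Riak client API. Note that the union is not a guaranteed-complete 
answer, only everything that was seen at least once.

```python
from typing import Callable, Iterable, Set, Tuple


def probe_2i_consistency(
    run_query: Callable[[], Iterable[str]], attempts: int = 10
) -> Tuple[Set[str], Set[str]]:
    """Run the same 2i query several times and compare the key sets.

    run_query is a hypothetical function supplied by the caller; it
    performs one 2i query and returns the matching keys.

    Returns (union, unstable): every key seen in at least one run, and
    the keys that were missing from at least one run.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    runs = [set(run_query()) for _ in range(attempts)]
    union = set().union(*runs)
    stable = set.intersection(*runs)
    return union, union - stable
```

A non-empty "unstable" set suggests that some replica is missing index 
entries for those keys.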

There is currently no easy way to change that for high-consistency use 
cases. As an experiment, I have tried modifying Riak's source code to 
make it query all vnodes, which seems to work fine. However, Riak will 
then return (up to) n copies of each key, so these need to be deduped by 
the client. Obviously, this alternative approach favours consistency 
over performance, and will be roughly n times as expensive (~3 times 
with the default n=3).
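
The client-side dedup step could look roughly like this. It is a minimal 
sketch, assuming the modified query hands back one list of keys per 
vnode; the function name and the input shape are my own illustration, 
not something Riak or its clients define.

```python
from typing import Iterable, List


def dedupe_keys(per_vnode_keys: Iterable[Iterable[str]]) -> List[str]:
    """Merge key lists from several vnodes, keeping each key once.

    Keys are returned in first-seen order. The per-vnode input shape is
    an assumption made for this example.
    """
    seen = set()
    merged = []
    for keys in per_vnode_keys:
        for key in keys:
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

A key counts as a match if any replica reports it, so an object that is 
missing from one vnode no longer drops out of the result.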

Also remember that AAE will fix entropy eventually, so the window for 
inconsistent 2i queries will be closed by Riak itself after some time. 
You can look at the AAE logs to deduce the level of entropy you are 
running with, and from that estimate the impact on the consistency of 
your 2i queries.

- Rune, Trifork


On 07-10-2013 23:46, Jon Meredith wrote:
> Hi Brady,
>
> The 2I indices are written in the same store as the main objects 
> whenever the main object is updated.  If a primary node is down, the 
> indices will be written to a fallback node.  When the fallback sees 
> the primary come back online and stops receiving requests for that 
> partition it will send the main object back to the primary and that 
> will re-index it.
>
> The docs could benefit from a little clarification.  Secondary indices 
> do benefit from read repair, that is if the main object is spotted as 
> being out of date or missing during a get, it is rewritten with the up 
> to date information on all nodes.  The anti-entropy mechanism that we 
> are currently missing is spotting corruption within leveldb itself. 
> For example, if part of the leveldb database storing a vnode is 
> corrupted so that the .sst files containing the index entries were 
> destroyed there is no mechanism to spot and repair that.  We are 
> intending to add that for the next major release.
>
> Jon
>
>
> On Mon, Oct 7, 2013 at 2:39 PM, Brady Wetherington 
> <brady at bespincorp.com> wrote:
>
>     What happens to your 2i indexes if you do a write and one of the
>     nodes you're trying to write to is down?
>
>     http://docs.basho.com/riak/latest/dev/using/2i/ says:
>
>       * When you want or need anti-entropy. Since 2i is just metadata
>         on the KV object and the indexes reside on the same node, 2i
>         piggybacks off of read-repair.
>
>     But
>     http://docs.basho.com/riak/latest/ops/running/recovery/repairing-indexes/
>     says:
>
>     Riak Secondary indexes (2i) currently have no form of anti-entropy
>     (such as read-repair). Furthermore, for performance and load
>     balancing reasons, 2i reads from 1 random node. This means that
>     when a replica loss has occurred, inconsistent results may be
>     returned.
>
>     I am building a solution around 2i - so I just wanted to know if
>     there was any way to clarify these points - how resilient are
>     these indexes? Under what circumstances will they stop working (or
>     return inconsistent results)?
>
>     -B.
>
> -- 
> Jon Meredith
> VP, Engineering
> Basho Technologies, Inc.
> jmeredith at basho.com


-- 

Best regards / Venlig hilsen

*Rune Skou Larsen*
Trifork Public A/S
Dyssen 1, 8200 Århus N, Denmark
Phone: +45 3160 2497	Skype: runeskoularsen	twitter: @RuneSkouLarsen
