[ANNC] Riak 1.3.0 RC1

Joseph Blomstedt joe at basho.com
Thu Jan 31 00:22:33 EST 2013


Wanted to add some additional information to augment Sean's comments.

> Does AAE also apply to search? Or is it only KV that benefits from this?  The reason I'm asking is of cause that this list has seen some cases of bit rot in merge index files.

Easy to miss, but the very end of the release notes section on AAE
mentions it's for K/V only.

As Sean mentioned, partition repair still works for Search data. In
theory, we could support AAE for Search using the same approach the
repair command uses: repairing specific merge_index data from replicas
of the merge_index data on adjacent partitions in the ring. Basically,
just automating partition repair. The main issue is that Search data
doesn't include anything equivalent to vector clocks. If two replicas
have divergent data, who wins? With the repair command, the user
implicitly makes that determination.

Also, repairing merge_index data only addresses merge_index durability
issues. It doesn't help with K/V vs Search consistency issues. Most
users of Riak Search are using it to index Riak K/V data. In that
case, a much more useful guarantee would be that the Search index is
both properly replicated and consistent with the K/V data. In other
words, a system that performs AAE between the K/V data and the Search
index. This is very hard to do for Riak Search due to the use of
term-based partitioning. K/V data written to any node in the cluster
can trigger Search index updates on arbitrary nodes in the cluster.
Thus, correct K/V + Search AAE would require exchanges between all
nodes in the cluster. I've played around with the idea of breaking the
cluster into a tree-like structure and performing a top-down fan-out
style exchange to make this type of AAE tractable, but it's still a
lot of additional complexity. For all these reasons, we decided to go
with K/V only.

Of course, the next-generation search solution for Riak (Yokozuna)
already includes K/V + Yokozuna AAE. This is much more straightforward
because Yokozuna use document-based partitioning. It also works really
well. K/V AAE between nodes repairs K/V replicas as necessary. Then,
local AAE between the local K/V data and the local Solr index ensures
that the index is consistent with the K/V data. Since the K/V data is
replicated + repaired, the Yokozuna indices are also implicitly
replicated and repaired despite using local-only exchange. Ryan
Zezeski did a great job designing Yokozuna from the beginning to
exploit AAE, both for data integrity as well as a general solution to
the K/V vs Index consistency problem.

>
> Given that AAE necessarily add extra I/O, should we expect that I/O to noticeably effect throughput of K/V?

As always, it depends on the workload and situation. If a system is
running very close to its maximum IOPs rate, then AAE could have a
noticeable effect. However, a lot of work was done to ensure AAE was
as low-impact as possible. A few points.

First, the most expensive operation is going to be AAE tree building.
Either when initially moving to 1.3, or when trees are expired and
rebuilt (which happens once a week by default). Each tree build is a
fold over a given partition's data. Of course, handoff and the
fullsync replication feature in Riak Enterprise both do similar folds.
If you have insight into how your cluster handles either of these
operations, than you should have a reasonable idea of how tree
building will impact cluster performance.

Second, the task of keeping trees up-to-date triggers an additional
AAE write per Riak K/V write. However, these writes are very small
(key + hash + some overhead). To improve performance, these writes are
buffered in memory until there are 200 hash updates, and then the
updates are sent to the respective hash tree LevelDB instance in a
single batch write.

Finally, actual AAE exchange between replicas are very inexpensive.
Likewise, AAE repairs data by performing a single read at a time in
the same manner as a normal client. As such, if there are lots of
client requests, than AAE repair is implicitly throttled. If there are
no client requests, than AAE repairs data as fast a Riak can respond
to read requests.

Hope this provides some additional insight.

BTW, if anyone wants to dive more into AAE's design, here's a link to
a video I made months back to describe the AAE design to other
engineers at Basho. The content is slightly outdated and not 100%
current with the code shipping in 1.3, but it's 90% the same and the
ideas and concepts remain unchanged. And again, this was meant to be a
informal presentation between co-workers so don't expect anything
polished:

http://coffee.jtuple.com/video/AAE.html

Regards,
Joe

---
Joseph Blomstedt
Senior Software Engineer
Basho Technologies
@jtuple

On Wed, Jan 30, 2013 at 5:24 PM, Sean Cribbs <sean at basho.com> wrote:
> On Wed, Jan 30, 2013 at 6:04 PM, Kresten Krab Thorup <krab at trifork.com> wrote:
>> Looks great! I'm excited!
>>
>> Does AAE also apply to search? Or is it only KV that benefits from this?  The reason I'm asking is of cause that this list has seen some cases of bit rot in merge index files.
>
> It does not, however, the 1.2 "partition repair" feature is still
> functional and works with Riak Search.
>
>>
>> Given that AAE necessarily add extra I/O, should we expect that I/O to noticeably effect throughput of K/V?
>
> The throughput should be minimally affected. In early testing we found
> a significant overhead, but that has been greatly reduced thanks to
> improvements to eleveldb and AAE. However, of course we could not test
> all scenarios and would appreciate early adopters to put it through
> its paces.
>
> --
> Sean Cribbs <sean at basho.com>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com




More information about the riak-users mailing list