Yokozuna max practical bucket limit

Ryan Zezeski rzezeski at basho.com
Mon Apr 22 16:21:25 EDT 2013


> I could see that being something some folks want.  From my point of view, I
> find that the existing design of one core per bucket may be more useful, so
> long as I can search across cores with similar schemas (I created an issue
> <https://github.com/basho/yokozuna/issues/87> to track that feature), as it
> allows me to easily drop the index for a bucket.  In a multi-tenant
> environment, where you may have an index per customer, this is rather
> useful.  It's a lot less painful than trying to delete the index (and data)
> by listing keys and issuing delete operations.
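
For reference, stock Solr can already fan a single query out to several cores through its `shards` request parameter, as long as the cores' schemas are compatible. This is a minimal sketch, not how Yokozuna would implement issue #87: the host, port, core names, and the /solr/<core> URL layout are assumptions for a standalone Solr install.

```python
from urllib.parse import urlencode


def multi_core_query_url(host, cores, query, rows=10):
    """Build a Solr select URL that fans one query out to several cores.

    Assumes a stock standalone Solr at `host` (e.g. "localhost:8983") that
    serves each core at /solr/<core>; the core names are hypothetical.
    """
    if not cores:
        raise ValueError("need at least one core")
    # Solr's `shards` parameter takes comma-separated host/path entries.
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    params = {"q": query, "shards": shards, "rows": rows, "wt": "json"}
    # Any listed core can coordinate the distributed request; use the first.
    return f"http://{host}/solr/{cores[0]}/select?{urlencode(params)}"
```

Calling multi_core_query_url("localhost:8983", ["customer_a", "customer_b"], "name:ryan") yields one select URL whose results merge both hypothetical customer cores, while each core can still be dropped on its own.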

Well, you still can't avoid the key listing and deletes for Riak itself.
For Solr, this would be a delete-by-query, which isn't nearly as expensive.
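
To make the cost difference concrete: a Solr delete-by-query is one update request whose body names a query, while clearing a bucket in Riak means a key listing plus one delete per key. A minimal sketch, assuming Solr's JSON update format and a hypothetical `owner` field that tags each document with its customer:

```python
import json


def solr_delete_by_query_body(field, value):
    """JSON body for one Solr update request that deletes every matching doc.

    `field` is a hypothetical field tagging each document with its owner.
    """
    # Escape backslashes and quotes so the value stays a single phrase term.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return json.dumps({"delete": {"query": f'{field}:"{escaped}"'}})


def riak_bucket_clear_requests(keys):
    """Requests needed to empty a bucket in Riak: one list-keys call, then
    one delete per key.  The listing itself folds over every key in the
    backend, which is what makes this path expensive."""
    return 1 + len(keys)
```

Sent to a core's /update handler with Content-Type: application/json (followed by a commit), that one body removes a customer's documents whether there are ten or ten million, whereas the Riak side grows linearly with the key count.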

> As I've expressed before, I wish buckets behaved the same way, segregating
> their data into distinct backends, but I understand that this results in
> higher resource usage, as things like LevelDB caches would then not be
> shared and you'd need additional file descriptors.  At the very least, it
> would be great if backend instances could be created programmatically
> through the HTTP or PB API, rather than having to modify app.config and
> perform a rolling restart.  That's not very operationally friendly.

Yes, there are benefits to be had both ways.  Segregating the actual
backend instances allows for an efficient drop of an entire bucket, but
adds strain in terms of file descriptors and I/O contention.  Multi-backend
sorta helps but is static in nature, as you mention.
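
A back-of-the-envelope sketch of the file descriptor side, assuming each vnode opens its own instance of every configured backend (as multi-backend does) and that each LevelDB instance may keep up to a fixed number of files open. The ring size, node count, and per-instance file limit are hypothetical inputs, not Riak defaults.

```python
def backend_fd_estimate(ring_size, nodes, backends, files_per_instance):
    """Rough upper bound on open files for the busiest node.

    Assumes every vnode opens one instance of each configured backend and
    that each instance may keep up to `files_per_instance` files open.
    """
    vnodes_per_node = -(-ring_size // nodes)  # ceiling division
    return vnodes_per_node * backends * files_per_instance


shared = backend_fd_estimate(64, 5, backends=1, files_per_instance=20)
per_customer = backend_fd_estimate(64, 5, backends=300, files_per_instance=20)
```

With a 64-partition ring on 5 nodes, the busiest node owns 13 vnodes; at a hypothetical 20 open files per instance, one shared backend needs about 260 descriptors, while 300 per-customer backends push that to 78,000.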

> As for a large number of cores, I could see some folks creating many of
> them.  Buckets are relatively cheap, since by default they are all stored
> in the default backend instance; their only cost is the additional
> network traffic for gossiping non-default bucket properties.  So folks
> create them freely.  Once Yokozuna is better documented, it should be
> pointed out that the same is not true of a bucket's index, since Yokozuna
> creates one core per indexed bucket.  So an indexed bucket has quite a bit
> more static overhead than a non-indexed one.

Good point.

> If you use Riak and have 300 customers, you can easily create a bucket per
> customer, even if you only have 64 partitions and are using Riak Search on
> all of them, as Search stores all the data in the same merge index backend.
> You may want to think twice before upgrading such a cluster to Yokozuna.
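
The asymmetry can be put in numbers.  A rough sketch, assuming Yokozuna keeps one Solr core per index on every node while Riak Search keeps one merge_index instance per local vnode that all indexed buckets share; the cluster sizes are illustrative inputs.

```python
from math import ceil


def index_structures_per_node(indexes, ring_size, nodes):
    """Count per-node index structures under two assumed layouts.

    Yokozuna (assumed): one Solr core per index on every node.
    Riak Search (assumed): one merge_index instance per local vnode,
    shared by every indexed bucket.
    """
    vnodes_per_node = ceil(ring_size / nodes)
    return {
        "yokozuna_cores": indexes,  # grows with every new customer index
        "riak_search_merge_indexes": vnodes_per_node,  # fixed by ring size
    }
```

For 300 customer indexes on a 64-partition, 5-node cluster, that works out to 300 cores per node versus 13 shared merge_index instances, and only the first number moves as customers are added.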

Well, Riak Search will have issues as well.  First, each bucket will
require a pre-commit hook to be installed, which means custom bucket
properties must be copied into the ring.  There is a known drawback with
Riak where many custom bucket properties greatly reduce ring gossip
throughput and can cause issues.  I believe Joseph Blomstedt may have some
patches going into the next release that will improve this, but ultimately
we need to get bucket properties out of the ring.  Even if that is solved,
Riak Search will have other tradeoffs, such as substantially reduced
feature support compared to Yokozuna as well as reduced performance for
many types of queries.  But I do agree that many indexes (thus cores) could
pose a problem for

More information about the riak-users mailing list