Yokozuna max practical bucket limit

Elias Levy fearsome.lucidity at gmail.com
Tue Apr 9 12:29:37 EDT 2013

On Mon, Apr 8, 2013 at 7:09 PM, Ryan Zezeski <rzezeski at basho.com> wrote:

> This is exactly why I chose not to make a core per partition.  My gut
> feeling was that most users are likely to have more partitions than indexed
> buckets.  I don't know the overhead per-core or what the limits might be.
>  I would recommend the Solr mailing list for questions like that.  I've
> also looked at that "LotsOfCores" page before.  One benefit to using Solr
> is that any improvements made to it should also trickle down to Yokozuna.
> That said, I still plan to allow a one-to-many mapping from index to
> buckets.  That would allow many KV buckets to index under the same core.  I
> have an idea of how to implement it.  I'm fairly certain it would work just
> fine.  I just need to add a GitHub issue and then it's a "simple matter of
> coding."

I could see that being something some folks want.  From my point of view, I
find that the existing design of one core per bucket may be more useful, so
long as I can search across cores with similar schemas (I created an
issue <https://github.com/basho/yokozuna/issues/87> to track that
feature), as it allows me to easily drop the index for a
bucket.  In a multi-tenant environment, where you may have an index per
customer, this is rather useful.  It is a lot less painful than trying to
delete the index (and data) by listing keys and deleting each one.

As I've expressed before, I wish buckets behaved the same way, segregating
their data into distinct backends, but I understand that sharing a backend
results in lower resource usage, as things like LevelDB caches would
otherwise not be shared and you'd need additional file descriptors.  At the
very least, it would be great if backend instances could be created
programmatically through the HTTP or PB API, rather than having to modify
app.config and perform a rolling restart.  That is not very operationally
friendly.

As for a large number of cores, I could see some folks creating many of them.
Buckets are relatively cheap, since by default they are all stored in the
default backend instance; their only cost is the additional network
traffic for gossiping non-default bucket properties.  So folks create them
freely.  Once Yokozuna is better documented, it should be pointed out that
the same is not true of a bucket's index, since each indexed bucket gets
its own core.  So an indexed bucket has quite a bit more static overhead
than a non-indexed one.

If you use Riak and have 300 customers, you can easily create a bucket per
customer, even if you only have 64 partitions and are using Riak Search on
all of them, as Search stores all the data in the same merge index backend.
You may want to think twice before upgrading such a cluster to Yokozuna.
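
To put rough numbers on that example, here is a minimal sketch that only
counts cores under the two mappings discussed in this thread (one core per
indexed bucket versus one core per partition).  The function name and the
dictionary keys are made up for illustration; it says nothing about the
actual per-core overhead, which as noted above is not known here.

```python
def core_count(indexed_buckets: int, partitions: int) -> dict:
    """Count Solr cores under each mapping discussed in the thread.

    Hypothetical helper: it only counts cores and does not model
    memory, file descriptors, or any other per-core cost.
    """
    return {
        # Current Yokozuna design: one core per indexed bucket.
        "core_per_indexed_bucket": indexed_buckets,
        # Alternative that was considered: one core per partition.
        "core_per_partition": partitions,
    }


# The multi-tenant example: 300 customer buckets, 64 partitions.
counts = core_count(indexed_buckets=300, partitions=64)
ratio = counts["core_per_indexed_bucket"] / counts["core_per_partition"]
print(counts, round(ratio, 2))
```

In this scenario the bucket count, not the partition count, is what
determines how many cores there are, which is the opposite of the "more
partitions than indexed buckets" assumption.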

Elias Levy