Yokozuna Scale

Eric Redmond eredmond at basho.com
Wed Sep 17 01:44:09 EDT 2014

On Sep 16, 2014, at 9:37 PM, anandm <anand at zerebral.co.in> wrote:

> I started checking out Riak as an alternative for one of my projects. We are
> currently using SolrCloud 4.8 (36 Shards + 3 Replica each) - and all stored
> fields (64 fields per doc + each doc at about 6kb on average)
> I want to get this thing migrated - so push all data out of Solr and store
> it in a KV - like Riak, but keep the indexes going in Solr (as I have a lot
> of code written already around Solr)
> Came across Yokozuna today and sounds like that's going to be a perfect match
> for my requirement...
> Just a couple of questions I have - I tried searching online for answers
> (but couldn't find references to large Scale Yokozuna deployments)

That's because it's only been released for about 2 weeks. Give it some time :)

> 1. I have over 250M documents indexed & stored (that's very bad) in current
> SolrCloud deployment - with the replication factor of 3 - total Solr Index +
> Data Size is about 4.5TB spread across 6 Servers (12 core (24 threads) +
> 96GB)

>    Index Search performance and write performance is good enough with 36
> Shards and Composite Id routing - I want to migrate this straight to Riak
> with Yokozuna enabled.
>    I'll be deploying a 5-6 node Riak Cluster - that would mean roughly
> about 50M docs will be stored on each node - and Yokozuna will index it
> locally on each node's Solr too (only indexed fields) -

Riak also replicates data to maintain high availability. By default, this replication value
(n_val) is 3. So if you have 250M values, you'll end up with 750M documents by default,
or 125M per node if you stick with 6 (150M per node with 5). If you reduce the n_val,
you'll reduce the object count, but you'll also reduce availability.
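
To make the replication arithmetic concrete, here is a minimal Python sketch. It
assumes replicas are spread evenly across nodes, which a real cluster only
approximates; the function name docs_per_node is made up for illustration.

```python
import math


def docs_per_node(total_objects: int, n_val: int, nodes: int) -> int:
    """Approximate number of replicas each node stores and indexes.

    Assumes a perfectly even spread across nodes (an idealization).
    """
    return math.ceil(total_objects * n_val / nodes)


if __name__ == "__main__":
    total = 250_000_000  # document count from the question
    for n_val in (3, 2):
        for nodes in (5, 6):
            per_node = docs_per_node(total, n_val, nodes)
            print(f"n_val={n_val} nodes={nodes}: ~{per_node:,} docs per node")
```

Varying n_val and the node count this way gives a rough idea of how many
documents each node's Solr core would have to index.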

>           a. Will this Solr instance have just one core to index the data?
> (As of now I just plan to have one bucket)

If you only create one index, yes. There's one Solr instance per Riak node. You can, however,
run more than one Riak node per physical server, thus increasing the Solr core count.

>           b. Would it be able to handle the load of searching through 50M
> docs with just one core? I think RAM won't be an issue - but I have not seen
> a single Solr instance serving 50M docs so a bit worried about that.

We've tested up to 10M per node, so you may require more nodes. But it's worth
running a small test to be sure.

> 2. Every time I query the Solr instance via Riak - /search handler - The
> actual search query will run in a distributed manner on Solr nodes in the
> cluster - but will Yokozuna also fetch the Stored fields for the docs or the
> entire docs from the underlying Riak instances too and return that to the
> search request? Or would my client app need to query the Riak docs in a
> separate query?

Riak will fetch whatever you've configured the schema/query to fetch. If you're
simply trying to get a full object, you may be better off performing the search query,
and then using the resulting document's Riak key field (_yz_rk, along with the bucket
in _yz_rb) to get the full object from Riak KV.
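
A rough Python sketch of that two-step pattern, building only the URLs so it can
be read without a running cluster. It assumes the default Yokozuna schema fields
_yz_rt, _yz_rb and _yz_rk (bucket type, bucket, key) and Riak's HTTP interface on
localhost:8098; the host, the index name and the helper names are placeholders,
not anything from your setup.

```python
from urllib.parse import quote

RIAK_HTTP = "http://localhost:8098"  # assumption: default HTTP listener


def search_url(index: str, query: str, rows: int = 10) -> str:
    """URL for a search that returns only the fields needed to locate objects."""
    return (
        f"{RIAK_HTTP}/search/query/{quote(index, safe='')}"
        f"?wt=json&rows={rows}&fl=_yz_rt,_yz_rb,_yz_rk&q={quote(query, safe='')}"
    )


def kv_url(doc: dict) -> str:
    """URL of the full Riak KV object behind one Solr result document."""
    return "{}/types/{}/buckets/{}/keys/{}".format(
        RIAK_HTTP,
        quote(doc["_yz_rt"], safe=""),  # bucket type
        quote(doc["_yz_rb"], safe=""),  # bucket
        quote(doc["_yz_rk"], safe=""),  # key
    )


if __name__ == "__main__":
    # Hypothetical index name and result document, for illustration only.
    print(search_url("docs_idx", "title_s:riak"))
    print(kv_url({"_yz_rt": "default", "_yz_rb": "docs", "_yz_rk": "user/42"}))
```

The client would GET the first URL, then GET kv_url(doc) for each entry under
response.docs in the JSON result.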

> 3. Anybody with a large scale Yokozuna deployment and if you could make a
> quick comment on the deployment size, the hardware and overall throughput -
> that will help. 

Yours may be the first Yokozuna deployment of that size. I'm interested
in your results. If this is a fixed set of documents, you may get away with a lower
n_val such as 2. Furthermore, you can try increasing the node count.

> Thanks.
