Yokozuna Scale

anandm anand at zerebral.co.in
Wed Sep 17 07:21:36 EDT 2014


Oh, that's nice to know that we'll probably be the first ones to deploy
it at that scale... :-)

We'd love to go in with a replication factor of 3, so I'm looking at
hosting 750M docs (250M unique) across the entire cluster of 6 big
servers. If we go by your suggestion to set up multiple nodes per
physical server - say 6 per - that would be 36 Riak nodes, each with a
corresponding Solr instance hosting about 21M documents. I would not
want one Solr core serving 21M - maybe about 10M is okay - so that
means we'd need to roughly double the node count? That puts us at
12-15 Riak-YZ nodes per server, with each server storing and indexing
about 125M docs in total - would that be too much for one server to
handle (12 cores + 96GB RAM + 2TB storage)?
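
To sanity-check that arithmetic, here's the back-of-the-envelope calc
I'm working from (a quick Python sketch; the 250M base-document count
and the ~10M-per-core comfort limit are my own assumptions):

    # Back-of-the-envelope sizing for the layout described above.
    # All inputs are my working assumptions, not measured numbers.
    base_docs        = 250_000_000   # unique documents
    n_val            = 3             # Riak replication factor
    servers          = 6             # physical machines
    nodes_per_server = 12            # Riak-YZ nodes per machine

    total_replicas  = base_docs * n_val              # 750M stored copies
    total_nodes     = servers * nodes_per_server     # 72 Riak-YZ nodes
    docs_per_node   = total_replicas / total_nodes   # ~10.4M per Solr core
    docs_per_server = total_replicas / servers       # ~125M per machine

    print(f"{total_nodes} nodes, {docs_per_node / 1e6:.1f}M docs/core, "
          f"{docs_per_server / 1e6:.0f}M docs/server")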

One more concern: our current SolrCloud deployment uses composite ids
to help Solr co-locate documents, so the client code can route reads
to the correct shard and each query does not have to hit all shards.
With Yokozuna that does not seem to be the case - will all Solr search
queries always hit all shards?
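
For context, this is roughly how we use composite ids today against
SolrCloud's HTTP API (host, collection, and field names below are
placeholders; _route_ is the standard SolrCloud routing parameter):

    # Sketch of composite-id routing against SolrCloud's HTTP API.
    import requests

    SOLR = "http://solr-host:8983/solr/mycollection"

    # Index with a composite id: every doc sharing the "tenant42!"
    # prefix hashes to the same shard, so related docs are co-located.
    doc = {"id": "tenant42!doc123", "title_t": "hello world"}
    requests.post(SOLR + "/update?commit=true", json=[doc]).raise_for_status()

    # Query with _route_ so SolrCloud sends the request only to the
    # shard owning that prefix, instead of fanning out to every shard.
    resp = requests.get(SOLR + "/select",
                        params={"q": "title_t:hello",
                                "_route_": "tenant42!", "wt": "json"})
    print(resp.json()["response"]["numFound"])

This targeted routing is what we'd lose if every Yokozuna query has to
fan out to all nodes.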

Could it have been better if there were a Riak plugin for Solr that
just stored and fetched docs from an underlying Riak store? By
analogy: the way Riak uses LevelDB or Bitcask internally, Solr would
use Riak for the actual document storage. I'm more of a Solr user (all
our data to date is both stored and indexed in Solr), so I'm more
inclined to look at it this way than to treat Riak as the primary
store with continuous indexing into Solr. It also seems natural
because Riak is a KV store while Solr holds more insight into the
document schema. On the Solr side we would keep using SolrCloud (which
I know works well for my project - my need is just to move the data
out of it and keep the index as it is, with composite-id sharding and
replication enabled), and run Riak as a separate cluster (maybe 6
nodes suffice here, each with ~125M docs), scaling each cluster up
independently whenever one of them hits its limits. A rough sketch of
that split is below.
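
Here is a minimal sketch of what I mean, using both systems' stock
HTTP APIs (bucket, collection, and field names are made up, and it
assumes a Solr schema where everything except id is indexed but not
stored):

    # Split design: SolrCloud keeps the index, Riak KV keeps the
    # document bodies.  Names are illustrative, not a real deployment.
    import json
    import requests

    RIAK = "http://riak-host:8098/buckets/docs/keys"
    SOLR = "http://solr-host:8983/solr/mycollection"

    def write(doc_id, body, searchable_text):
        # The full document body lives only in Riak KV.
        requests.put(RIAK + "/" + doc_id, data=json.dumps(body),
                     headers={"Content-Type": "application/json"}
                     ).raise_for_status()
        # Solr gets just the id plus indexed-only search fields.
        requests.post(SOLR + "/update?commit=true",
                      json=[{"id": doc_id, "text_t": searchable_text}]
                      ).raise_for_status()

    def search(query):
        # Resolve matching ids in Solr, then fetch bodies from Riak.
        hits = requests.get(SOLR + "/select",
                            params={"q": query, "fl": "id",
                                    "wt": "json"}).json()
        ids = [d["id"] for d in hits["response"]["docs"]]
        return [requests.get(RIAK + "/" + i).json() for i in ids]

The obvious costs are a second round trip on reads and keeping the two
writes consistent ourselves - which, I realize, is exactly what
Yokozuna handles internally - but it would let each cluster scale on
its own terms.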

FYI - our current daily load is 2M searches, 0.6M inserts, and 0.8M
updates - and this may triple in a couple of months as the other
modules that generate and dump this data scale further.




