siculars at gmail.com
Wed Sep 17 15:09:48 EDT 2014
"So if we go by your suggestion to set up multiple nodes per physical server
- say 6 nodes per - that would be 36 Riak nodes and a corresponding Solr
instance - each hosting about 21M documents."
Something to be careful with: as you ramp up the number of Riak nodes on a
single physical machine, you increase the probability that all replicas of
an object will live on the same physical host, thereby eliminating the
utility of your replication factor.
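To put a rough number on that risk, here is a minimal sketch. It assumes the replicas of an object land on distinct Riak nodes chosen uniformly at random, which is a simplification: Riak's actual ring claim assigns partitions differently, so treat the output as a trend, not a prediction. The function name and parameters are made up for illustration.

```python
from math import comb


def prob_all_replicas_on_one_host(hosts: int, nodes_per_host: int, n_val: int = 3) -> float:
    """Chance that n_val distinct nodes, picked uniformly at random, share one host.

    Assumption: uniform random placement. This is NOT how Riak's claim
    algorithm places replicas; it only illustrates the trend.
    """
    total_nodes = hosts * nodes_per_host
    if n_val > total_nodes:
        raise ValueError("n_val cannot exceed the total number of nodes")
    # Ways to pick all n_val nodes from a single host, summed over hosts.
    same_host_choices = hosts * comb(nodes_per_host, n_val)
    return same_host_choices / comb(total_nodes, n_val)


if __name__ == "__main__":
    # 6 physical servers, replication factor 3, as in this thread.
    for per_host in (1, 2, 3, 6, 12):
        p = prob_all_replicas_on_one_host(hosts=6, nodes_per_host=per_host)
        print(f"{per_host:2d} nodes/host: {p:.4f}")
```

With one or two nodes per host the probability is zero under this model, because three replicas cannot fit on one machine. It becomes non-zero once a host runs at least as many nodes as the replication factor.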
I'm sure the long-term solution will be to decouple the one-to-one constraint
between Solr runtimes and Riak nodes. But I imagine that will require some...
On Wed, Sep 17, 2014 at 7:21 AM, anandm <anand at zerebral.co.in> wrote:
> Oh.. That's nice to know that we'll probably be the first ones to deploy it
> to that scale... :-)
> We'd love to go in with a replication factor of 3 - so I'm looking at
> hosting 750M docs in the entire cluster of 6 big servers.
> So if we go by your suggestion to set up multiple nodes per physical server
> - say 6 nodes per - that would be 36 Riak nodes and a corresponding Solr
> instance - each hosting about 21M documents. I would not want one Solr Core
> serving 21M - maybe about 10M is okay - so that means we need to double the
> node count further? So, 12-15 Riak-YZ nodes per server - hosting a total of
> 150M docs stored and indexed - would that be too much for a server to
> handle (12 cores + 96GB RAM + 2TB storage)?
> One more concern I have is - our current SolrCloud deployment works with
> composite ids - to help Solr co-locate documents - so the client code can
> route the reads to the correct shards - so that each query does not have to
> hit all shards. With Yokozuna that does not seem to be the case? All Solr
> search queries will hit all shards always?
> Would it have been better if there were a Riak plugin for Solr that just
> stored and fetched docs from an underlying Riak store? [By analogy - the
> way Riak uses LevelDB or Bitcask internally, Solr would use Riak for actual
> doc storage.] I'm more of a Solr user (all our data has been stored and
> indexed in Solr to date) and hence I'm more inclined towards looking at
> it this way rather than looking at Riak as a primary store with continuous
> indexing into Solr. I'm also thinking this way because Riak is a KV store
> and Solr holds more insight into the document schema. So on the Solr side we
> would continue to use SolrCloud (which I know works well for my project - my
> need is just to move the data out of it and keep the index the way it is,
> with composite sharding + replication enabled) and have Riak as a separate
> cluster (maybe 6 nodes suffice here - each with 150M docs) - and we scale up
> each of these clusters separately whenever one of them hits its limits?
> FYI - Our current daily load is - 2M searches, 0.6M inserts, 0.8M updates -
> and this may triple in about a couple of months or so as our other modules
> that generate and dump this data scale further.