Advices for cluster optimal configuration
siculars at gmail.com
Tue Mar 8 17:57:56 EST 2011
Setting returnbody=false has no impact on the consistency of your data. It only lightens your network traffic, because Riak no longer sends the stored object back in the response.
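
To make the point concrete, here is a minimal sketch of how the write options discussed in this thread appear as query parameters on a Riak HTTP store request. It only builds the URL string and does not talk to a cluster. The host, port, bucket, key, the /riak/ path prefix, and the W/DW values of 2 are assumptions chosen for illustration; they are not taken from the benchmark code.

```java
// Sketch only: builds the URL for a Riak HTTP store (PUT) request.
// The class and method names are hypothetical helpers, not part of any Riak client.
final class RiakStoreUrl {

    /**
     * Returns http://host:port/riak/bucket/key?w=..&dw=..&returnbody=..
     * Bucket and key are assumed to be URL-safe already (e.g. integer IDs).
     */
    static String build(String host, int port, String bucket, String key,
                        int w, int dw, boolean returnBody) {
        return "http://" + host + ":" + port + "/riak/" + bucket + "/" + key
                + "?w=" + w + "&dw=" + dw + "&returnbody=" + returnBody;
    }

    public static void main(String[] args) {
        // Keeps the write quorums (assumed here to be 2) and only drops the response body.
        System.out.println(build("localhost", 8098, "articles", "42", 2, 2, false));
        // Bulk-load style from the quoted message below: W and DW set to 0.
        System.out.println(build("localhost", 8098, "articles", "42", 0, 0, false));
    }
}
```

In the first URL only returnbody changes, so the W and DW guarantees of the write stay as they were. The second URL is the fire-and-forget variant, which trades those guarantees for speed.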
On Mar 8, 2011, at 2:10 PM, Thibault Dory wrote:
> Thank you for your input Jeremiah,
> I would like to keep the strong consistency when I'm writing data, so I will keep the current setting.
> I'm not benchmarking the bulk load part but random read/update and MapReduce performance. Can I turn off returnbody, keep my strong consistency, and still see the errors?
> As I'm benchmarking MapReduce performance, I cannot set the number of reduce VMs to zero.
> 2011/3/8 Jeremiah Peschka <jeremiah.peschka at gmail.com>
> There are a few things that you can do to speed up your load. When you're writing your data, you can set both W and DW to 0 (as long as you have a way to check for errors). This will shave a bit of time off each write because you'll be throwing writes against the database and hoping that they stick. You can also set returnbody to false. Returnbody defaults to true IIRC. When returnbody is enabled, Riak will return the object you wrote and also include the Riak-specific info (vclock, etc). I don't care about these things when I'm doing a bulk load, so I turn that sort of thing off.
> I suspect somebody smarter will have better input and will correct me, but that's my 2 cents worth.
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
> On Tuesday, March 8, 2011 at 8:34 AM, Thibault Dory wrote:
>> I'm benchmarking various NoSQL databases (see www.nosqlbenchmarking.com for current results and configurations used) for my master's thesis and I'm going to apply this benchmark to bigger clusters. So far I have only used a small cluster of 8 servers with a very small data set (20000 articles from Wikipedia) to conduct those tests.
>> I will use up to 100 servers (2 GB, 4 CPUs, 80 GB HDD) from the Rackspace cloud, and the new data set is the entire English version of Wikipedia. Each article is stored as a single document with a unique ID based on an integer. You can see the implementation here: https://github.com/toflames/Wikipedia-noSQL-Benchmark/blob/master/src/implementations/riakDB.java and the benchmark methodology here: http://www.slideshare.net/ThibaultDory/a-new-methodology-for-large
>> I would like to know if some of you have advice on how I could get the most out of Riak for this specific use case and on this kind of server. For example, I would like to know if there are memory/cache tunings that would suit this server size.
>> Any other input or criticism is welcome,
>> Thank you,
>> Thibault Dory
>> riak-users mailing list
>> riak-users at lists.basho.com