Advice for optimal cluster configuration

Thibault Dory dory.thibault at gmail.com
Tue Mar 8 14:10:10 EST 2011


Thank you for your input Jeremiah,

I would like to keep strong consistency when writing data, so I will
keep the current setting.
I'm not benchmarking the bulk load but random read/update and MapReduce
performance. Can I turn off returnbody and still keep strong
consistency and still see the errors?
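
To make the question concrete, here is a minimal sketch of the kind of write
I have in mind: W and DW stay at non-zero values and only returnbody is
switched off. It only builds the request URL for Riak's HTTP interface
(query parameters w, dw and returnbody). The class name RiakWriteUrl, the
bucket name "articles", the host and port, and the quorum values of 3 are
assumptions made for illustration; they are not taken from my benchmark code.

```java
// Hypothetical helper for illustration; not part of any Riak client library.
final class RiakWriteUrl {

    // Builds the URL used to store one article through Riak's HTTP interface.
    // w and dw are kept as explicit, non-zero quorums; returnBody only controls
    // whether the stored object is echoed back in the response.
    // Assumes the bucket name needs no URL escaping.
    static String build(String baseUrl, String bucket, long articleId,
                        int w, int dw, boolean returnBody) {
        return baseUrl + "/riak/" + bucket + "/" + articleId
                + "?w=" + w + "&dw=" + dw + "&returnbody=" + returnBody;
    }

    public static void main(String[] args) {
        // Assumed values: local node on port 8098, bucket "articles", quorums of 3.
        System.out.println(build("http://127.0.0.1:8098", "articles", 42L, 3, 3, false));
    }
}
```

A PUT to such a URL would carry the article as the request body; whether a
failed write is still reported with returnbody=false is exactly what I am
asking.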

As I'm benchmarking MapReduce performance, I cannot set the number of reduce
VMs to zero.

2011/3/8 Jeremiah Peschka <jeremiah.peschka at gmail.com>

>  There are a few things that you can do to speed up your load. When you're
> writing your data, you can set both W and DW to 0 (as long as you have a way
> to check for errors). This will shave a bit of time off of each write
> because you'll be throwing writes against the database and hoping that they
> stick. You can also set returnbody to false. Returnbody defaults to true
> IIRC. When returnbody is enabled, Riak will return the object you wrote and
> also include the Riak-specific info (vclock, etc.). I don't care about these
> things when I'm doing a bulk load, so I turn that sort of thing off.
>
> Depending on the type of querying you're doing, you can adjust the
> JavaScript VM settings. For example, if you aren't doing any reduce phases
> in your queries, then you can set the number of reduce VMs to 0. Since
> you're probably only doing key lookups, you can probably kill off all of the
> JavaScript VMs.
>
> I suspect somebody smarter will have better input and will correct me, but
> that's my 2 cents worth.
>
> --
> Jeremiah Peschka
> Microsoft SQL Server MVP
> MCITP: Database Developer, DBA
>
> On Tuesday, March 8, 2011 at 8:34 AM, Thibault Dory wrote:
>
> Hello,
>
> I'm benchmarking various NoSQL databases (see www.nosqlbenchmarking.com for
> current results and configurations used) for my master's thesis and I'm
> going to run this benchmark on bigger clusters. So far I have only used
> a small cluster of 8 servers with a very small data set
> (20000 articles from Wikipedia) to conduct those tests.
>
> I will use up to 100 servers (2 GB, 4 CPUs, 80 GB HDD) from the Rackspace
> cloud and the new data set is the entire English version of Wikipedia. Each
> article is stored as a single document with a unique ID based on an integer.
> You can see the implementation here:
> https://github.com/toflames/Wikipedia-noSQL-Benchmark/blob/master/src/implementations/riakDB.java and
> the benchmark methodology here:
> http://www.slideshare.net/ThibaultDory/a-new-methodology-for-large
>
> I would like to know if some of you have advice on how I could get the
> best out of Riak for this specific use case and on this kind of server. For
> example, I would like to know if there are some memory/cache tunings that
> could be useful to match this server size.
>
> Any other input or criticism is welcome,
>
> Thank you,
>
>
> Thibault Dory