Tune Riak for fast inserts - populate DB

Guido Medina guido.medina at temetra.com
Wed Feb 13 05:29:41 EST 2013


Are you transferring using a single thread? If so, I would recommend 
using a ThreadPoolExecutor and scheduling each write as a task. You can 
track the failures (if any) using either an AtomicInteger or a 
concurrent/synchronized list where you record the keys that failed.

However much else you tune, a single-threaded transfer won't get you 
far. We have done transfers many times and, depending on the size of the 
DB table, we use either a single thread or a thread pool. Try 8 threads 
and see the difference, assuming you have N connections in your Riak 
client where N > max thread pool size.
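
A minimal sketch of that approach, in case it is useful. The BulkLoader 
class and its Writer interface are hypothetical names used only for 
illustration; Writer stands in for whatever performs one write, for 
example the bucket.store(key, value).withoutFetch().execute() call quoted 
below. Failed keys are collected in a concurrent queue so they can be 
retried afterwards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BulkLoader {

    /** Hypothetical stand-in for a single write to the store. */
    interface Writer {
        void store(String key, String value) throws Exception;
    }

    /**
     * Writes every record using a fixed-size thread pool and returns
     * the keys whose write threw an exception.
     */
    static List<String> load(Map<String, String> records, Writer writer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Thread-safe collection: many workers may report failures at once.
        Queue<String> failedKeys = new ConcurrentLinkedQueue<>();

        for (Map.Entry<String, String> entry : records.entrySet()) {
            pool.execute(() -> {
                try {
                    writer.store(entry.getKey(), entry.getValue());
                } catch (Exception e) {
                    failedKeys.add(entry.getKey());
                }
            });
        }

        // Stop accepting new tasks and wait for the queued writes to finish.
        pool.shutdown();
        try {
            // The one-hour limit is an arbitrary value chosen for this sketch.
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(failedKeys);
    }
}
```

Calling BulkLoader.load(records, writer, 8) would run the writes on 8 
threads. An AtomicInteger would do instead of the queue if you only need 
the number of failures and not the keys themselves.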

You might want to remove pw=1 when using multi-threading so Riak doesn't 
fall behind too much (eleveldb catch-up? whatever that's called). 
pw=1 adds more risk than the benefit you gain.

Hope that helps,

Guido.

On 13/02/13 09:44, Bogdan Flueras wrote:
> Ok, so I've done something like this:
> // lastWriteWins(true) doesn't work for Protobuf
> Bucket bucket = client.createBucket("foo");
>
> when I insert I have:
> bucket.store(someKey, someValue).withoutFetch().pw(1).execute();
>
> It looks like it's 20% faster than before. Is there anything else I 
> could tweak?
>
> ing. Bogdan Flueras
>
>
>
> On Wed, Feb 13, 2013 at 10:19 AM, Bogdan Flueras 
> <flueras.bogdan at gmail.com <mailto:flueras.bogdan at gmail.com>> wrote:
>
>     Each thread has its own bucket instance (pointing to the same
>     location) and I don't re-fetch the bucket per insert.
>     Thank you very much!
>
>     ing. Bogdan Flueras
>
>
>
>     On Wed, Feb 13, 2013 at 10:14 AM, Russell Brown
>     <russell.brown at me.com <mailto:russell.brown at me.com>> wrote:
>
>
>         On 13 Feb 2013, at 08:07, Bogdan Flueras
>         <flueras.bogdan at gmail.com <mailto:flueras.bogdan at gmail.com>>
>         wrote:
>
>         > How to set the bucket to last write? Is it in the builder?
>
>         Something like:
>
>             Bucket b =
>         client.createBucket("my_bucket").lastWriteWins(true);
>
>         Also, after you've created the bucket, do you use it from all
>         threads? You don't re-fetch the bucket per-insert operation,
>         do you?
>
>         But the "withoutFetch()" option is probably going to be the
>         biggest performance increase, and safe if you are only doing
>         inserts.
>
>         Cheers
>
>         Russell
>
>         > I'll have a look..
>         > Yes, I use more threads and the bucket is configured to
>         > spread the load across all nodes.
>         >
>         > Thanks, I'll have a deeper look into the API and let you
>         > know about my results.
>         >
>         > ing. Bogdan Flueras
>         >
>         >
>         >
>         > On Wed, Feb 13, 2013 at 10:02 AM, Russell Brown
>         <russell.brown at me.com <mailto:russell.brown at me.com>> wrote:
>         > Hi,
>         >
>         > On 13 Feb 2013, at 07:37, Bogdan Flueras
>         <flueras.bogdan at gmail.com <mailto:flueras.bogdan at gmail.com>>
>         wrote:
>         >
>         > > Hello all,
>         > > I've got a 5 node cluster with Riak 1.2.1, all machines
>         > > are multicore, with min 4GB RAM.
>         > >
>         > > I want to insert something like 50 million records in Riak
>         > > with the Java client (Protobuf used) with default settings.
>         > > I've also tried the HTTP protocol with w = 1 but ran into
>         > > some problems.
>         > >
>         > > However the process is very slow: it doesn't write more
>         > > than 6GB/hour or approx. 280 KB/second.
>         > > To have all my data filled in, it would take approx. 2 days!
>         > >
>         > > What can I do to have the data filled into Riak ASAP?
>         > > How should I configure the cluster (vm.args / app.config)?
>         > > I don't care so much about consistency at this point.
>         >
>         > If you are certain to be only inserting new data, setting
>         > your bucket(s) to last write wins will speed things up. Also,
>         > are you using multiple threads for the Java client insert?
>         > Spreading the load across all five nodes? Are you using the
>         > "withoutFetch()" option on the Java client?
>         >
>         > Cheers
>         >
>         > Russell
>         >
>         > >
>         > > Thank you,
>         > > ing. Bogdan Flueras
>         > >
>         > > _______________________________________________
>         > > riak-users mailing list
>         > > riak-users at lists.basho.com <mailto:riak-users at lists.basho.com>
>         > >
>         http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>         >
>         >
>
>
>
>
>
