Tune Riak for fast inserts - populate DB

Bogdan Flueras flueras.bogdan at gmail.com
Wed Feb 13 05:46:20 EST 2013


Thanks guys :)
I use a ThreadPoolExecutor with 10 threads. I'll try your solutions and
keep you informed

ing. Bogdan Flueras



On Wed, Feb 13, 2013 at 12:40 PM, Guido Medina <guido.medina at temetra.com> wrote:

>  Also, something I forgot in my reply: make sure your Riak client is
> connected to each node and not only to a single node (the cluster config
> doesn't work that well, so try HAProxy and make sure you are using
> Protocol Buffers).
>
> HAProxy sample config: https://gist.github.com/gburd/1507077
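For reference, a minimal HAProxy front end for Riak's Protocol Buffers port might look like the sketch below. The node addresses are hypothetical and this is far smaller than the gist's full config; see the link above for a production-ready example.

```
# Minimal sketch, not the gist's full config. Node IPs are hypothetical;
# Riak's default PB port is 8087.
listen riak_pb
    bind 127.0.0.1:8087
    mode tcp
    balance roundrobin
    server riak1 10.0.0.1:8087 check
    server riak2 10.0.0.2:8087 check
    server riak3 10.0.0.3:8087 check
```

With this in place the client connects only to 127.0.0.1:8087 and HAProxy spreads the writes round-robin across the nodes, dropping any node that stops accepting connections.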
>
> And a single PB config like this one, which will connect to the HAProxy
> load balancer, assuming it is running on localhost and connected to each
> node:
>
>     final PBClientConfig clientConfig = new PBClientConfig.Builder()
>         .withHost("127.0.0.1")
>         .withPort(8087)
>         .withPoolSize(N)
>         .build();
>
> Guido.
>
>
> On 13/02/13 10:29, Guido Medina wrote:
>
> Are you transferring using a single thread? If so, I would recommend you
> use a ThreadPoolExecutor and schedule each write as a task; track the
> failures (if any) using either an AtomicInteger or a
> concurrent/synchronized list where you record the keys that failed.
>
> No matter what else you do, a single-threaded transfer won't help you at
> all. We have done transfers many times, and depending on the size of the DB
> table we use either a single thread or a thread-pool service. Try 8 threads
> and see the difference, assuming you have N connections in your Riak client
> where N > max thread pool size.
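The pattern described above can be sketched with plain JDK classes. Here riakStore() is a hypothetical stand-in for the real bucket.store(...).execute() call, simulated so the failure-tracking path is visible; everything else is the actual ThreadPoolExecutor-plus-AtomicInteger structure being suggested.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Multi-threaded load sketch: a fixed thread pool does the writes,
// an AtomicInteger counts failures, and a concurrent queue records
// the failed keys so they can be retried later.
public class ParallelLoader {
    static final AtomicInteger failures = new AtomicInteger();
    static final Queue<String> failedKeys = new ConcurrentLinkedQueue<>();

    // Placeholder for the real Riak write; fails every 1000th key so
    // the failure-handling path is exercised in this sketch.
    static void riakStore(int i, String key) {
        if (i % 1000 == 0) throw new RuntimeException("write failed: " + key);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8); // 8 writer threads
        for (int i = 0; i < 10_000; i++) {
            final int idx = i;
            pool.submit(() -> {
                String key = "key-" + idx;
                try {
                    riakStore(idx, key);
                } catch (RuntimeException e) {
                    failures.incrementAndGet(); // count the failed write
                    failedKeys.add(key);        // keep the key for a later retry
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("failed writes: " + failures.get()); // prints "failed writes: 10"
    }
}
```

Swapping riakStore() for the real store call (with the client's connection pool sized larger than the thread pool, as noted above) gives the multi-threaded transfer being recommended.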
>
> You might want to remove pw=1 when using multi-threading so Riak doesn't
> fall too far behind (the eleveldb catch-up, whatever that's called); pw=1
> will add more risk than the benefit you gain.
>
> Hope that helps,
>
> Guido.
>
> On 13/02/13 09:44, Bogdan Flueras wrote:
>
>  Ok, so I've done something like this:
>  Bucket bucket = client.createBucket("foo"); // lastWriteWins(true)
> doesn't work for Protobuf
>
> when I insert I have:
>  bucket.store(someKey, someValue).withoutFetch().pw(1).execute();
>
>  It looks like it's 20% faster than before. Is there something I could
> further tweak ?
>
> ing. Bogdan Flueras
>
>
>
> On Wed, Feb 13, 2013 at 10:19 AM, Bogdan Flueras <flueras.bogdan at gmail.com
> > wrote:
>
>>  Each thread has its own bucket instance (pointing to the same
>> location) and I don't re-fetch the bucket per insert.
>>  Thank you very much!
>>
>> ing. Bogdan Flueras
>>
>>
>>
>> On Wed, Feb 13, 2013 at 10:14 AM, Russell Brown <russell.brown at me.com> wrote:
>>
>>>
>>> On 13 Feb 2013, at 08:07, Bogdan Flueras <flueras.bogdan at gmail.com>
>>> wrote:
>>>
>>> > How to set the bucket to last write? Is it in the builder?
>>>
>>>  Something like:
>>>
>>>     Bucket b = client.createBucket("my_bucket").lastWriteWins(true).execute();
>>>
>>> Also, after you've created the bucket, do you use it from all threads?
>>> You don't re-fetch the bucket per-insert operation, do you?
>>>
>>> But the "withoutFetch()" option is probably going to be the biggest
>>> performance win, and it is safe if you are only doing inserts.
>>>
>>> Cheers
>>>
>>> Russell
>>>
>>> > I'll have a look..
>>> > Yes, I use more threads and the bucket is configured to spread the
>>> load across all nodes.
>>> >
>>> > Thanks, I'll have a deeper look into the API and let you know about my
>>> results.
>>> >
>>> > ing. Bogdan Flueras
>>> >
>>> >
>>> >
>>> > On Wed, Feb 13, 2013 at 10:02 AM, Russell Brown <russell.brown at me.com>
>>> wrote:
>>> > Hi,
>>> >
>>> > On 13 Feb 2013, at 07:37, Bogdan Flueras <flueras.bogdan at gmail.com>
>>> wrote:
>>> >
>>> > > Hello all,
>>> > > I've got a 5 node cluster with Riak 1.2.1, all machines are
>>> multicore,
>>> > > with min 4GB RAM.
>>> > >
>>> > > I want to insert something like 50 million records in Riak with the
>>> java client (Protobuf used) with default settings.  I've tried also with
>>> HTTP protocol and set w = 1 but got some problems.
>>> > >
>>> > > However the process is very slow: it doesn't write more than 6 GB/
>>> hour, or approx. 280 KB/second.
>>> > > To have all my data filled in, it would take approx. 2 days!
>>> > >
>>> > > What can I do to have the data filled into Riak ASAP?
>>> > > How should I configure the cluster ? (vm.args/ app.config) I don't
>>> care so much about consistency at this point.
>>> >
>>> > If you are certain to be inserting only new data, setting your
>>> bucket(s) to last-write-wins will speed things up. Also, are you using
>>> multiple threads for the Java client inserts? Spreading the load across all
>>> five nodes? Are you using the "withoutFetch()" option on the Java client?
>>> >
>>> > Cheers
>>> >
>>> > Russell
>>> >
>>> > >
>>> > > Thank you,
>>> > > ing. Bogdan Flueras
>>> > >
>>> > > _______________________________________________
>>> > > riak-users mailing list
>>> > > riak-users at lists.basho.com
>>> > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> >
>>> >
>>>
>>>
>>
>
>
>
>