Tune Riak for fast inserts - populate DB

Guido Medina guido.medina at temetra.com
Wed Feb 13 05:40:43 EST 2013


Also, something I forgot in my reply: make sure your Riak client is 
connected to every node and not just a single one. (The client's cluster 
config doesn't work that well, so try HAProxy, and make sure you are 
using protocol buffers.)

HAProxy sample config: https://gist.github.com/gburd/1507077

And a single PB config like this one, which connects to the HAProxy load 
balancer, assuming it is running on localhost and is itself connected to 
each node:

    final PBClientConfig clientConfig = new PBClientConfig.Builder()
        .withHost("127.0.0.1").withPort(8087).withPoolSize(N).build();

Guido.

On 13/02/13 10:29, Guido Medina wrote:
> Are you transferring using a single thread? If so, I would recommend 
> using a ThreadPoolExecutor and scheduling each write on it. You can 
> track failures (if any) using either an AtomicInteger or a 
> concurrent/synchronized list that records the keys that failed.
>
> No matter how much you tune, a single-threaded transfer won't get you 
> far. We have done transfers many times and, depending on the size of 
> the DB table, we use either a single thread or a thread pool. Try 8 
> threads and see the difference, assuming you have N connections in 
> your Riak client where N > max thread pool size.
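
A minimal sketch of the thread-pool approach described above, using only the JDK. The `Writer` interface is a hypothetical stand-in for a single Riak write (for example `bucket.store(key, value).withoutFetch().execute()`), and `BulkLoader`, `Result` and `load` are illustrative names, not part of the Riak client. `Executors.newFixedThreadPool` is used here as a convenient way to get a ThreadPoolExecutor; the lambda syntax assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BulkLoader {

    /** Hypothetical stand-in for one Riak write; not part of the Riak client. */
    public interface Writer {
        void write(String key, String value) throws Exception;
    }

    /** Outcome of a load: count of successful writes plus the keys that failed. */
    public static final class Result {
        public final int succeeded;
        public final List<String> failedKeys;

        Result(int succeeded, List<String> failedKeys) {
            this.succeeded = succeeded;
            this.failedKeys = failedKeys;
        }
    }

    /** Writes every record on a fixed-size thread pool and collects failures. */
    public static Result load(Map<String, String> records, Writer writer, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger succeeded = new AtomicInteger();
        Queue<String> failed = new ConcurrentLinkedQueue<>();

        for (Map.Entry<String, String> entry : records.entrySet()) {
            final String key = entry.getKey();
            final String value = entry.getValue();
            pool.execute(() -> {
                try {
                    writer.write(key, value);
                    succeeded.incrementAndGet();
                } catch (Exception ex) {
                    // Remember the key so it can be retried or reported later.
                    failed.add(key);
                }
            });
        }

        // Stop accepting new tasks, then wait for the queued writes to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            pool.shutdownNow();
        }
        return new Result(succeeded.get(), new ArrayList<>(failed));
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, String> records = new HashMap<>();
        records.put("a", "1");
        records.put("b", "2");
        records.put("c", "3");

        // Simulated writer that fails for one key.
        Result r = load(records, (k, v) -> {
            if (k.equals("b")) {
                throw new IllegalStateException("simulated failure");
            }
        }, 2);
        System.out.println("succeeded=" + r.succeeded + " failed=" + r.failedKeys);
    }
}
```

Note that this sketch queues every record up front, which is fine for a small table but not for tens of millions of rows; in that case you would probably feed the pool in batches or use a bounded queue. As mentioned above, the client's connection pool should be at least as large as the thread count.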
>
> You might want to remove pw=1 when using multi-threading so Riak 
> doesn't fall behind too much (eLevelDB catching up? whatever that's 
> called); pw=1 adds more risk than the benefit you gain.
>
> Hope that helps,
>
> Guido.
>
> On 13/02/13 09:44, Bogdan Flueras wrote:
>> Ok, so I've done something like this:
>> // lastWriteWins(true) doesn't work for Protobuf
>> Bucket bucket = client.createBucket("foo").execute();
>>
>> when I insert I have:
>> bucket.store(someKey, someValue).withoutFetch().pw(1).execute();
>>
>> It looks like it's 20% faster than before. Is there anything else I 
>> could tweak?
>>
>> ing. Bogdan Flueras
>>
>>
>>
>> On Wed, Feb 13, 2013 at 10:19 AM, Bogdan Flueras 
>> <flueras.bogdan at gmail.com> wrote:
>>
>>     Each thread has its own bucket instance (pointing to the same
>>     location) and I don't re-fetch the bucket per insert.
>>     Thank you very much!
>>
>>     ing. Bogdan Flueras
>>
>>
>>
>>     On Wed, Feb 13, 2013 at 10:14 AM, Russell Brown
>>     <russell.brown at me.com> wrote:
>>
>>
>>         On 13 Feb 2013, at 08:07, Bogdan Flueras
>>         <flueras.bogdan at gmail.com> wrote:
>>
>>         > How to set the bucket to last write? Is it in the builder?
>>
>>         Something like:
>>
>>             Bucket b = client.createBucket("my_bucket").lastWriteWins(true).execute();
>>
>>         Also, after you've created the bucket, do you use it from all
>>         threads? You don't re-fetch the bucket per-insert operation,
>>         do you?
>>
>>         But the "withoutFetch()" option is probably going to be the
>>         biggest performance increase, and safe if you are only doing
>>         inserts.
>>
>>         Cheers
>>
>>         Russell
>>
>>         > I'll have a look.
>>         > Yes, I use more threads and the bucket is configured to
>>         > spread the load across all nodes.
>>         >
>>         > Thanks, I'll have a deeper look into the API and let you
>>         > know about my results.
>>         >
>>         > ing. Bogdan Flueras
>>         >
>>         >
>>         >
>>         > On Wed, Feb 13, 2013 at 10:02 AM, Russell Brown
>>         > <russell.brown at me.com> wrote:
>>         > Hi,
>>         >
>>         > On 13 Feb 2013, at 07:37, Bogdan Flueras
>>         > <flueras.bogdan at gmail.com> wrote:
>>         >
>>         > > Hello all,
>>         > > I've got a 5 node cluster with Riak 1.2.1, all machines
>>         > > are multicore, with min 4GB RAM.
>>         > >
>>         > > I want to insert something like 50 million records in
>>         > > Riak with the Java client (Protobuf used) with default
>>         > > settings. I've also tried the HTTP protocol with w = 1
>>         > > but ran into some problems.
>>         > >
>>         > > However, the process is very slow: it doesn't write more
>>         > > than 6GB/hour or approx. 280 KB/second.
>>         > > To have all my data filled in, it would take approx. 2 days!
>>         > >
>>         > > What can I do to have the data filled into Riak ASAP?
>>         > > How should I configure the cluster? (vm.args/app.config)
>>         > > I don't care so much about consistency at this point.
>>         >
>>         > If you are certain to be only inserting new data, setting
>>         > your bucket(s) to last write wins will speed things up. Also,
>>         > are you using multiple threads for the Java client insert?
>>         > Spreading the load across all five nodes? Are you using the
>>         > "withoutFetch()" option on the Java client?
>>         >
>>         > Cheers
>>         >
>>         > Russell
>>         >
>>         > >
>>         > > Thank you,
>>         > > ing. Bogdan Flueras
>>         > >
>>         > > _______________________________________________
>>         > > riak-users mailing list
>>         > > riak-users at lists.basho.com
>>         > > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>         >
>>         >
>>
>
