Import big data to Riak

Russell Brown russell.brown at me.com
Tue Oct 29 11:33:06 EDT 2013


Hi Georgi,

All of Guido’s advice (below) is good. If you are just importing unique items, I would set the bucket property LWW=true (last_write_wins) for the import. It will be much faster, since Riak will not do N local reads for vclock data.
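
As a rough sketch of what setting that property could look like over Riak's HTTP interface (PUT to /buckets/<bucket>/props with a JSON "props" body): the host, the default HTTP port 8098, and the bucket name "import_bucket" are assumptions for illustration, and the helper name is hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request


def lww_props_request(host, bucket, enabled=True, port=8098):
    """Build (but do not send) a PUT that sets last_write_wins on a bucket.

    host, port and bucket are placeholders; adjust them for your cluster.
    """
    url = f"http://{host}:{port}/buckets/{bucket}/props"
    body = json.dumps({"props": {"last_write_wins": enabled}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical node address and bucket name.
req = lww_props_request("127.0.0.1", "import_bucket")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

To apply it, pass the request to urllib.request.urlopen(req) against a live node. Calling the helper with enabled=False builds the request that switches the property back off once the import is done.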

Cheers

Russell

On 29 Oct 2013, at 15:21, Guido Medina <guido.medina at temetra.com> wrote:

> IMHO your tests are not close to what you are going to have in production. Here are a few recommendations:
> 	• Build a cluster with at least 5 nodes with N=3 and R=W=2 (You can update your bucket properties via PBC with Java)
> 	• Use PBC instead of HTTP.
> 	• If you are only importing data, call .store()....withoutFetch().execute() to avoid unnecessary roundtrips.
> If you test using unrealistic scenarios, you will find unpleasant surprises when you are about to go live, so it is better to set your expectations right at the beginning.
> HTH,
> Guido.
> On 29/10/13 14:59, Georgi Ivanov wrote:
>> Hello,
>> I am importing some big data to Riak.
>> I am importing about 10GB per day and I have to import one year of data.
>> The task is to speed up the initial import. After that I will import on a
>> daily basis, so the speed will not be very important.
>> 
>> I am using the Java HTTP client. So far my tests show that the fastest setup
>> is to use n_val 1 and import to a single server.
>> 
>> I tested importing on 2 servers (with n_val:2), but it is actually slower.
>> My Java client is multi-threaded.
>> 
>> My idea is to use n_val:1 on a single node, then increase n_val to 2 and add
>> one more node to the cluster. The problem is that I don't see the storage
>> grow when I change n_val to 2.
>> I was looking at Riak's Active Anti-Entropy feature and I was expecting my
>> storage to grow after I increased n_val. Unfortunately this is not the case,
>> or I don't understand the AAE feature.
>> I can't see any changes in storage size at all. I don't want to go in the
>> direction of force repair, as it would take forever.
>> 
>> Can anyone shed some light on AAE? Or are there any tips for speeding up the
>> import in general?
>> 
>> To summarize the situation:
>> 1. One Riak node with n_val:1, eLevelDB as back-end
>> 2. Import data.
>> 3. Change n_val to 2
>> 4. Join one more node to the cluster.
>> 
>> What I expect to happen:
>> To have all the keys distributed to 2 Riak nodes with n_val:2.
>> So if I had 1TB of data on node1 with n_val:1, after changing to n_val 2 and
>> joining one more node, I would have 1TB of data on each node.
>> 
>> 
>> _______________________________________________
>> riak-users mailing list
>> 
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
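
The storage expectation in the original question, and the cluster shape Guido suggests, can be checked with some back-of-the-envelope arithmetic. This is only a sketch: it assumes replicas are spread evenly across nodes, ignores backend overhead and compression, and treats 1TB as 1000GB. The function names are made up for illustration.

```python
def stored_gb(logical_gb, n_val):
    """Total data held across the cluster: every object is stored n_val times."""
    return logical_gb * n_val


def per_node_gb(logical_gb, n_val, nodes):
    """Average share per node, assuming replicas are spread evenly."""
    return stored_gb(logical_gb, n_val) / nodes


# 10GB per day for one year, as described in the original post.
year_gb = 10 * 365

# Starting point: 1TB on a single node with n_val 1.
print(per_node_gb(1000, n_val=1, nodes=1))

# Expected end state: n_val 2 across two nodes, on average 1TB on each.
print(per_node_gb(1000, n_val=2, nodes=2))

# The full year of data on the suggested 5-node cluster with N=3.
print(per_node_gb(year_gb, n_val=3, nodes=5))
```

Under these assumptions the full year comes to 3650GB of logical data, or about 2190GB per node on a 5-node cluster with N=3.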