Import big data to Riak

Georgi Ivanov ivanov at vesseltracker.com
Tue Oct 29 10:59:50 EDT 2013


Hello,
I am importing a large data set into Riak.
I am importing about 10 GB per day and I have to import one year of data.
The task is to speed up the initial import. After that I will import on a
daily basis, so speed will not be very important.

I am using the Java HTTP client. So far my tests show that the fastest setup
is to use n_val 1 and import to a single server.

I tested importing to 2 servers (with n_val:2), but it is actually slower.
My Java client is multi-threaded.
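
For illustration, here is a minimal sketch of the kind of multi-threaded HTTP
import described above. It is not the actual client: it assumes Java 11+ and
the JDK's built-in java.net.http.HttpClient, Riak's HTTP interface on its
default port 8098, and the /buckets/<bucket>/keys/<key> resource. The class
name, the method names and the bucket name "positions" are made up for the
example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch, not the client used for the real import.
final class RiakImportSketch {

    // Percent-encode one path segment. URLEncoder emits '+' for a space,
    // which is not valid inside a URL path, so it is rewritten to %20.
    static String encodeSegment(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Build the URL of one object under Riak's HTTP interface.
    static URI objectUri(String baseUrl, String bucket, String key) {
        return URI.create(baseUrl + "/buckets/" + encodeSegment(bucket)
                + "/keys/" + encodeSegment(key));
    }

    // Store every entry with an HTTP PUT from a fixed pool of worker threads.
    // Returns how many requests were answered with a 2xx status.
    static int importAll(String baseUrl, String bucket,
                         Map<String, String> objects, int threads) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (Map.Entry<String, String> e : objects.entrySet()) {
                HttpRequest request = HttpRequest
                        .newBuilder(objectUri(baseUrl, bucket, e.getKey()))
                        .header("Content-Type", "application/json")
                        .PUT(HttpRequest.BodyPublishers.ofString(e.getValue()))
                        .build();
                Callable<Integer> task = () -> client
                        .send(request, HttpResponse.BodyHandlers.discarding())
                        .statusCode();
                pending.add(pool.submit(task));
            }
            int stored = 0;
            for (Future<Integer> f : pending) {
                int status = f.get();   // waits for this request to finish
                if (status >= 200 && status < 300) {
                    stored++;
                }
            }
            return stored;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as importAll("http://127.0.0.1:8098", "positions", batch, 16)
would push one batch with 16 threads. The thread count is only a placeholder
and would need to be tuned against the cluster.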

My idea is to use n_val:1 on a single node, then increase n_val to 2 and add
one more node to the cluster. The problem is that I don't see the storage
grow when I change n_val to 2.
I was looking at Riak's Active Anti-Entropy (AAE) feature and I expected my
storage to grow after I increased n_val. Unfortunately this is not the case,
or I don't understand the AAE feature.
I can't see any change in storage size at all. I don't want to go in the
direction of a forced repair, as it would take forever.

Can anyone shed some light on AAE? Or give any tips for speeding up the
import in general?

To summarize the situation:
1. One Riak node with n_val:1 and eLevelDB as the back-end.
2. Import the data.
3. Change n_val to 2.
4. Join one more node to the cluster.

What I expect to happen:
All the keys are distributed across the 2 Riak nodes with n_val:2.
So if I had 1 TB of data on node1 with n_val:1, then after changing to n_val 2
and joining one more node, I would have 1 TB of data on each node.
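
To make step 3 and the expectation above concrete, here is a small sketch. It
assumes Riak's HTTP bucket-properties resource (/buckets/<bucket>/props with a
JSON "props" body) and Java 11+. The class, the method names and the bucket
name are hypothetical, and the size estimate assumes replicas end up spread
evenly over the nodes, which is my expectation rather than something I have
observed.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch of the n_val change and the storage I expect afterwards.
final class NValChangeSketch {

    // JSON body that sets n_val through the bucket-properties resource.
    static String nValBody(int nVal) {
        return "{\"props\":{\"n_val\":" + nVal + "}}";
    }

    // Build (but do not send) the PUT request that changes n_val on one bucket.
    // The bucket name is assumed to be URL-safe.
    static HttpRequest setNValRequest(String baseUrl, String bucket, int nVal) {
        return HttpRequest
                .newBuilder(URI.create(baseUrl + "/buckets/" + bucket + "/props"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(nValBody(nVal)))
                .build();
    }

    // Expected bytes per node once every object has nVal replicas,
    // assuming the replicas are spread evenly over all nodes.
    static long expectedBytesPerNode(long uniqueBytes, int nVal, int nodes) {
        return uniqueBytes * nVal / nodes;
    }
}
```

With 1 TB of unique data, n_val 2 and 2 nodes, expectedBytesPerNode gives 1 TB
per node, which is the number I am expecting to see.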




