Import big data to Riak
ivanov at vesseltracker.com
Tue Oct 29 10:59:50 EDT 2013
I am importing some big data into Riak.
I am importing about 10GB per day, and I have to import one year of data.
The task is to speed up the initial import. After that I will import on a daily
basis, so speed is not very important.
I am using the Java HTTP client. So far my tests show that the fastest setup is to
use n_val 1 and import to a single server.
I tested importing to 2 servers (with n_val 2), but it is actually slower.
My Java client is multi-threaded.
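For reference, a minimal sketch of what a multi-threaded HTTP import can look like. It assumes Riak's HTTP interface on its default port 8098 and the `PUT /buckets/{bucket}/keys/{key}` endpoint, and it uses the JDK's own `java.net.http.HttpClient` (Java 11+) rather than the Riak Java client; the class name, the bucket `positions` and the sample key/value are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class RiakBulkImport {
    // Assumption: Riak's HTTP interface listening on its default port.
    static final String BASE = "http://127.0.0.1:8098";

    // Percent-encode a path segment (URLEncoder emits '+' for spaces, which is wrong in a path).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Build PUT /buckets/{bucket}/keys/{key} carrying a JSON value.
    static HttpRequest buildPut(String base, String bucket, String key, String json) {
        URI uri = URI.create(base + "/buckets/" + encode(bucket) + "/keys/" + encode(key));
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Store every record with a fixed pool of worker threads; returns the count of non-2xx replies.
    static int importAll(HttpClient client, String base, String bucket,
                         Map<String, String> records, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Map.Entry<String, String> e : records.entrySet()) {
                HttpRequest req = buildPut(base, bucket, e.getKey(), e.getValue());
                results.add(pool.submit(() ->
                        client.send(req, HttpResponse.BodyHandlers.discarding()).statusCode()));
            }
            int failures = 0;
            for (Future<Integer> f : results) {
                int status = f.get();
                if (status < 200 || status >= 300) failures++;
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Dry run: print the request that would be sent for one hypothetical record.
        HttpRequest req = buildPut(BASE, "positions", "vessel 42", "{\"lat\":53.5}");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

For a real run, a client such as `HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build()` can be passed to `importAll`. With 10GB per day it would be wiser to feed records in bounded batches than to submit a whole day's map at once, since every pending request is held in memory.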
My idea is to use n_val 1 on a single node, then increase n_val to 2 and add
one more node to the cluster. The problem is that I don't see the storage
grow when I change n_val to 2.
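The n_val change itself is a bucket-property update. A minimal sketch, assuming the HTTP interface's `PUT /buckets/{bucket}/props` endpoint with a JSON `props` object; the class and the bucket name are hypothetical. It only changes the property and says nothing about when existing objects gain their second replica, which is exactly the open question here.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class SetBucketNVal {
    // JSON body for the bucket-properties endpoint; only n_val is changed.
    static String propsJson(int nVal) {
        return "{\"props\":{\"n_val\":" + nVal + "}}";
    }

    // Build PUT /buckets/{bucket}/props (assumed Riak HTTP interface).
    static HttpRequest buildSetNVal(String base, String bucket, int nVal) {
        String segment = URLEncoder.encode(bucket, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(base + "/buckets/" + segment + "/props"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(propsJson(nVal)))
                .build();
    }

    // Send the request and return the HTTP status (a 2xx status means the update was accepted).
    static int setNVal(HttpClient client, String base, String bucket, int nVal) throws Exception {
        return client.send(buildSetNVal(base, bucket, nVal), HttpResponse.BodyHandlers.discarding())
                     .statusCode();
    }

    public static void main(String[] args) {
        // Dry run: show the request that would switch the hypothetical "positions" bucket to n_val 2.
        HttpRequest req = buildSetNVal("http://127.0.0.1:8098", "positions", 2);
        System.out.println(req.method() + " " + req.uri() + " " + propsJson(2));
    }
}
```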
I was looking at Riak's Active Anti-Entropy (AAE) feature, and I expected my
storage to grow after I increased n_val. Unfortunately this is not the case,
or I don't understand the AAE feature...
I can't see any change in storage size at all. I don't want to go in the direction of
a forced repair, as it would take forever.
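One way to put a number on "storage size" is to total the files under the node's eLevelDB data directory before and after the change. A minimal sketch; the default path below is a hypothetical example, and the real location is whatever `data_root` the node is configured with.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

class DataDirSize {
    // Sum the sizes of all regular files under dir (e.g. the eLevelDB data directory).
    static long totalBytes(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(DataDirSize::sizeOrZero)
                        .sum();
        }
    }

    // Compaction can delete a file while we walk, so a vanished file counts as 0 bytes.
    static long sizeOrZero(Path p) {
        try {
            return Files.size(p);
        } catch (NoSuchFileException e) {
            return 0L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the node's actual data_root as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "/var/lib/riak/leveldb");
        if (!Files.isDirectory(dir)) {
            System.out.println("No such directory: " + dir);
            return;
        }
        System.out.printf("%s: %d bytes%n", dir, totalBytes(dir));
    }
}
```

Because LevelDB compaction rewrites and removes files, single readings fluctuate; comparing a few readings taken over time is more telling than one snapshot.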
Can anyone shed some light on AAE? Or offer any tips for speeding up the import in
To summarize the situation:
1. One Riak node with n_val 1, eLevelDB as the back-end.
2. Import data.
3. Change n_val to 2.
4. Join one more node to the cluster.
What I expect to happen:
All the keys distributed across the 2 Riak nodes with n_val 2.
So if I had 1TB of data on node1 with n_val 1, then after changing to n_val 2 and
joining one more node, I would expect 1TB of data on each node.
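The expectation above is plain replica arithmetic: per-node storage is roughly unique data times n_val divided by the number of nodes. A small sketch under stated assumptions (partitions spread evenly, eLevelDB overhead and compaction ignored, 1TB treated as 1000GB); the one-year figure simply multiplies the quoted 10GB per day by 365.

```java
class ExpectedStorage {
    // Expected data per node when every object has nVal replicas spread evenly over the nodes.
    // Ignores backend overhead, compaction and uneven partition ownership.
    static double perNode(double uniqueData, int nVal, int nodes) {
        if (nVal < 1 || nodes < 1) {
            throw new IllegalArgumentException("nVal and nodes must be >= 1");
        }
        return uniqueData * nVal / nodes;
    }

    public static void main(String[] args) {
        double oneYearGb = 10.0 * 365; // about 10GB per day for one year
        System.out.println(perNode(1000, 1, 1));      // 1TB (in GB) on one node, n_val 1
        System.out.println(perNode(1000, 2, 2));      // same data after n_val 2 and a second node
        System.out.println(perNode(oneYearGb, 2, 2)); // full year, n_val 2, 2 nodes
    }
}
```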