Import big data to Riak

Guido Medina guido.medina at temetra.com
Tue Oct 29 11:21:12 EDT 2013


IMHO your tests are not close to what you are going to have in
production. Here are a few recommendations:

 1. Build a cluster with at least 5 nodes, with N=3 and R=W=2. (You can
    update your bucket properties via PBC with Java.)
 2. Use PBC (the Protocol Buffers client) instead of HTTP.
 3. If you are only importing data, call
    .store()....withoutFetch().execute() to avoid unnecessary roundtrips.
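
To put rough numbers on recommendation 1, here is a minimal sketch in plain Java (no Riak client involved). It checks that the read and write quorums overlap and estimates how much data each node would hold. The class and method names are made up for illustration, the 10GB per day figure comes from the original question, and the per-node estimate assumes replicas spread evenly across the cluster, which a real ring only approximates.

```java
public class ReplicationMath {

    /** True when every read quorum shares at least one replica with every write quorum. */
    public static boolean quorumsOverlap(int n, int r, int w) {
        return r + w > n;
    }

    /** Number of replicas that may be unavailable while a write still reaches its quorum. */
    public static int writeFailuresTolerated(int n, int w) {
        return n - w;
    }

    /** Approximate data held per node in GB, assuming replicas are spread evenly. */
    public static double perNodeGb(double logicalGb, int n, int nodes) {
        return logicalGb * n / nodes;
    }

    public static void main(String[] args) {
        // 10GB per day for one year, as stated in the original question.
        double yearGb = 10.0 * 365;

        System.out.println("N=3, R=2, W=2 quorums overlap: " + quorumsOverlap(3, 2, 2));
        System.out.println("N=3, W=2 write failures tolerated: " + writeFailuresTolerated(3, 2));
        System.out.println("5 nodes, N=3, GB per node: " + perNodeGb(yearGb, 3, 5));
        System.out.println("1 node, N=1, GB per node: " + perNodeGb(yearGb, 1, 1));
    }
}
```

With N=3 every object is written three times, so an import benchmarked at n_val 1 on a single node measures only a fraction of the work the production cluster will do per object.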

If you test using unrealistic scenarios, you will find unpleasant
surprises when you are about to go live, so it is better to set your
expectations right at the beginning.

HTH,

Guido.

On 29/10/13 14:59, Georgi Ivanov wrote:
> Hello,
> I am importing some big data to Riak.
> I am importing about 10GB per day and I have to import one year of data.
> The task is to speed up the initial import. After that I will import on a daily
> basis, so the speed is not very important.
>
> I am using the Java HTTP client. So far my tests show that the fastest setup is to
> use n_val 1 and import to a single server.
>
> I tested importing on 2 servers (with n_val:2), but it is actually slower.
> My Java client is multi-threaded.
>
> My idea is to use n_val:1 on a single node, then increase n_val to 2 and add
> one more node to the cluster. The problem is that I don't see the storage
> grow when I change n_val to 2.
> I was looking at Riak's Active Anti-Entropy (AAE) feature and I am expecting my
> storage to grow after I increase the n_val. Unfortunately this is not the case,
> or I don't understand the AAE feature.
> I can't see any changes in storage size at all. I don't want to go in the direction of
> a forced repair, as it would take forever.
>
> Can anyone shed some light on AAE? Or offer any tips for speeding up the import in
> general?
>
> To summarize the situation:
> 1. One Riak node with n_val:1, eLevelDB as back-end
> 2. Import data.
> 3. Change n_val to 2
> 4. Join one more node to the cluster.
>
> What I expect to happen:
> All the keys are distributed to 2 Riak nodes with n_val:2.
> So if I had 1TB of data on node1 with n_val:1, after changing to n_val:2 and
> joining one more node, I would have 1TB of data on each node.
