Riak loading taking a huge time? Any suggestions for improvement

Mike Oxford moxford at gmail.com
Tue Aug 28 13:56:41 EDT 2012


Use the Erlang client (https://github.com/basho/riak-erlang-client) directly,
instead of calling os:cmd and pushing everything through curl. Right now
you're making 25 million os:cmd calls, and each one starts a separate curl
process that opens its own connection. Once you're on the client you can
also parallelize the load: open up a pool of connections (or even just N
and round-robin across them) and keep them open.
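
A minimal sketch of what that could look like, assuming the riakc
application is on the code path and that the node accepts protocol-buffers
connections on port 8087 (the usual default, but not something stated in
the original message). The module name, the partition/2 helper and the
worker count are made up for illustration; the parsing and the JSON layout
are copied from the script quoted below.

```erlang
-module(riak_loader).
-export([load/4, to_json/1, partition/2]).

%% Example call (host from the original script, PB port and worker count assumed):
%%   riak_loader:load("CustCalls25m.csv", "10.232.5.169", 8087, 8).
load(Filename, Host, Port, NumWorkers) ->
    {ok, Data} = file:read_file(Filename),
    %% Same parsing as the original script: split into lines, drop the header row.
    Lines = tl(re:split(Data, "\r?\n", [{return, binary}, trim])),
    Parent = self(),
    %% One process per connection; each keeps its socket open for its whole share.
    Refs = [begin
                Ref = make_ref(),
                spawn_link(fun() ->
                                   worker(Host, Port, Chunk),
                                   Parent ! {done, Ref}
                           end),
                Ref
            end || Chunk <- partition(Lines, NumWorkers)],
    %% Wait until every worker has reported back.
    [receive {done, Ref} -> ok end || Ref <- Refs],
    ok.

%% Open one persistent connection and push every line in this share through it.
worker(Host, Port, Lines) ->
    {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
    lists:foreach(fun(Line) -> insert(Pid, Line) end, Lines),
    riakc_pb_socket:stop(Pid).

insert(Pid, Line) ->
    Fields = re:split(Line, ",", [{return, binary}]),
    Obj = riakc_obj:new(<<"CustCalls25m">>, hd(Fields),
                        to_json(Fields), "application/json"),
    ok = riakc_pb_socket:put(Pid, Obj).

%% Build the same JSON document as the original script (expects six fields).
to_json(Fields) ->
    iolist_to_binary(io_lib:format(
        "{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,"
        "\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Fields)).

%% Deal the lines round-robin into N lists (N >= 1), one list per connection.
partition(Lines, N) ->
    Indexed = lists:zip(lists:seq(0, length(Lines) - 1), Lines),
    [[L || {I, L} <- Indexed, I rem N =:= K] || K <- lists:seq(0, N - 1)].
```

This still reads the whole file into memory, as the original script does;
for the 165 GB set you would probably want to stream the file instead. The
point here is only that the connections are opened once and reused.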

With the default N=3, a 2-node cluster will have 1/3 of the replica set on
one machine and 2/3 on the other. You may consider moving to N=2 on the
bucket, which should put one copy on each machine (i.e., dual-master).
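
If you go that way, something along these lines should set it through the
Erlang client. This is a sketch under assumptions: the module name is made
up, port 8087 is the usual protocol-buffers default rather than anything
from the original message, and you would want to do this on an empty bucket
before loading rather than after.

```erlang
-module(bucket_props).
-export([set_n_val/4]).

%% Example call (PB port assumed):
%%   bucket_props:set_n_val("10.232.5.169", 8087, <<"CustCalls25m">>, 2).
set_n_val(Host, Port, Bucket, N) ->
    {ok, Pid} = riakc_pb_socket:start_link(Host, Port),
    %% Ask Riak to keep N replicas of every key in this bucket.
    ok = riakc_pb_socket:set_bucket(Pid, Bucket, [{n_val, N}]),
    %% Read the properties back to confirm what the cluster now reports.
    {ok, Props} = riakc_pb_socket:get_bucket(Pid, Bucket),
    riakc_pb_socket:stop(Pid),
    proplists:get_value(n_val, Props).
```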

Beyond that, you have not provided enough information to tell where the
bottleneck is, though I'm sure the Basho crew will have some better
answers.  :)
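
For what it's worth, the numbers quoted below can be turned into a rough
per-record picture. A back-of-the-envelope sketch (module and function
names are made up for illustration; the inputs are the `time` output and
the dataset sizes from the original message, and the 165 GB figure assumes
the load time scales linearly with size):

```erlang
-module(load_estimate).
-export([seconds/2, per_record_ms/0, records_per_sec/0,
         cpu_share/0, days_for_165gb/0]).

%% Convert the "NNNmSS.sss" format printed by `time` into seconds.
seconds(Min, Sec) -> Min * 60 + Sec.

%% Wall-clock time for the 25 million record load.
wall() -> seconds(5671, 12.812).

%% Average wall-clock cost of one insert, in milliseconds (about 13.6).
per_record_ms() -> wall() / 25000000 * 1000.

%% Average throughput, in records per second (about 73).
records_per_sec() -> 25000000 / wall().

%% Fraction of wall-clock time spent as client-side CPU time, user + sys
%% (about 0.85, most of it sys).
cpu_share() -> (seconds(1725, 31.862) + seconds(3074, 42.135)) / wall().

%% Days needed for 165 GB at the same rate as the 1.3 GB load (about 500).
days_for_165gb() -> wall() * (165 / 1.3) / 86400.
```

So roughly 13.6 ms per record, with about 85% of the elapsed time showing
up as CPU time on the loading machine and most of that in the kernel. That
is at least consistent with process startup for each curl call being a big
part of the cost, though it doesn't prove it.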

-mox


On Mon, Aug 27, 2012 at 8:26 PM, <Sangeetha.PattabiRaman2 at cognizant.com> wrote:

> Dear team,
>
> I am trying to load a 25 million record dataset (1.3 GB) of sample call
> data onto Riak. The setup is a 2-node Riak cluster (4 quad-core, 1.5 TB
> storage), and the load takes real 5671m12.812s. Please suggest ways to
> improve this; 5671m12.812s is far too long. We deal with big data, and I
> need to store and test 165 GB on Riak. At the present rate I guess that
> load would take years. I have already loaded the 165 GB into MongoDB and
> got the results, for a *comparative performance study of MongoDB and
> Riak*. Please assist me with this.
>
> *Using the following code for loading:*
>
> #!/usr/local/bin/escript
>
> %% Read the CSV, drop the header row, and insert one record per line.
> main([Filename]) ->
>     {ok, Data} = file:read_file(Filename),
>     Lines = tl(re:split(Data, "\r?\n", [{return, binary}, trim])),
>     lists:foreach(fun(L) -> LS = re:split(L, ","), format_and_insert(LS) end,
>                   Lines).
>
> %% Build the JSON document and PUT it by shelling out to curl.
> format_and_insert(Line) ->
>     JSON = io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,"
>                          "\"starttime\":~s,\"endtime\":~s,\"status\":~s}",
>                          Line),
>     Command = io_lib:format("curl -X PUT "
>                             "http://10.232.5.169:8098/riak/CustCalls25m/~s "
>                             "-d '~s' -H 'content-type: application/json'",
>                             [hd(Line), JSON]),
>     io:format("Inserting: ~s~n", [hd(Line)]),
>     os:cmd(Command).
>
>
>
> [hadoop at CTSINGMRGTO data]$ time ./load_data25m CustCalls25m.csv >> 25m.txt &
> [3] 32354
>
> [hadoop at CTSINGMRGTO data]$
> real    5671m12.812s
> user    1725m31.862s
> sys     3074m42.135s
> [hadoop at CTSINGMRGTO data]$
>
> [hadoop at CTSINGMRGTO data]$ tail -4 25m.txt
> Inserting: 24999997
> Inserting: 24999998
> Inserting: 24999999
> Inserting: 25000000
> [hadoop at CTSINGMRGTO data]$
>
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com