issue on riak bulk loading---taking huge time

Sangeetha.PattabiRaman2 at cognizant.com Sangeetha.PattabiRaman2 at cognizant.com
Mon May 14 01:33:43 EDT 2012



From: Pattabi Raman, Sangeetha (Cognizant)
Sent: Thursday, May 10, 2012 3:25 PM
To: riak-users at lists.basho.com
Subject: issue on riak bulk loading---taking huge time

Dear team,

FYI: we have four quad-core Intel processors on each server in a 2-node cluster, with more than 1 TB of storage.
I have set up a 2-node Riak cluster on physical machines with n_val 2; my app.config and vm.args are attached for your reference.

Please tell me where data bulk-inserted into Riak is stored on the local file system. Even small inputs take a very long time to load. How can this be tuned for large-scale loads? We deal with big data in the range of a few hundred GB.

Command used: time ./load_data1m Customercalls1m.csv

Running ./load_data100m CustomerCalls100m gave the error below, so I changed the app.config setting from the default 8 MB to 3072 MB:
escript: exception error: no match of right hand side value {error,enoent}
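
For reference, this message is what a failed match on `{ok, Data} = file:read_file(Filename)` prints when the call returns `{error,enoent}` ("no such file or directory"). The sketch below is a hypothetical stand-alone check, not part of the load script, that reports the missing file instead of crashing. The script name in the usage line is made up for illustration.

```erlang
#!/usr/bin/env escript
%% Sketch: check that the input file can be read before loading it.
main([Filename]) ->
    case file:read_file(Filename) of
        {ok, Data} ->
            io:format("Read ~p bytes from ~s~n", [byte_size(Data), Filename]);
        {error, enoent} ->
            %% enoent: the file name given on the command line does not exist
            io:format("File not found: ~s~n", [Filename]),
            halt(1);
        {error, Reason} ->
            io:format("Cannot read ~s: ~p~n", [Filename, Reason]),
            halt(1)
    end;
main(_) ->
    %% check_input is a hypothetical name for this script
    io:format("usage: check_input <file.csv>~n"),
    halt(1).
```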


Size                    Load time     No. of mappers (app.config)   js_max_vm_mem (app.config)   js_thread_stack
100k rows, 5 MB         20m39.625s    48                            3072 MB (3 GB)               3072 MB (3 GB)
1 million rows, 54 MB   198m42.375s   48                            3072 MB (3 GB)               3072 MB (3 GB)

Both memory settings were changed from the default of 8 MB because the input data is large.


./load_data script used:

#!/usr/local/bin/escript
%% Usage: ./load_data <file.csv>
main([Filename]) ->
    {ok, Data} = file:read_file(Filename),
    %% Split the file into lines and drop the first line.
    Lines = tl(re:split(Data, "\r?\n", [{return, binary},trim])),
    lists:foreach(fun(L) -> LS = re:split(L, ","), format_and_insert(LS) end, Lines).

%% Build a JSON object from the six CSV fields and PUT it with curl,
%% using the first field as the key. One curl process is started per row.
format_and_insert(Line) ->
    JSON = io_lib:format("{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Line),
    Command = io_lib:format("curl -X PUT http://10.232.5.169:8098/riak/CustomerCalls100k/~s -d '~s' -H 'content-type: application/json'", [hd(Line),JSON]),
    io:format("Inserting: ~s~n", [hd(Line)]),
    os:cmd(Command).
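
For comparison, here is a minimal sketch of the same load done over HTTP from inside the Erlang VM with `httpc` (part of the inets application), so that no curl process is started per row. It assumes the same host, port, bucket and six-column CSV layout as the script above. Whether this alone brings the load time down on this cluster is untested.

```erlang
#!/usr/bin/env escript
%% Sketch: load CSV rows into Riak with httpc instead of one curl per row.

%% Assumption: same host, port and bucket as the curl command in the script above.
base_url() -> "http://10.232.5.169:8098/riak/CustomerCalls100k/".

main([Filename]) ->
    ok = inets:start(),
    {ok, Data} = file:read_file(Filename),
    %% Drop the first line, as the original script does with tl/1.
    Lines = tl(re:split(Data, "\r?\n", [{return, binary}, trim])),
    lists:foreach(fun(L) -> insert_line(L) end, Lines).

insert_line(Line) ->
    %% Fields are returned as strings so the key can be appended to the URL.
    Fields = [Key | _] = re:split(Line, ",", [{return, list}]),
    Body = iolist_to_binary(io_lib:format(
        "{\"id\":\"~s\",\"phonenumber\":~s,\"callednumber\":~s,"
        "\"starttime\":~s,\"endtime\":~s,\"status\":~s}", Fields)),
    Url = base_url() ++ Key,
    case httpc:request(put, {Url, [], "application/json", Body}, [], []) of
        {ok, {{_Vsn, Code, _Reason}, _Hdrs, _RespBody}} ->
            io:format("Inserted ~s (HTTP ~p)~n", [Key, Code]);
        {error, Reason} ->
            io:format("Failed ~s: ~p~n", [Key, Reason])
    end.
```

It would be run the same way as the original, for example `time ./load_data_httpc Customercalls1m.csv`, where `load_data_httpc` is a hypothetical file name.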



Thanks in advance! I am waiting for a reply. Could anyone please help? I am stuck with bulk loading. Please also explain how Riak splits the data and loads it across the cluster.
Thanks & regards
sangeetha


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: app.config.txt
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20120514/9036aa49/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: vm.args.txt
URL: <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20120514/9036aa49/attachment-0001.txt>

