Throughput issue contd. on Joyent Riak SmartMachine

Reid Draper reiddraper at gmail.com
Wed Jun 27 09:41:36 EDT 2012


On Jun 27, 2012, at 8:41 AM, Yousuf Fauzan wrote:

> So I created an array of clients using the following code
> 
> Clients = [riak.RiakClient(e, port=8087, transport_class=riak.RiakPbcTransport) for e in NODES]

Sounds like you're bringing your concurrency back down to 3 (because you have three nodes). Give
something like 10 connections _per_ node a try, so 30 connections.
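Here is a minimal sketch of Reid's suggestion, assuming Python 3 and a work queue that is filled before any thread starts. It starts `per_node` threads for every node, and each thread opens one client when it starts and reuses that client for every write. `make_client` and `write` are hypothetical hooks, not part of the Riak client. With the client used in this thread they would presumably wrap `riak.RiakClient(host, port=8087, transport_class=riak.RiakPbcTransport)` and `bucket.new(key, value).store()`, but check that against your client version.

```python
import queue
import threading


def assign_nodes(nodes, per_node):
    """One entry per worker thread: per_node workers for every node, interleaved."""
    return [nodes[i % len(nodes)] for i in range(len(nodes) * per_node)]


def run_load(nodes, per_node, items, make_client, write):
    """Drain `items` using len(nodes) * per_node threads; return the thread count.

    make_client(node) -> client   hypothetical factory, called once per thread
    write(client, key, value)     hypothetical per-item store call
    """
    work = queue.Queue()
    for key, value in items:
        work.put((key, value))  # queue is full before any worker starts

    def worker(node):
        client = make_client(node)  # one persistent client per thread, never per write
        while True:
            try:
                key, value = work.get_nowait()
            except queue.Empty:
                return  # queue drained, so this thread is done
            write(client, key, value)

    threads = [threading.Thread(target=worker, args=(node,))
               for node in assign_nodes(nodes, per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(threads)
```

With three nodes and `per_node=10` this starts 30 threads, ten pinned to each node. An exception in `write` ends only its own thread, so a real loader would want error handling and retries.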

> 
> After this, I assigned each thread a particular ID ranging from 0 to the number of nodes.
> 
> So each thread now communicates with a single node.
> 
> Even after this, I am getting <100 writes/sec
> 
> 
> On Wed, Jun 27, 2012 at 5:35 PM, Yousuf Fauzan <yousuffauzan at gmail.com> wrote:
> Oh! I think that may be an issue with my code then.
> 
> Let me make some changes and get back to you.
> 
> 
> On Wed, Jun 27, 2012 at 5:25 PM, Reid Draper <reiddraper at gmail.com> wrote:
> 
> On Jun 27, 2012, at 7:48 AM, Yousuf Fauzan wrote:
> 
>> This is great.
>> 
>> I was loading data using Python. My code would spawn 10 threads and put data in a queue. All threads would read data from this queue.
>> However, all threads were hitting the same server/load balancer.
>> 
>> I tried a different setup too, where I spawned processes, each with its own queue. In this case too, all processes were hitting the same server.
>> 
>> I just now made a change to my code. So now I have 10 threads randomly selecting a node and storing data in it.
>> Again, I am getting around 50 writes/sec
> 
> When the threads randomly pick a node, do they create a new connection to it, or do they pull the connection from
> a pool? As you saw with the throughput difference between curl and Python, persistent connections make a
> big difference.
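The "pull the connection from a pool" option Reid describes can be sketched as follows, assuming Python 3. A fixed number of connections are opened once up front and then borrowed and returned, so no request pays for a new connection. `factory` is a hypothetical callable that opens one connection (for example, one client per node from the array above). The class is illustrative and not part of the Riak client.

```python
import contextlib
import queue


class ConnectionPool:
    """Fixed-size pool: connections are opened once and reused, never per request."""

    def __init__(self, factory, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(factory())  # open every connection up front

    @contextlib.contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free (raises queue.Empty after `timeout`).
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next caller
```

Each writer thread does `with pool.connection() as c: ...` around a store. The pool size, not the thread count, caps how many connections the cluster sees.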
> 
>> 
>> Could there be something wrong with the way I have written my loader script?
>> 
>> On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.brown at mac.com> wrote:
>> 
>> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>> 
>>> So I changed concurrency to 10 and put all the IPs of the nodes in basho bench config.
>>> Throughput is now around 1500.
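Little's law ($N = X \cdot R$ for $N$ requests in flight, throughput $X$ and mean latency $R$) gives a rough way to read the jump from the ~150 ops/sec basho_bench reported at concurrency 1 to the ~1500 here. It assumes per-request latency stays about the same as workers are added:

```latex
X \approx \frac{N}{R}, \qquad
R_{N=1} \approx \frac{1}{150\ \text{ops/s}} \approx 6.7\ \text{ms}, \qquad
X_{N=10} \approx \frac{10}{6.7\ \text{ms}} \approx 1500\ \text{ops/s}
```

By the same reading, the single-threaded Python loop at ~200 writes/sec implies roughly 5 ms per store. Latency rises once the nodes saturate, so this is an upper-bound estimate rather than a prediction. The 5- and 15-worker runs Russell suggests are what show where it stops holding.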
>>> 
>> 
>> I guess you can now try 5 or 15 concurrent workers and see which is optimal for that set up to get a good feel for the sizing of any connection pools for your application.
>> 
>> You can also see how adding nodes and adding workers affects your results, to help you size the cluster you need for your expected usage.
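The same worker-count sweep can be run from a Python loader. The sketch below assumes Python 3. `make_writer` is a hypothetical callable that returns a zero-argument write function. It is called once per thread, so each worker can hold its own connection, as basho_bench workers do. Throughput is measured as completed writes divided by wall-clock time.

```python
import threading
import time


def measure_throughput(make_writer, workers, seconds):
    """Run `workers` threads writing in a tight loop for `seconds`; return writes/sec."""
    counts = [0] * workers  # one slot per thread, so no lock is needed
    start = time.monotonic()
    deadline = start + seconds

    def loop(i):
        write = make_writer()  # per-thread writer (e.g. holding its own client)
        while time.monotonic() < deadline:
            write()
            counts[i] += 1

    threads = [threading.Thread(target=loop, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / (time.monotonic() - start)


def sweep(make_writer, worker_counts=(5, 10, 15), seconds=10):
    """Map each worker count to its measured writes/sec."""
    return {n: measure_throughput(make_writer, n, seconds) for n in worker_counts}
```

The count where throughput stops growing (or latency climbs sharply) is a reasonable starting size for the application's connection pool.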
>> 
>> Cheers
>> 
>> Russell
>> 
>>> 
>>> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.brown at mac.com> wrote:
>>> 
>>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>> 
>>>> I used examples/riakc_pb.config
>>>> 
>>>> {mode, max}.
>>>> 
>>>> {duration, 10}.
>>>> 
>>>> {concurrent, 1}.
>>> 
>>> Try upping this. On my local 3-node cluster, with 8GB RAM and an old, cheap quad-core CPU per box, I'd set concurrency to 10 workers.
>>> 
>>>> 
>>>> {driver, basho_bench_driver_riakc_pb}.
>>>> 
>>>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>>> 
>>>> {value_generator, {fixed_bin, 10000}}.
>>>> 
>>>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>> 
>>> I add all the IPs here, one entry per node.
>>> 
>>>> 
>>>> {riakc_pb_replies, 1}.
>>>> 
>>>> {operations, [{get, 1}, {update, 1}]}.
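Russell's two changes (a higher `concurrent` value and one `riakc_pb_ips` entry per node) can be scripted if you want to rerun the benchmark at several settings. The minimal Python sketch below assumes that basho_bench accepts IPv4 addresses written as Erlang tuples such as `{10,0,0,1}`, as the placeholder in the quoted config suggests. All other lines are copied from the config quoted above. The function name is hypothetical.

```python
def basho_bench_config(node_ips, concurrent=10, duration=10):
    """Render the riakc_pb config from this thread with every node listed.

    Each IPv4 string becomes an Erlang tuple, e.g. "10.0.0.1" -> {10,0,0,1}.
    """
    ips = ", ".join("{%s}" % ip.replace(".", ",") for ip in node_ips)
    lines = [
        "{mode, max}.",
        "{duration, %d}." % duration,
        "{concurrent, %d}." % concurrent,
        "{driver, basho_bench_driver_riakc_pb}.",
        "{key_generator, {int_to_bin, {uniform_int, 10000}}}.",
        "{value_generator, {fixed_bin, 10000}}.",
        "{riakc_pb_ips, [%s]}." % ips,  # one entry per node
        "{riakc_pb_replies, 1}.",
        "{operations, [{get, 1}, {update, 1}]}.",
    ]
    return "\n".join(lines) + "\n"
```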
>>>> 
>>>> 
>>>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>> 
>>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>> 
>>>>> I did use basho_bench on my clusters. It showed throughput of around 150.
>>>> 
>>>> Could you share the config you used, please?
>>>> 
>>>>> 
>>>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>>> 
>>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>> 
>>>>>> It's not about the difference in throughput between the two approaches I took. Rather, the issue is that even 200 writes/sec is a bit on the low side.
>>>>>> I could be doing something wrong with the configuration because people are reporting throughputs of 2-3k ops/sec
>>>>>> 
>>>>>> Could anyone here guide me in setting up a cluster that would give that kind of throughput?
>>>>> 
>>>>> To get that kind of throughput, I use multiple threads/workers. Have you looked at basho_bench[1]? It is a simple, reliable tool for benchmarking Riak clusters.
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Russell
>>>>> 
>>>>> [1] Basho Bench - https://github.com/basho/basho_bench and http://wiki.basho.com/Benchmarking.html
>>>>> 
>>>>>> 
>>>>>> Thanks,
>>>>>> Yousuf
>>>>>> 
>>>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <anderson at copperegg.com> wrote:
>>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffauzan at gmail.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I set up a 3-machine Riak SmartMachine cluster. Each machine had 4GB RAM and used the Riak open-source SmartMachine image.
>>>>>>> 
>>>>>>> Afterwards I tried loading data using the following two methods:
>>>>>>> 1. Bash script
>>>>>>> #!/bin/bash
>>>>>>> echo $(date)
>>>>>>> for (( c=1; c<=1000; c++ ))
>>>>>>> do
>>>>>>> 	curl -s -d 'this is a test' -H "Content-Type: text/plain" http://127.0.0.1:8098/buckets/test/keys
>>>>>>> done
>>>>>>> echo $(date)
>>>>>>> 
>>>>>>> 2. Python Riak Client
>>>>>>> import riak
>>>>>>> c = riak.RiakClient("10.112.2.185")
>>>>>>> b = c.bucket("test")
>>>>>>> for i in xrange(10000):  # the same client object is reused for every store()
>>>>>>>     b.new(str(i), str(i)).store()
>>>>>>> 
>>>>>>> For case 1, throughput was 25 writes/sec
>>>>>>> For case 2, throughput was 200 writes/sec
>>>>>>> 
>>>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above two scripts on EC2 clusters too and still got the same performance.
>>>>>>> 
>>>>>>> Please, someone help
>>>>>> 
>>>>>> 
>>>>>> The major difference between these two is that the first executes a binary, which has to create everything (connection, payload, etc.) every time through the loop. The second does not: it creates the client once, then iterates, keeping the same client and presumably the same connection as well. That makes a huge difference.
>>>>>> 
>>>>>> I would not use curl to do performance testing.  What you probably want is something like your python script that will work on many threads/processes at once (or fire them up many times).
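Eric's point can also be seen with HTTP alone. The minimal Python 3 sketch below sends the same `POST /buckets/test/keys` request as the curl loop, but over one persistent HTTP/1.1 connection using the standard library's `http.client`. It assumes the node's HTTP interface is on port 8098, as in the curl script. Reading each response body fully is what lets the connection be reused for the next request. `post_values` is a hypothetical helper name.

```python
import http.client


def post_values(host, port, values, bucket="test"):
    """POST each value to /buckets/<bucket>/keys over ONE persistent connection.

    Returns the HTTP status code of every response, in order.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    statuses = []
    try:
        for value in values:
            conn.request(
                "POST",
                f"/buckets/{bucket}/keys",
                body=value.encode(),
                headers={"Content-Type": "text/plain"},
            )
            resp = conn.getresponse()
            resp.read()  # drain the body so the same connection can carry the next request
            statuses.append(resp.status)
    finally:
        conn.close()
    return statuses
```

`post_values("127.0.0.1", 8098, ["this is a test"] * 1000)` is the keep-alive counterpart of the curl loop. Running several of these in threads adds the concurrency Eric mentions.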
>>>>>> 
>>>>>> 
>>>>>> Eric Anderson
>>>>>> Co-Founder
>>>>>> CopperEgg
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users at lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
