Throughput issue contd. On Joyent Riak SmartMachine

Yousuf Fauzan yousuffauzan at gmail.com
Wed Jun 27 08:41:13 EDT 2012


So I created an array of clients using the following code

Clients = [riak.RiakClient(e, port=8087,
transport_class=riak.RiakPbcTransport) for e in NODES]

After this, I assigned each thread an ID ranging from 0 to the number of
nodes, so each thread now communicates with a single node.

Even after this, I am getting <100 writes/sec
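
For reference, the one-client-per-thread setup described above can be sketched roughly as follows. This is a minimal sketch in Python 3, not the actual loader script: `run_loader`, `make_client`, and `store` are hypothetical names introduced here, and the client factory and write call are passed in so the structure can be exercised without a cluster.

```python
import queue
import threading
import time


def run_loader(nodes, items, make_client, store):
    """Write every (key, value) pair in items using one worker thread per node.

    make_client(node) is called once per thread, so each thread keeps a
    single long-lived client. store(client, key, value) performs one write.
    Both callables are assumptions supplied by the caller.
    Returns (writes_done, writes_per_second).
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    done = []                       # one entry per completed write
    lock = threading.Lock()

    def worker(node):
        client = make_client(node)  # created once, reused for every write
        while True:
            try:
                key, value = work.get_nowait()
            except queue.Empty:
                return              # queue drained, thread exits
            store(client, key, value)
            with lock:
                done.append(key)

    threads = [threading.Thread(target=worker, args=(n,)) for n in nodes]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - start, 1e-9)
    return len(done), (len(done) / elapsed)
```

With the client calls quoted in this thread, `make_client` could be something like `lambda host: riak.RiakClient(host, port=8087, transport_class=riak.RiakPbcTransport)` and `store` something like `lambda c, k, v: c.bucket("test").new(k, v).store()`. Timing the whole run this way gives a writes/sec figure that can be compared against the basho_bench numbers.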


On Wed, Jun 27, 2012 at 5:35 PM, Yousuf Fauzan <yousuffauzan at gmail.com> wrote:

> Oh! I think that may be an issue with my code then.
>
> Let me make some changes and get back to you.
>
>
> On Wed, Jun 27, 2012 at 5:25 PM, Reid Draper <reiddraper at gmail.com> wrote:
>
>>
>> On Jun 27, 2012, at 7:48 AM, Yousuf Fauzan wrote:
>>
>> This is great.
>>
>> I was loading data using Python. My code would spawn 10 threads and put
>> data in a queue. All threads would read data from this queue.
>> However, all threads were hitting the same server/load balancer.
>>
>> I tried a different setup too. Where I spawned processes with each
>> process having its own queue. In this case too, all processes were hitting
>> the same server.
>>
>> I just now made a change to my code. So now I have 10 threads randomly
>> selecting a node and storing data in it.
>> Again, I am getting around 50 writes/sec
>>
>>
>> When the threads randomly pick a node, do they create a new connection to
>> it, or do they pull the connection from
>> a pool? As you saw with the throughput difference between curl and
>> python, persistent connections make a
>> big difference.
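
One way to keep random node selection while still reusing connections is to cache one client per thread and node. The following is only a sketch of that idea in Python 3: `PerThreadClients` and `make_client` are hypothetical names, not part of the Riak Python client, and the factory is whatever call creates a client for a given host.

```python
import random
import threading


class PerThreadClients:
    """Cache one client per (thread, node) so random picks reuse connections."""

    def __init__(self, nodes, make_client):
        self._nodes = list(nodes)
        self._make_client = make_client   # assumed: host -> client object
        self._local = threading.local()   # separate cache for each thread

    def pick(self):
        """Return (node, client) for a random node.

        A client is created the first time this thread picks a node and is
        reused on every later pick of that node.
        """
        cache = getattr(self._local, "clients", None)
        if cache is None:
            cache = self._local.clients = {}
        node = random.choice(self._nodes)
        if node not in cache:
            cache[node] = self._make_client(node)
        return node, cache[node]
```

Each worker thread would call `pick()` before every write. With 10 threads and 3 nodes this creates at most 30 clients over the whole run, rather than one per request.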
>>
>>
>> Could there be something wrong with the way I have written my loader
>> script?
>>
>> On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.brown at mac.com> wrote:
>>
>>>
>>> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>>>
>>> So I changed concurrency to 10 and put all the IPs of the nodes in basho
>>> bench config.
>>> Throughput is now around 1500.
>>>
>>>
>>> I guess you can now try 5 or 15 concurrent workers and see which is
>>> optimal for that setup to get a good feel for the sizing of any connection
>>> pools for your application.
>>>
>>> You can also see how adding nodes and adding workers affects your
>>> results to help you size the cluster you need for your expected usage.
>>>
>>> Cheers
>>>
>>> Russell
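
The same "try several worker counts" experiment can also be run from application code. The sketch below is an illustration only, written in Python 3; `measure` and `sweep` are hypothetical helpers, and `write` stands for whatever function performs a single store against the cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure(write, n_ops, workers):
    """Run write(i) for i in range(n_ops) on `workers` threads; return ops/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every call and re-raises any worker exception
        list(pool.map(write, range(n_ops)))
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_ops / elapsed


def sweep(write, n_ops, worker_counts):
    """Return {workers: ops_per_sec} for each candidate concurrency level."""
    return {w: measure(write, n_ops, w) for w in worker_counts}
```

Calling `sweep(write, 1000, [5, 10, 15])` returns one throughput figure per worker count, which gives a rough idea of how large a connection pool the application needs. The numbers depend heavily on the client, the network, and the cluster, so they are only comparable within one setup.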
>>>
>>>
>>> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>
>>>>
>>>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>>>
>>>> I used examples/riakc_pb.config
>>>>
>>>> {mode, max}.
>>>>
>>>> {duration, 10}.
>>>>
>>>> {concurrent, 1}.
>>>>
>>>>
>>>> Try upping this. On my local 3 node cluster with 8gb ram and an old,
>>>> cheap quad core per box I'd set concurrency to 10 workers.
>>>>
>>>>
>>>> {driver, basho_bench_driver_riakc_pb}.
>>>>
>>>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>>>
>>>> {value_generator, {fixed_bin, 10000}}.
>>>>
>>>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>>>
>>>>
>>>> I add all the IPs here, one entry per node.
>>>>
>>>>
>>>> {riakc_pb_replies, 1}.
>>>>
>>>> {operations, [{get, 1}, {update, 1}]}.
>>>>
>>>>
>>>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>>
>>>>>
>>>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>>>
>>>>> I did use basho bench on my clusters. It showed a throughput of around
>>>>> 150.
>>>>>
>>>>>
>>>>> Could you share the config you used, please?
>>>>>
>>>>>
>>>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>>>
>>>>>>
>>>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>>>
>>>>>> It's not about the difference in throughput between the two approaches I
>>>>>> took. Rather, the issue is that even 200 writes/sec is on the low
>>>>>> side.
>>>>>> I could be doing something wrong with the configuration, because
>>>>>> people are reporting throughputs of 2-3k ops/sec.
>>>>>>
>>>>>> Could anyone here guide me in setting up a cluster that would
>>>>>> give that kind of throughput?
>>>>>>
>>>>>>
>>>>>> To get that kind of throughput I use multiple threads / workers. Have
>>>>>> you looked at basho_bench [1]? It is a simple, reliable tool for
>>>>>> benchmarking Riak clusters.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> Russell
>>>>>>
>>>>>> [1] Basho Bench - https://github.com/basho/basho_bench and
>>>>>> http://wiki.basho.com/Benchmarking.html
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Yousuf
>>>>>>
>>>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <
>>>>>> anderson at copperegg.com> wrote:
>>>>>>
>>>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffauzan at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I set up a 3-machine Riak SM cluster. Each machine used 4 GB of RAM and
>>>>>>> the Riak OpenSource SmartMachine image.
>>>>>>>
>>>>>>> Afterwards I tried loading data using the following two methods:
>>>>>>> 1. Bash script
>>>>>>> #!/bin/bash
>>>>>>> echo $(date)
>>>>>>> for (( c=1; c<=1000; c++ ))
>>>>>>> do
>>>>>>> curl -s -d 'this is a test' -H "Content-Type: text/plain" \
>>>>>>>   http://127.0.0.1:8098/buckets/test/keys
>>>>>>> done
>>>>>>> echo $(date)
>>>>>>>
>>>>>>> 2. Python Riak Client
>>>>>>> c=riak.RiakClient("10.112.2.185")
>>>>>>> b=c.bucket("test")
>>>>>>> for i in xrange(10000):o=b.new(str(i), str(i)).store()
>>>>>>>
>>>>>>> For case 1, throughput was 25 writes/sec
>>>>>>> For case 2, throughput was 200 writes/sec
>>>>>>>
>>>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above
>>>>>>> two scripts on EC2 clusters too and still got the same performance.
>>>>>>>
>>>>>>> Please, someone help
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The major difference between these two is the first is executing a
>>>>>>> binary, which has to basically create everything (connection, payload, etc)
>>>>>>> every time through the loop.  The second does not - it creates the client
>>>>>>> once, then iterates over it keeping the same client and presumably the same
>>>>>>> connection as well.  That makes a huge difference.
>>>>>>>
>>>>>>> I would not use curl to do performance testing.  What you probably
>>>>>>> want is something like your python script that will work on many
>>>>>>> threads/processes at once (or fire them up many times).
>>>>>>>
>>>>>>>
>>>>>>> Eric Anderson
>>>>>>> Co-Founder
>>>>>>> CopperEgg
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users at lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>