Throughput issue contd. On Joyent Riak Smartmachine

Yousuf Fauzan yousuffauzan at gmail.com
Wed Jun 27 08:05:11 EDT 2012


Oh! I think that may be an issue with my code then.

Let me make some changes and get back to you.
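
A minimal sketch of the connection-reuse idea raised in the quoted reply
below: each worker thread picks a node at random once, opens a single
client to it, and reuses that client for every write. This is not the
loader script discussed in this thread. The names run_loader, make_client
and store are hypothetical hooks invented for illustration and are not
part of any Riak library.

```python
import queue
import random
import threading


def run_loader(items, nodes, make_client, store, num_workers=10):
    """Store every item using num_workers threads.

    Each worker picks one node at random, opens ONE client to it via
    make_client(node), and reuses that client for all of its writes.
    make_client and store are hypothetical hooks supplied by the caller.
    """
    work = queue.Queue()
    for item in items:
        work.put(item)

    def worker():
        # Created once per thread, not once per write.
        client = make_client(random.choice(nodes))
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            store(client, item)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With the Python client calls quoted further down, make_client could wrap
riak.RiakClient(host) and store could wrap
client.bucket("test").new(key, value).store(). Whether that client keeps
its connection open between calls is an assumption worth checking for the
client version in use.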

On Wed, Jun 27, 2012 at 5:25 PM, Reid Draper <reiddraper at gmail.com> wrote:

>
> On Jun 27, 2012, at 7:48 AM, Yousuf Fauzan wrote:
>
> This is great.
>
> I was loading data using Python. My code would spawn 10 threads and put
> data in a queue. All threads would read data from this queue.
> However, all threads were hitting the same server/load balancer.
>
> I tried a different setup too, where I spawned processes, each with its
> own queue. In this case too, all processes were hitting the same
> server.
>
> I just now made a change to my code. So now I have 10 threads randomly
> selecting a node and storing data in it.
> Again, I am getting around 50 writes/sec
>
>
> When the threads randomly pick a node, do they create a new connection to
> it, or do they pull the connection from
> a pool? As you saw with the throughput difference between curl and python,
> persistent connections make a
> big difference.
>
>
> Could there be something wrong with the way I have written my loader
> script?
>
> On Wed, Jun 27, 2012 at 5:10 PM, Russell Brown <russell.brown at mac.com> wrote:
>
>>
>> On 27 Jun 2012, at 12:36, Yousuf Fauzan wrote:
>>
>> So I changed concurrency to 10 and put all the IPs of the nodes in basho
>> bench config.
>> Throughput is now around 1500.
>>
>>
>> I guess you can now try 5 or 15 concurrent workers and see which is
>> optimal for that set up to get a good feel for the sizing of any connection
>> pools for your application.
>>
>> You can also see how adding nodes and adding workers affects your results
>> to help you size the cluster you need for your expected usage.
>>
>> Cheers
>>
>> Russell
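
The suggestion above to try several worker counts and compare can be
sketched in Python as a small sweep. This is only an illustration under
stated assumptions: measure_throughput, sweep and fake_write are
hypothetical names, and fake_write merely sleeps to stand in for one
network round trip; it does not talk to Riak.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(write, num_ops, num_workers):
    """Run write(i) for i in range(num_ops) on num_workers threads.

    Returns the observed rate in operations per second.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for all calls and re-raises any exception from write.
        list(pool.map(write, range(num_ops)))
    elapsed = time.perf_counter() - start
    return num_ops / elapsed


def sweep(write, num_ops, worker_counts=(1, 5, 10, 15)):
    """Return {worker_count: ops_per_sec} for each candidate worker count."""
    return {n: measure_throughput(write, num_ops, n) for n in worker_counts}


if __name__ == "__main__":
    def fake_write(i):
        time.sleep(0.005)  # stand-in for one write; replace with a real call

    for workers, rate in sweep(fake_write, 200).items():
        print(f"{workers:>2} workers: {rate:8.0f} ops/sec")
```

Replacing fake_write with a real store call gives a rough client-side
number; basho_bench remains the better tool for measuring the cluster
itself.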
>>
>>
>> On Wed, Jun 27, 2012 at 4:40 PM, Russell Brown <russell.brown at mac.com> wrote:
>>
>>>
>>> On 27 Jun 2012, at 12:09, Yousuf Fauzan wrote:
>>>
>>> I used examples/riakc_pb.config
>>>
>>> {mode, max}.
>>>
>>> {duration, 10}.
>>>
>>> {concurrent, 1}.
>>>
>>>
>>> Try upping this. On my local 3 node cluster with 8gb ram and an old,
>>> cheap quad core per box I'd set concurrency to 10 workers.
>>>
>>>
>>> {driver, basho_bench_driver_riakc_pb}.
>>>
>>> {key_generator, {int_to_bin, {uniform_int, 10000}}}.
>>>
>>> {value_generator, {fixed_bin, 10000}}.
>>>
>>> {riakc_pb_ips, [{<IP of one of the nodes>}]}.
>>>
>>>
>>> I add all the IPs here, one entry per node.
>>>
>>>
>>> {riakc_pb_replies, 1}.
>>>
>>> {operations, [{get, 1}, {update, 1}]}.
>>>
>>>
>>> On Wed, Jun 27, 2012 at 4:37 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>
>>>>
>>>> On 27 Jun 2012, at 12:05, Yousuf Fauzan wrote:
>>>>
>>>> I did use basho bench on my clusters. It showed throughput of around 150.
>>>>
>>>>
>>>> Could you share the config you used, please?
>>>>
>>>>
>>>> On Wed, Jun 27, 2012 at 4:24 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>>
>>>>>
>>>>> On 27 Jun 2012, at 11:50, Yousuf Fauzan wrote:
>>>>>
>>>>> It's not about the difference in throughput in the two approaches I
>>>>> took. Rather, the issue is that even 200 writes/sec is a bit on the lower
>>>>> side.
>>>>> I could be doing something wrong with the configuration because people
>>>>> are reporting throughputs of 2-3k ops/sec
>>>>>
>>>>> Could anyone here guide me in setting up a cluster that would give
>>>>> that kind of throughput?
>>>>>
>>>>>
>>>>> To get that kind of throughput I use multiple threads / workers. Have
>>>>> you looked at basho_bench[1]? It is a simple, reliable tool to benchmark
>>>>> Riak clusters.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Russell
>>>>>
>>>>> [1] Basho Bench - https://github.com/basho/basho_bench and
>>>>> http://wiki.basho.com/Benchmarking.html
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Yousuf
>>>>>
>>>>> On Wed, Jun 27, 2012 at 4:02 PM, Eric Anderson <anderson at copperegg.com
>>>>> > wrote:
>>>>>
>>>>>> On Jun 27, 2012, at 5:13 AM, Yousuf Fauzan <yousuffauzan at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I set up a 3 machine riak SM cluster. Each machine used 4GB RAM and
>>>>>> riak OpenSource SmartMachine Image.
>>>>>>
>>>>>> Afterwards I tried loading data using the following two methods:
>>>>>> 1. Bash script
>>>>>> #!/bin/bash
>>>>>> echo $(date)
>>>>>> for (( c=1; c<=1000; c++ ))
>>>>>> do
>>>>>> curl -s -d 'this is a test' -H "Content-Type: text/plain" \
>>>>>>   http://127.0.0.1:8098/buckets/test/keys
>>>>>> done
>>>>>> echo $(date)
>>>>>>
>>>>>> 2. Python Riak Client
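>>>>>> import riak
>>>>>> c = riak.RiakClient("10.112.2.185")  # one client, reused for every write
>>>>>> b = c.bucket("test")
>>>>>> for i in xrange(10000):  # Python 2; use range() on Python 3
>>>>>>     b.new(str(i), str(i)).store()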
>>>>>>
>>>>>> For case 1, throughput was 25 writes/sec
>>>>>> For case 2, throughput was 200 writes/sec
>>>>>>
>>>>>> Maybe I am making a fundamental mistake somewhere. I tried the above
>>>>>> two scripts on EC2 clusters too and still got the same performance.
>>>>>>
>>>>>> Please, someone help
>>>>>>
>>>>>>
>>>>>>
>>>>>> The major difference between these two is the first is executing a
>>>>>> binary, which has to basically create everything (connection, payload, etc)
>>>>>> every time through the loop.  The second does not - it creates the client
>>>>>> once, then iterates over it keeping the same client and presumably the same
>>>>>> connection as well.  That makes a huge difference.
>>>>>>
>>>>>> I would not use curl to do performance testing.  What you probably
>>>>>> want is something like your python script that will work on many
>>>>>> threads/processes at once (or fire them up many times).
>>>>>>
>>>>>>
>>>>>> Eric Anderson
>>>>>> Co-Founder
>>>>>> CopperEgg
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> riak-users mailing list
>>>>> riak-users at lists.basho.com
>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>