Bitcask & Innostore Benchmark @ EC2

James Sadler freshtonic at gmail.com
Mon Jul 12 09:28:51 EDT 2010


Hi Sean,

So the mental model I have of the benchmark framework is wrong.

I thought that 'concurrent' set how many requests could be in progress
at any one time (I assumed one Erlang process (worker) per request).
The value I chose was a deliberate overestimate of our eventual
production concurrency levels: not particularly scientific, and to be
taken with a pinch of salt, but it was consistent across a number of
benchmarking runs.

With my mental model of what concurrent meant, I just took mode 'max'
to mean 'don't bother rate throttling'.
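
For reference, here is a rough sketch of how the two settings could be
changed to follow Sean's suggestion below. The worker count and rate
are illustrative guesses, not measured recommendations, and my
understanding that {rate, N} applies per worker is an assumption that
should be checked against the basho_bench documentation.

```erlang
%% Sketch only: the numbers below are illustrative assumptions.

%% Option A: keep workers unthrottled, but run far fewer of them.
{mode, max}.
{concurrent, 10}.

%% Option B (use instead of A): throttle each worker.
%% Assumes {rate, N} means N ops/sec per worker, so 10 workers
%% at 20 ops/sec would target roughly 200 ops/sec in total.
%% {mode, {rate, 20}}.
%% {concurrent, 10}.
```

The rest of the config (driver, key and value generators, operations)
would stay as in my original post quoted below.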

Anyway, even if I did hammer the server too hard, Bitcask seemed to
handle it just fine!

Regarding the worker processes, does each Erlang process handle more
than one request at a time?

Thanks,

James


On 12 July 2010 23:03, Sean Cribbs <sean at basho.com> wrote:
> James,
>
> One thing you might try is lowering the concurrency rate. "max" mode at even 5-10 workers is enough to saturate most networks (the most I've ever run has been 15).  Whether that is an accurate representation of your production load is another story entirely, and an "exercise for the reader".
>
> Sean Cribbs <sean at basho.com>
> Developer Advocate
> Basho Technologies, Inc.
> http://basho.com/
>
> On Jul 11, 2010, at 11:52 PM, James Sadler wrote:
>
>> Hi All,
>>
>> I've been benchmarking Riak using basho_bench on an EC2 m1.large
>> instance and also running locally on my iMac inside VirtualBox to
>> assess performance of the Bitcask and Innostore backends.
>>
>> The test configuration looks like this:
>>
>> {mode, max}.
>> {duration, 10}.
>> {concurrent, 50}.
>> {driver, basho_bench_driver_http_raw}.
>> {code_paths, ["deps/stats",
>>              "deps/ibrowse"]}.
>> %% a composite key composed of 4 IDs, each of which is a 16 char hex string,
>> %% specific to our data model.
>> {key_generator, {random_dynamo_style_string, 35000}}.
>> {value_generator, {fixed_bin, 1000}}.
>> {operations, [{get, 1}, {update, 2}]}.
>> {http_raw_ips, ["127.0.0.1"]}.
>>
>> Also, to generate the 'random_dynamo_style_string', I made the
>> following changes to the basho_bench source:
>>
>> diff --git a/src/basho_bench_keygen.erl b/src/basho_bench_keygen.erl
>> index 4849bbe..639e90b 100644
>> --- a/src/basho_bench_keygen.erl
>> +++ b/src/basho_bench_keygen.erl
>> @@ -54,6 +54,13 @@ new({pareto_int, MaxKey}, _Id) ->
>> new({pareto_int_bin, MaxKey}, _Id) ->
>>     Pareto = pareto(trunc(MaxKey * 0.2), ?PARETO_SHAPE),
>>     fun() -> <<(Pareto()):32/native>> end;
>> +new({random_dynamo_style_string, _MaxKey}, _Id) ->
>> +    fun() -> lists:concat([
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef"), "-",
>> +                    get_random_string(16, "0123456789abcdef")])
>> +    end;
>> new(Other, _Id) ->
>>     ?FAIL_MSG("Unsupported key generator requested: ~p\n", [Other]).
>>
>> @@ -74,10 +81,17 @@ dimension({pareto_int, _}) ->
>>     0.0;
>> dimension({pareto_int_bin, _}) ->
>>     0.0;
>> +dimension({random_dynamo_style_string, _MaxKey}) ->
>> +    0.0;
>> dimension(Other) ->
>>     ?FAIL_MSG("Unsupported key generator dimension requested: ~p\n", [Other]).
>>
>> -
>> +get_random_string(Length, AllowedChars) ->
>> +    lists:foldl(fun(_, Acc) ->
>> +                        [lists:nth(random:uniform(length(AllowedChars)),
>> +                                   AllowedChars)]
>> +                            ++ Acc
>> +                end, [], lists:seq(1, Length)).
>>
>>
>> %% ====================================================================
>>
>>
>> So far, I've only been running the benchmark against a 'cluster' of
>> a single Riak node, and I have benchmarked the bitcask and
>> innostore backends on the latest version of Riak (0.11.0-1344) and
>> innostore (1.0.0-88) on Ubuntu Lucid.  I have also been running
>> basho_bench on the same host as the Riak node.
>>
>> The benchmarks are showing very high get and update latencies in the
>> 95th percentile and beyond when using Innostore as the backend.
>> Bitcask performance is much better.
>>
>> While running the benchmarks, I had an iostat process reporting IO
>> every 1 second.  It clearly showed heavy writes during the benchmark,
>> but practically zero reads.  I expect that this was because of the
>> disk cache.  What I found very surprising was that the latencies for
>> innostore gets during the benchmark were very high, even though the
>> disk was not being hit for reads at all.  This was reproducible on
>> both EC2 and on a local VM on my iMac.
>>
>> Observations:
>>
>> ## Innostore backend
>>
>> - Latencies for innostore are high across the board. Even for reads,
>> __when iostat is reporting that no reads are hitting the disk__.
>>
>> - 95th percentile read/write latencies are up to 1500/2000 millis.
>>
>> - 99th percentile reads/writes are up to 2000/4000 millis.
>>
>> - The difference between update and get latency is small.
>>
>> - There are some failed updates/gets (timeouts) in the log file
>> produced by basho_bench.
>>
>> - Throughput with innostore backend is 150-200 req/sec
>>
>> - Mounting the filesystem with noatime doesn't seem to make much of a
>> difference.
>>
>> - Getting values from disk cache has huge latency (iostat reporting no
>> reads on the device).  This is somewhat bizarre.
>>
>> ## Bitcask backend
>>
>> - There are zero errors in the log produced by basho_bench (no timeouts
>> like with innostore)
>>
>> - Throughput is much higher: 620 req/sec
>>
>> - The 99.9th percentile latencies are 200ms for writes and 100ms for reads
>>
>> - The 95th percentile latencies are 160ms for writes and 60ms for reads
>>
>> - Mean & median latencies are 115ms for writes and 20ms for reads.
>>
>> Summary charts from basho_bench are attached.
>>
>>
>> NOTE:
>>
>> I haven't included benchmark results from my own local VM.  FWIW, I
>> observed approximately the same characteristics in the EC2 and local
>> benchmarks.
>>
>> In summary, it looks like there are significant performance issues
>> with Innostore in terms of throughput and latency.  Latency is the
>> biggest issue for our ad serving product at Lexer, so it looks like
>> we'll be using Bitcask in production.
>>
>> Hopefully these results will be useful to others.
>>
>> Also, given our large key size of 67 characters, combined with
>> Bitcask's padding and storage layout, how many keys should we be able
>> to manage per node per GB?
>>
>> Looking forward to any comments.
>>
>> Thanks.
>>
>> --
>> James
>> <summary_bitcask_ec2.png> <summary_innostore_ec2.png>
>
>



-- 
James



