Riak 1.4 test on Azure - Webmachine error at path ...

Matthew Von-Maszewski matthewv at basho.com
Sun Jul 28 16:08:13 EDT 2013


Christian,

leveldb has two independent caches:  file cache and data block cache.  You have raised the data block cache from its default 8M to 256M per your earlier note.  I would recommend the following:

{max_open_files, 50},     %% (50 - 10) * 4Mbytes allocation for file cache
{cache_size, 104857600},  %% 100Mbytes for data block cache

The max_open_files default is 20 (which is internally reduced by 10, leaving room for only 10 open files per vnode).  You are likely thrashing on file opens.  The file cache is far more important to performance than the data block cache.

Find the LOG file within one of your database "vnode" directories and look for a line like ' compacted to: files[ 0 9 25 14 2 0 0 ]'.  Your max_open_files setting should cover the total count of files in that line, plus 10.  To afford that coverage, lower cache_size, going as low as 8Mbytes if necessary.  Once cache_size is down to 8Mbytes, go no lower; accept less than full max_open_files coverage instead.
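A minimal Python sketch of the coverage check described above, assuming the LOG line has the ' compacted to: files[ ... ]' shape quoted here. The helper name target_max_open_files is hypothetical, not part of Riak or leveldb; it simply sums the per-level file counts and adds the 10 that leveldb subtracts internally.

```python
import re

# Captures the per-level file counts in a line such as
# "compacted to: files[ 0 9 25 14 2 0 0 ]".
FILES_RE = re.compile(r"compacted to: files\[\s*([\d\s]+?)\s*\]")


def target_max_open_files(log_line: str) -> int:
    """Hypothetical helper: max_open_files needed to cover every file
    counted in the LOG line, plus the 10 leveldb subtracts internally."""
    match = FILES_RE.search(log_line)
    if match is None:
        raise ValueError("no 'compacted to: files[...]' entry in line")
    counts = [int(n) for n in match.group(1).split()]
    return sum(counts) + 10


# 0 + 9 + 25 + 14 + 2 + 0 + 0 = 50 files, plus 10.
print(target_max_open_files("compacted to: files[ 0 9 25 14 2 0 0 ]"))  # 60
```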

Summary:  total memory per vnode in 1.4 is (max_open_files - 10) * 4Mbytes + cache_size.
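The summary formula can be sketched in Python, together with a hypothetical fit_to_budget helper that applies the trade-off described above: keep full max_open_files coverage while shrinking cache_size, and stop at the 8Mbytes floor. The function names and the idea of a fixed per-vnode memory budget are assumptions for illustration; the 4Mbytes-per-file figure and the minus-10 adjustment are the 1.4 values given in this note.

```python
from __future__ import annotations

MB = 1024 * 1024
FILE_CACHE_PER_FILE = 4 * MB  # per-file allocation in Riak 1.4 (from the note)
MIN_CACHE_SIZE = 8 * MB       # recommended floor for cache_size
RESERVED_FILES = 10           # leveldb internally subtracts 10 from max_open_files


def vnode_memory(max_open_files: int, cache_size: int) -> int:
    """Approximate leveldb memory per vnode in Riak 1.4, in bytes."""
    return (max_open_files - RESERVED_FILES) * FILE_CACHE_PER_FILE + cache_size


def fit_to_budget(budget: int, wanted_open_files: int) -> tuple[int, int]:
    """Hypothetical helper: return (max_open_files, cache_size) for a per-vnode
    memory budget, giving file coverage priority over the block cache."""
    if budget < MIN_CACHE_SIZE:
        raise ValueError("budget is below the 8Mbytes cache_size floor")
    # First try full coverage and give whatever is left to cache_size.
    cache = budget - (wanted_open_files - RESERVED_FILES) * FILE_CACHE_PER_FILE
    if cache >= MIN_CACHE_SIZE:
        return wanted_open_files, cache
    # Not enough memory: pin cache_size at the floor and cover fewer files.
    open_files = RESERVED_FILES + (budget - MIN_CACHE_SIZE) // FILE_CACHE_PER_FILE
    return open_files, MIN_CACHE_SIZE


# The settings recommended above: (50 - 10) * 4MB + 100MB.
print(vnode_memory(50, 104857600) // MB)  # 260
# Defaults: max_open_files 20, cache_size 8MB.
print(vnode_memory(20, 8 * MB) // MB)     # 48
# A 100MB budget cannot cover 60 files, so cache_size stays at the floor.
print(fit_to_budget(100 * MB, 60))        # (33, 8388608)
```

Multiply the per-vnode figure by the number of vnodes on a node to estimate that node's total leveldb footprint.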



Matthew

On Jul 28, 2013, at 3:53 PM, Christian Rosnes <christian.rosnes at gmail.com> wrote:

> On Thu, Jul 25, 2013 at 2:16 PM, Christian Rosnes <christian.rosnes at gmail.com> wrote:
>  
> During a test I just performed on a small Riak 1.4 cluster set up on Azure,
> I started seeing the Riak error messages listed below after about 10 minutes.
> 
> The simple test was performed using the latest JMeter running on two Azure instances,
> each of which also runs HAProxy and load-balances the HTTP/REST requests across
> the 4 Riak nodes.
> 
> 
> An Update:
> 
> I increased some network parameters in sysctl.conf on the Riak instances.
> I have now run 3 consecutive 1-hour JMeter tests with no errors:
> 
> >> 600 JMeter threads - 3600 seconds
> [FINAL RESULTS] 
> total count: 5526844, overall avg: 386 (ms), 
> overall tps: 1536.6 (p/sec), recent tps: 1932.1 (p/sec), errors: 0
> 
> >> 400 JMeter threads - 3600 seconds
> [FINAL RESULTS] 
> total count: 6350600, overall avg: 224 (ms), 
> overall tps: 1765.7 (p/sec), recent tps: 1849.1 (p/sec), errors: 0
> 
> >> 300 JMeter threads - 3600 seconds
> [FINAL RESULTS]
> total count: 5997689, overall avg: 178 (ms), 
> overall tps: 1666.5 (p/sec), recent tps: 1744.7 (p/sec), errors: 0
> 
> This is a drop in performance compared to previous Azure results
> (around 2000 req/s for 1-hour tests), but it may be caused
> by moving the Riak data directory from the ephemeral
> /mnt/resource partitions to persistent XFS partitions;
> 'iostat' reports near 100% I/O utilization on the new
> partitions during the tests.
> 
> I will run a few more tests, then move on to testing on more,
> and larger, Azure instances and compare the results with
> similar instances on AWS.
> 
> Christian
> @NorSoulx
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
