Riak stalls, leveldb backend
andy at embed.ly
Mon Jan 5 12:07:26 EST 2015
I've been experiencing stalls with my riak cluster where it won't return any
data (queries time out). Here are some basic details:
- 8 nodes
- riak 1.4.10 (upgraded from 1.4.6 -> 1.4.8 -> 1.4.10)
- leveldb backend
- n_val is 2
- allow_mult is false
- EC2 i2.2xlarge boxes (8 cores, 61 GB RAM, 800 GB disk space)
- about 33% disk space utilization per node
The riak cluster will stall for as long as a few minutes at a time, but
will otherwise work as expected for hours. There doesn't seem to be an
obvious pattern as to when the stalls happen.
My first thought was that the stalls might be related to AAE, but I've
disabled it via 'riak attach' and in the settings file. Side note: I still
see messages like:
2015-01-05 12:24:04.666 [info] AAE throttle from 0 -> 10 msec/key, based on maximum vnode mailbox size 209 from 'riak-user@riak-host'
which makes me question whether AAE is actually turned off.
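To check whether these throttle messages keep appearing after AAE was disabled, and whether they cluster around the stalls, a minimal Python sketch like the one below can pull them out of the log. It assumes each message sits on one physical line of the node's log (typically console.log) in the format quoted above, with any pid/module prefix between the level and the message text skipped; the name throttle_events is hypothetical. The reported maximum vnode mailbox size is worth watching too, since a growing mailbox means vnodes are falling behind on their message queues.

```python
import re
from datetime import datetime

# Matches the AAE throttle message format quoted above. Anything between the
# log level and the message text (e.g. a "<0.x.0>@module:fun:line" prefix)
# is skipped by the lazy ".*?".
THROTTLE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[\w+\].*?"
    r"AAE throttle from (?P<old>\d+) -> (?P<new>\d+) msec/key, "
    r"based on maximum vnode mailbox size (?P<mbox>\d+)"
)


def throttle_events(lines):
    """Yield (timestamp, old_ms, new_ms, mailbox_size) for each throttle change.

    `lines` is any iterable of log lines, such as an open file object.
    Lines that are not throttle messages are ignored.
    """
    for line in lines:
        m = THROTTLE_RE.search(line)
        if m:
            ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f")
            yield ts, int(m["old"]), int(m["new"]), int(m["mbox"])
```

For example, `with open(path) as f: events = list(throttle_events(f))` lists every throttle change with its timestamp and mailbox size, which can then be compared against the times the stalls were observed.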
Now I'm leaning towards leveldb compactions being the cause. What can I do
to verify that, and how can I fix it?
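Verifying a link to compactions starts with knowing exactly when the stalls happen. A minimal sketch, assuming riak's HTTP interface on its default port 8098 and a key that is known to exist (PROBE_URL, probe_bucket and probe_key are placeholders, not real names from this cluster): probe_once times a single GET, and stall_windows merges consecutive slow or failed probes into time windows. Those windows can then be compared by hand with compaction entries in the LOG files leveldb keeps in each partition's data directory.

```python
import http.client
import time
import urllib.request

# Hypothetical probe target: riak's HTTP interface on the default port 8098
# and a key you know exists. Adjust host, bucket and key for your cluster.
PROBE_URL = "http://127.0.0.1:8098/buckets/probe_bucket/keys/probe_key"


def probe_once(url=PROBE_URL, timeout=10.0):
    """Return the latency of one GET in seconds, or None on error or timeout.

    HTTP errors (including 404 for a missing key), connection errors and
    socket timeouts are all OSError subclasses, so they count as failures.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except (OSError, http.client.HTTPException):
        return None
    return time.monotonic() - start


def stall_windows(samples, threshold):
    """Merge consecutive slow or failed samples into (start, end) windows.

    samples: iterable of (wall_clock_time, latency_or_None) in time order.
    A sample is a stall if it failed (None) or took longer than `threshold`.
    """
    windows = []
    start = end = None
    for t, latency in samples:
        if latency is None or latency > threshold:
            if start is None:
                start = t  # first slow sample opens a window
            end = t  # every slow sample extends it
        elif start is not None:
            windows.append((start, end))  # a fast sample closes it
            start = end = None
    if start is not None:
        windows.append((start, end))  # window still open at the end
    return windows
```

For example, recording `(time.time(), probe_once())` once a second for an hour and passing the list to `stall_windows(samples, threshold=1.0)` gives the start and end of each stall; the one-second threshold is an arbitrary choice.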
I see log messages about large objects:
2015-01-05 16:11:28.046 [warning] <0.6398.0>@riak_kv_vnode:encode_and_put_no_sib_check:1830 Writing very large object (11307735 bytes) to <<"BucketName">>/<<"keys_1420466400">>
Could these be causing longer-running or more frequent compactions?
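To see which objects trigger the large-object warning, and how often, a sketch along these lines can tally the warnings by bucket and key. It assumes the warning format quoted above, on one line per warning; large_object_summary is a hypothetical helper. It does not by itself show a link to compactions, but it does show whether the warnings come from a few keys being rewritten repeatedly or from many different keys, and how large the biggest writes are.

```python
import re
from collections import defaultdict

# Matches the riak_kv_vnode large-object warning quoted above, e.g.
#   ... Writing very large object (11307735 bytes) to <<"BucketName">>/<<"keys_1420466400">>
LARGE_OBJ_RE = re.compile(
    r'Writing very large object \((?P<size>\d+) bytes\) to '
    r'<<"(?P<bucket>[^"]*)">>/<<"(?P<key>[^"]*)">>'
)


def large_object_summary(lines):
    """Return {(bucket, key): (write_count, max_size_bytes)} from log lines."""
    summary = defaultdict(lambda: (0, 0))
    for line in lines:
        m = LARGE_OBJ_RE.search(line)
        if m:
            bk = (m["bucket"], m["key"])
            count, biggest = summary[bk]
            summary[bk] = (count + 1, max(biggest, int(m["size"])))
    return dict(summary)
```

Sorting the result by write count or by size, e.g. `sorted(summary.items(), key=lambda kv: kv[1], reverse=True)`, puts the most frequently rewritten keys at the top.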
Thanks for reading,