Memory usage

Brian Sparrow bsparrow at basho.com
Fri Feb 13 12:41:27 EST 2015


Hello Edgar,

Persistent handoff activity like this normally indicates a network issue
that is causing fallback vnodes to start frequently. Based on your
previous messages, you are handing off quite a few vnodes that hold only
1 object, so the vnodes are not long-lived. Additionally, the most recent
errors report a TCP recv timeout, which further points to a problem at the
network layer.
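
If you want to quantify that pattern yourself, here is a minimal sketch
(not an official Basho tool) that tallies handoff outcomes from log text.
The regular expressions are assumptions derived only from the log lines
quoted in this thread, the function name is hypothetical, and it expects
one log entry per line as in the log files themselves.

```python
import re
from collections import Counter

# Patterns are assumptions based on the log lines quoted in this thread;
# other Riak versions may word these messages differently.
COMPLETED = re.compile(r"completed: sent .*? in (\d+) of (\d+) objects")
TERMINATED = re.compile(r"was terminated for reason: (.+)$")
FAILED = re.compile(r"failed because of (.+)$")


def summarize_handoffs(lines):
    """Tally handoff outcomes from an iterable of log lines.

    Returns (sizes, errors):
      sizes  maps objects sent per completed transfer -> number of transfers
      errors maps failure/termination reason -> number of occurrences
    """
    sizes = Counter()
    errors = Counter()
    for line in lines:
        done = COMPLETED.search(line)
        if done:
            sizes[int(done.group(1))] += 1
            continue
        bad = TERMINATED.search(line) or FAILED.search(line)
        if bad:
            errors[bad.group(1).strip()] += 1
    return sizes, errors
```

Passing it the open console and error log files from a node (for example
`summarize_handoffs(open(path))`, with `path` pointing at wherever your
logs live) gives two counters. A large count under `sizes[1]` together
with timeout entries in `errors` would match what I describe above.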

I'd be happy to investigate this issue with you. Please attach a
`riak-debug` output from this node and at least one other node in the
cluster so we can track the issue down.
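
While you collect that, it can help to check whether the backlog is
draining. The sketch below totals the "waiting to handoff" counts from
`riak-admin transfers` output. The pattern is an assumption based on the
output you pasted below, and the function name is hypothetical.

```python
import re

# Assumed line shape, taken from the output quoted in this thread:
#   'node-name' waiting to handoff N partitions
WAITING = re.compile(
    r"'(?P<node>[^']+)' waiting to handoff (?P<count>\d+) partitions?"
)


def waiting_partitions(output):
    """Map node name -> partitions waiting to hand off."""
    return {
        m.group("node"): int(m.group("count"))
        for m in WAITING.finditer(output)
    }
```

Running it on successive captures of the command output and comparing
`sum(waiting_partitions(text).values())` shows whether the total is
shrinking or holding steady.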

Thanks,
Brian

On Fri, Feb 13, 2015 at 5:40 AM, Edgar Veiga <edgarmveiga at gmail.com> wrote:

> Hi again everyone!
>
> - The memory usage keeps growing day by day:
> https://dl.dropboxusercontent.com/u/1962284/riak2.png
>
> - The handoffs keep on going, with strange things like a transfer started
> 1.5 days ago:
> riak-admin transfers
> 'riak at 192.168.20.112' waiting to handoff 51 partitions
> 'riak at 192.168.20.111' waiting to handoff 74 partitions
> 'riak at 192.168.20.110' waiting to handoff 86 partitions
> 'riak at 192.168.20.109' waiting to handoff 191 partitions
> 'riak at 192.168.20.108' waiting to handoff 67 partitions
> 'riak at 192.168.20.107' waiting to handoff 177 partitions
>
> transfer type: hinted_handoff
> vnode type: riak_kv_vnode
> partition: 51380916937414555718098294900181824909778878464
> started: 2015-02-11 21:54:07 [1.53 d ago]
> last update: no updates seen
> total size: unknown
> objects transferred: unknown
>
> - I'm starting to have some entries in the error log:
> 2015-02-12 19:58:54.026 [error]
> <0.184.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff of
> partition riak_kv_vnode 936274486415109681974235595958868809467081785344
> was terminated for reason: noproc
> 2015-02-12 20:27:34.092 [error]
> <0.21096.1867>@riak_core_handoff_sender:start_fold:263 hinted_handoff
> transfer of riak_kv_vnode from 'riak at 192.168.20.112'
> 1210306043414653979137426502093171875652569137152 to 'riak at 192.168.20.109'
> 1210306043414653979137426502093171875652569137152 failed because of TCP
> recv timeout
> 2015-02-12 20:27:34.092 [error]
> <0.184.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff of
> partition riak_kv_vnode 1210306043414653979137426502093171875652569137152
> was terminated for reason: {shutdown,timeout}
> 2015-02-12 21:25:32.852 [error]
> <0.184.0>@riak_core_handoff_manager:handle_info:289 An outbound handoff of
> partition riak_kv_vnode 742168800207099138150308704113737470919028244480
> was terminated for reason: noproc
>
>
> Please, can anyone help me with this? I'm starting to get worried
> about this behaviour. Tell me if you need more info!
>
> Thanks and Best regards,
> Edgar Veiga
>
> On 10 February 2015 at 16:16, Edgar Veiga <edgarmveiga at gmail.com> wrote:
>
>> Hi all!
>>
>> I have a riak cluster, working smoothly in production for about one year, with the following characteristics:
>>
>> - Version 1.4.12
>>
>> - 6 nodes
>>
>> - leveldb backend
>>
>> - replication (n) = 3
>>
>> ~ 3 billion keys
>>
>> ~ 1.2 TB per node
>>
>> - AAE disabled
>>
>>
>> Two days ago I upgraded all 6 nodes from Riak v1.4.8 to v1.4.12, and two slightly odd things started happening:
>>
>> 1) The first is the memory consumption. Please check the following image to see what I mean:
>>
>> - https://dl.dropboxusercontent.com/u/1962284/riak.png
>>
>> 2) All of the machines keep logging hinted handoffs after the rolling restart. I did the upgrade during off-peak hours and made sure the rolling restart was considered complete only once all in-progress handoffs had finished, but the next day, when checking the logs, I realised that they keep appearing. Here are some random examples:
>>
>> 2015-02-10 16:11:55.547 [info] <0.3070.753>@riak_core_handoff_sender:start_fold:148 Starting hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 765004763290394496247241279624929393101152190464 to 'riak at 192.168.20.109' 765004763290394496247241279624929393101152190464
>>
>> 2015-02-10 16:11:55.548 [info] <0.3070.753>@riak_core_handoff_sender:start_fold:236 hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 765004763290394496247241279624929393101152190464 to 'riak at 192.168.20.109' 765004763290394496247241279624929393101152190464 completed: sent 3.15 KB bytes in 1 of 1 objects in 0.00 seconds (3.99 MB/second)
>>
>> 2015-02-10 16:12:05.803 [info] <0.3434.753>@riak_core_handoff_sender:start_fold:148 Starting hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 902020541790166644828836732692080926193895866368 to 'riak at 192.168.20.109' 902020541790166644828836732692080926193895866368
>>
>> 2015-02-10 16:12:05.856 [info] <0.3368.753>@riak_core_handoff_sender:start_fold:148 Starting hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 570899077082383952423314387779798054553098649600 to 'riak at 192.168.20.111' 570899077082383952423314387779798054553098649600
>>
>> 2015-02-10 16:12:05.860 [info] <0.3434.753>@riak_core_handoff_sender:start_fold:236 hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 902020541790166644828836732692080926193895866368 to 'riak at 192.168.20.109' 902020541790166644828836732692080926193895866368 completed: sent 39.79 KB bytes in 1 of 1 objects in 0.06 seconds (699.32 KB/second)
>>
>> 2015-02-10 16:12:05.886 [info] <0.3368.753>@riak_core_handoff_sender:start_fold:236 hinted_handoff transfer of riak_kv_vnode from 'riak at 192.168.20.112' 570899077082383952423314387779798054553098649600 to 'riak at 192.168.20.111' 570899077082383952423314387779798054553098649600 completed: sent 3.55 KB bytes in 1 of 1 objects in 0.03 seconds (118.58 KB/second)
>>
>>
>> Should I be worried or is this normal on this version?
>>
>>
>> Best regards,
>>
>> Edgar
>>
>>
>