Memory-backend TTL

Lucas Grijander lucasgrinjander69 at gmail.com
Mon Oct 20 04:56:46 EDT 2014


Hi Luke,

Indeed, once I removed the thousands of requests, the memory stabilized.
However, the memory consumption is still very high:

riak-admin status |grep memory
memory_total : 18494760128
memory_processes : 145363184
memory_processes_used : 142886424
memory_system : 18349396944
memory_atom : 561761
memory_atom_used : 554496
memory_binary : 7108243240
memory_code : 13917820
memory_ets : 11200328880

I have also tested with Riak 1.4.10 and the behavior is the same.

Is it normal for "memory_ets" to exceed 10GB when we have a "ring_size" of
16 and max_memory_per_vnode = 250MB?

2014-10-15 20:50 GMT+02:00 Lucas Grijander <lucasgrinjander69 at gmail.com>:

> Hi Luke.
>
> About the first issue:
>
> - From the beginning, the servers are all running ntpd. They are Ubuntu
> 14.04 and the ntpd service is installed and running by default.
> - Anti-entropy was also disabled from the beginning:
>
> {anti_entropy,{off,[]}},
>
>
> About the second issue, I am puzzled: after 2 restarts of the Riak
> server, memory consumption is still large right now, but it is no longer
> growing as it did on previous days. The only change was removing this code
> (it was called thousands of times per second). It was a possible
> workaround for the earlier TTL problem, but it is now unnecessary because
> the TTL is working fine with this node alone:
>
> self.db.delete(key)
> self.db.get(key, r=1)
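>
> For context, the workaround was roughly the following (a minimal sketch
> using the official Python client; the connection settings and helper name
> are my assumptions, not our exact production code):
>
> import riak
>
> client = riak.RiakClient(protocol='http', host='127.0.0.1', http_port=8098)
> db = client.bucket('ttl_stg')
>
> def force_expire(db, key):
>     # Delete the key explicitly instead of waiting for the memory
>     # backend's TTL, then read it back with r=1 so the delete is
>     # resolved immediately instead of leaving a value that can
>     # reappear later.
>     db.delete(key)
>     db.get(key, r=1)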
>
>
> # riak-admin status|grep memory
> memory_total : 18617871264
> memory_processes : 224480232
> memory_processes_used : 222700176
> memory_system : 18393391032
> memory_atom : 561761
> memory_atom_used : 552862
> memory_binary : 7135206080
> memory_code : 13779729
> memory_ets : 11209256232
>
> The problem is that I don't remember whether the code change happened
> before or after the second restart. I am going to restart the Riak server
> again and will report back on whether the "possible memory leak"
> reappears.
>
> These are the props of the bucket:
>
> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":1,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":1,"rw":"quorum","small_vclock":50,"w":1,"young_vclock":20}}
>
> The data that we put into the bucket all follow this schema:
>
> KEY: Alphanumeric with a length of 47
> DATA: Long integer.
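>
> In case it helps you reproduce, a load generator along these lines matches
> our write pattern (a rough sketch; the client setup, key alphabet, and
> request rate are my assumptions):
>
> import random
> import string
> import riak
>
> client = riak.RiakClient(protocol='http', host='127.0.0.1', http_port=8098)
> bucket = client.bucket('ttl_stg')
>
> ALPHABET = string.ascii_letters + string.digits
>
> def random_key(length=47):
>     # Alphanumeric key, 47 characters, matching our schema.
>     return ''.join(random.choice(ALPHABET) for _ in range(length))
>
> while True:
>     # Store a long integer under a fresh key; in production this
>     # runs thousands of times per second.
>     bucket.new(random_key(), data=random.getrandbits(63)).store()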
>
> # riak-admin status|grep puts
> vnode_puts : 84708
> vnode_puts_total : 123127430
> node_puts : 83169
> node_puts_total : 123128062
>
> # riak-admin status|grep gets
> vnode_gets : 162314
> vnode_gets_total : 240433213
> node_gets : 162317
> node_gets_total : 240433216
>
> 2014-10-14 16:26 GMT+02:00 Luke Bakken <lbakken at basho.com>:
>
>> Hi Lucas,
>>
>> With regard to the mysterious key deletion / resurrection, please do
>> the following:
>>
>> * Ensure your servers are all running ntpd and have their time
>> synchronized as closely as possible.
>> * Disable anti-entropy. I suspect this is causing the strange behavior
>> you're seeing with keys.
>>
>> Your single node cluster memory consumption issue is a bit of a
>> puzzler. I'm assuming you're using default bucket settings and not
>> using bucket types based on your previous emails, and that allow_mult
>> is still false for your ttl_stg bucket. Can you tell me more about the
>> data you're putting into that bucket for testing? I'll try and
>> reproduce it with my single node cluster.
>>
>> --
>> Luke Bakken
>> Engineer / CSE
>> lbakken at basho.com
>>
>>
>> On Mon, Oct 13, 2014 at 5:02 PM, Lucas Grijander
>> <lucasgrinjander69 at gmail.com> wrote:
>> > Hi Luke.
>> >
>> > I really appreciate your efforts to reproduce the problem. I think the
>> > configs are right. I have also been doing a lot of tests, and with 1
>> > server/node the memory bucket works flawlessly, as in your test. The
>> > Riak cluster where we have the problem has a multi_backend with 1 memory
>> > backend, 2 bitcask backends and 2 leveldb backends. I only changed the
>> > connection parameter of the memory backend in our production code to
>> > point at a new "cluster" with a single node, with the same Riak config
>> > but only 1 memory backend under the multi configuration, and, as I said,
>> > all was fine; the problem vanished. I deduce that the problem appears
>> > only with more than 1 node and under heavy request load.
>> >
>> > In my tests with the production cluster that has the problem (4 nodes),
>> > I finally realized that the TTL is working, but randomly and suddenly
>> > KEYS that were already deleted reappear, and KEYS whose TTL has not yet
>> > expired disappear :-? (Maybe something related to an internal ETS
>> > table?) Those are the moments when I can fetch KEYS that have already
>> > expired.
>> >
>> > In summary:
>> >
>> > - With a 4-node cluster (config below): all OK for a while, then
>> > suddenly we lose approximately the last 20 seconds of keys, and OLD
>> > keys appear in the listing from:
>> > curl -X GET http://localhost:8098/buckets/ttl_stg/keys?keys=true
>> >
>> > buckets.default.last_write_wins = true
>> > bitcask.io_mode = erlang
>> > multi_backend.ttl_stg.storage_backend = memory
>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 25MB
>> > anti_entropy = passive
>> > ring_size = 256
>> >
>> > - With 1 node: All OK
>> >
>> > buckets.default.n_val = 1
>> > buckets.default.last_write_wins = true
>> > buckets.default.r = 1
>> > buckets.default.w = 1
>> > multi_backend.ttl_stg.storage_backend = memory
>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 250MB
>> > ring_size = 16
>> >
>> >
>> >
>> > Another note: with this 1 node (32GB RAM) and only the memory backend
>> > activated, I have noticed that the memory consumption grows without
>> > control:
>> >
>> >
>> > # riak-admin status|grep memory
>> > memory_total : 17323130960
>> > memory_processes : 235043016
>> > memory_processes_used : 233078456
>> > memory_system : 17088087944
>> > memory_atom : 561761
>> > memory_atom_used : 561127
>> > memory_binary : 6737787976
>> > memory_code : 14370908
>> > memory_ets : 10295224544
>> >
>> > # riak-admin diag -d debug
>> > [debug] Local RPC: os:getpid([]) [5000]
>> > [debug] Running shell command: ps -o pmem,rss -p 17521
>> > [debug] Shell command output:
>> > %MEM   RSS
>> > 60.5 19863800
>> >
>> > Wow, 18.9GB when max_memory_per_vnode = 250MB. That is far above the
>> > expected value of 250MB * 16 vnodes = 4000MB. Is that correct?
>> >
>> > This is the riak-admin vnode-status output for 1 vnode; the other 15
>> > show similar data:
>> >
>> > VNode: 1370157784997721485815954530671515330927436759040
>> > Backend: riak_kv_multi_backend
>> > Status:
>> > [{<<"ttl_stg">>,
>> >   [{mod,riak_kv_memory_backend},
>> >    {data_table_status,[{compressed,false},
>> >                        {memory,1156673},
>> >                        {owner,<8343.9466.104>},
>> >                        {heir,none},
>> >
>> > {name,riak_kv_1370157784997721485815954530671515330927436759040},
>> >                        {size,29656},
>> >                        {node,'riak at xxxxxxxx'},
>> >                        {named_table,false},
>> >                        {type,ordered_set},
>> >                        {keypos,1},
>> >                        {protection,protected}]},
>> >    {index_table_status,[{compressed,false},
>> >                         {memory,89},
>> >                         {owner,<8343.9466.104>},
>> >                         {heir,none},
>> >
>> > {name,riak_kv_1370157784997721485815954530671515330927436759040_i},
>> >                         {size,0},
>> >                         {node,'riak at xxxxxxxxx'},
>> >                         {named_table,false},
>> >                         {type,ordered_set},
>> >                         {keypos,1},
>> >                         {protection,protected}]},
>> >    {time_table_status,[{compressed,false},
>> >                        {memory,75968936},
>> >                        {owner,<8343.9466.104>},
>> >                        {heir,none},
>> >
>> > {name,riak_kv_1370157784997721485815954530671515330927436759040_t},
>> >                        {size,2813661},
>> >                        {node,'riak at xxxxxxxxx'},
>> >                        {named_table,false},
>> >                        {type,ordered_set},
>> >                        {keypos,1},
>> >                        {protection,protected}]}]}]
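>> >
>> > One observation on those ETS stats (my own interpretation, assuming the
>> > "memory" field is in machine words, i.e. 8 bytes on a 64-bit VM): the
>> > time table alone is 75,968,936 words * 8 = ~608MB per vnode, so 16
>> > similar vnodes account for ~9.7GB, which is almost all of "memory_ets".
>> > Note also that the time table holds 2,813,661 entries while the data
>> > table holds only 29,656, as if expired entries linger in the time index
>> > long after the data itself is gone.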
>> >
>> > Thanks!
>> >
>> > 2014-10-13 22:30 GMT+02:00 Luke Bakken <lbakken at basho.com>:
>> >>
>> >> Hi Lucas,
>> >>
>> >> I've tried reproducing this using a local Riak 2.0.1 node; however,
>> >> TTL is working as expected.
>> >>
>> >> Here is the configuration I have in /etc/riak/riak.conf:
>> >>
>> >> storage_backend = multi
>> >> multi_backend.default = bc_default
>> >>
>> >> multi_backend.ttl_stg.storage_backend = memory
>> >> multi_backend.ttl_stg.memory_backend.ttl = 90s
>> >> multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 4MB
>> >>
>> >> multi_backend.bc_default.storage_backend = bitcask
>> >> multi_backend.bc_default.bitcask.data_root = /var/lib/riak/bc_default
>> >> multi_backend.bc_default.bitcask.io_mode = erlang
>> >>
>> >> This translates to the following in
>> >> /var/lib/riak/generated.configs/app.2014.10.13.13.13.29.config:
>> >>
>> >> {multi_backend_default,<<"bc_default">>},
>> >> {multi_backend,
>> >>     [{<<"ttl_stg">>,riak_kv_memory_backend,[{ttl,90},{max_memory,4}]},
>> >>     {<<"bc_default">>,riak_kv_bitcask_backend,
>> >>     [{io_mode,erlang},
>> >>         {expiry_grace_time,0},
>> >>         {small_file_threshold,10485760},
>> >>         {dead_bytes_threshold,134217728},
>> >>         {frag_threshold,40},
>> >>         {dead_bytes_merge_trigger,536870912},
>> >>         {frag_merge_trigger,60},
>> >>         {max_file_size,2147483648},
>> >>         {open_timeout,4},
>> >>         {data_root,"/var/lib/riak/bc_default"},
>> >>         {sync_strategy,none},
>> >>         {merge_window,always},
>> >>         {max_fold_age,-1},
>> >>         {max_fold_puts,0},
>> >>         {expiry_secs,-1},
>> >>         {require_hint_crc,true}]}]}]},
>> >>
>> >> I set the bucket properties to use the ttl_stg backend:
>> >>
>> >> root at UBUNTU-12-1:~# cat ttl_stg-props.json
>> >> {"props":{"name":"ttl_stg","backend":"ttl_stg"}}
>> >>
>> >> root at UBUNTU-12-1:~# curl -XPUT -H'Content-type: application/json'
>> >> localhost:8098/buckets/ttl_stg/props --data-ascii @ttl_stg-props.json
>> >>
>> >> root at UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/props
>> >>
>> >>
>> >> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,
>> >>           "big_vclock":50,
>> >>           "chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},
>> >>           "dvv_enabled":false,"dw":"quorum","last_write_wins":false,
>> >>           "linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},
>> >>           "n_val":3,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,
>> >>           "postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum",
>> >>           "rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
>> >>
>> >>
>> >> And used the following statement to PUT test data:
>> >>
>> >> curl -XPUT localhost:8098/buckets/ttl_stg/keys/1 -d "TEST $(date)"
>> >>
>> >> After 90 seconds, this is the response I get from Riak:
>> >>
>> >> root at UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/keys/1
>> >> not found
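>> >>
>> >> If you want to measure the actual lifetime under load, a small probe
>> >> along these lines can help (a sketch, Python 3 standard library only;
>> >> the key name is made up, and you should adjust the URL to your node):
>> >>
>> >> import time
>> >> import urllib.request
>> >> import urllib.error
>> >>
>> >> URL = 'http://localhost:8098/buckets/ttl_stg/keys/ttl-probe'
>> >>
>> >> def probe_ttl(poll_interval=5):
>> >>     # PUT a test value, then poll until Riak returns 404, reporting
>> >>     # how long the key actually survived (should be close to 90s).
>> >>     req = urllib.request.Request(
>> >>         URL, data=b'TEST', method='PUT',
>> >>         headers={'Content-Type': 'text/plain'})
>> >>     urllib.request.urlopen(req)
>> >>     start = time.time()
>> >>     while True:
>> >>         time.sleep(poll_interval)
>> >>         try:
>> >>             urllib.request.urlopen(URL)
>> >>         except urllib.error.HTTPError as e:
>> >>             if e.code == 404:
>> >>                 return time.time() - start
>> >>             raise
>> >>
>> >> print('expired after %.0f seconds' % probe_ttl())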
>> >>
>> >> I would carefully check all of the app.config / riak.conf files in
>> >> your cluster, the output of "riak config effective" and the bucket
>> >> properties for those buckets you expect to be using the memory backend
>> >> with TTL. I also recommend using the localhost:8098/buckets/ endpoint
>> >> instead of the deprecated riak/ endpoint.
>> >>
>> >> Please let me know if you have additional questions.
>> >> --
>> >> Luke Bakken
>> >> Engineer / CSE
>> >> lbakken at basho.com
>> >>
>> >>
>> >> On Fri, Oct 3, 2014 at 11:32 AM, Lucas Grijander
>> >> <lucasgrinjander69 at gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > I have a memory backend in production with Riak 2.0.1, 4 servers and
>> 256
>> >> > vnodes. The servers have the same date and time.
>> >> >
>> >> > I have seen odd behavior with the TTL.
>> >> >
>> >> > This is the config:
>> >> >
>> >> >            {<<"ttl_stg">>,riak_kv_memory_backend,
>> >> >             [{ttl,90},{max_memory,25}]},
>> >> >
>> >> > For example, see this GET response in one of the riak servers:
>> >> >
>> >> > < HTTP/1.1 200 OK
>> >> > < X-Riak-Vclock: a85hYGBgzGDKBVIc4otdfgR/7bfIYEpkzGNlKI1efJYvCwA=
>> >> > < Vary: Accept-Encoding
>> >> > * Server MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) is not blacklisted
>> >> > < Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
>> >> > < Link: </riak/ttl_stg>; rel="up"
>> >> > < Last-Modified: Fri, 03 Oct 2014 17:40:05 GMT
>> >> > < ETag: "3c8bGoifWcOCSVn0otD5nI"
>> >> > < Date: Fri, 03 Oct 2014 17:47:50 GMT
>> >> > < Content-Type: application/json
>> >> > < Content-Length: 17
>> >> >
>> >> > If the TTL is 90 seconds, why doesn't the GET return "not found" when
>> >> > the difference between "Last-Modified" (17:40:05) and "Date"
>> >> > (17:47:50) is 465 seconds, far greater than the TTL?
>> >> >
>> >> > Thanks in advance!