Memory-backend TTL

Luke Bakken lbakken at basho.com
Mon Oct 20 09:43:24 EDT 2014


Lucas,

Thanks for all the detailed information. This is not expected
behavior. What MIME type are you using for storing the long integer
data (64 binary bits, I assume)?
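
For reference, here is a minimal sketch of what storing a long integer as 64 binary bits could look like on the client side. It is an illustration under assumptions, not a description of the actual production code: the helper names encode_long and decode_long are hypothetical, the big-endian byte order is arbitrary, and application/octet-stream is only one plausible MIME type for such a value.

```python
import struct

# Assumption: a generic binary MIME type; the thread does not say which one is used.
CONTENT_TYPE = "application/octet-stream"


def encode_long(value: int) -> bytes:
    """Pack a signed 64-bit integer into exactly 8 big-endian bytes."""
    return struct.pack(">q", value)


def decode_long(payload: bytes) -> int:
    """Reverse of encode_long: unpack 8 big-endian bytes into an int."""
    return struct.unpack(">q", payload)[0]


payload = encode_long(1413812604)
print(CONTENT_TYPE, len(payload), decode_long(payload))
```

A value stored this way always occupies 8 bytes, whereas the same number stored as decimal text varies in length with its magnitude.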

I'd like to try and reproduce this. There have been issues with TTL
and max_memory but they should have been fixed for Riak 2.0.
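
As a rough cross-check of the figures quoted below from the Oct 13 message, the sketch compares the configured cap (16 vnodes at 250MB each) with an estimate built from the vnode-status sample. It rests on several assumptions: that the "memory" fields in vnode-status come from ets:info, which reports words rather than bytes; that the VM is 64-bit (8-byte words); that max_memory_per_vnode is counted in units of 1024*1024 bytes; and that the other 15 vnodes look like the one shown, as the message says.

```python
WORD_BYTES = 8      # assumption: 64-bit Erlang VM, so one ETS word is 8 bytes
MIB = 1024 * 1024   # assumption: max_memory_per_vnode is counted in MiB


def configured_cap_bytes(vnodes: int, max_memory_mb: int) -> int:
    """Total memory the backend is configured to use across all vnodes."""
    return vnodes * max_memory_mb * MIB


def ets_words_to_bytes(words: int) -> int:
    """Convert an ETS 'memory' figure (in words) to bytes."""
    return words * WORD_BYTES


vnodes = 16
cap = configured_cap_bytes(vnodes, 250)

# Figures from the single vnode-status sample in the thread.
data_table = ets_words_to_bytes(1156673)    # data_table_status memory
time_table = ets_words_to_bytes(75968936)   # time_table_status memory

estimated = vnodes * (data_table + time_table)
reported_ets = 10295224544                  # memory_ets from the same message

print("configured cap :", cap)
print("estimated ETS  :", estimated)
print("reported ETS   :", reported_ets)
print("time table share of estimate:", round(time_table / (data_table + time_table), 3))
```

If these assumptions hold, the estimate lands close to the reported memory_ets, and nearly all of it comes from the time table (2813661 entries) rather than the data table (29656 entries).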
--
Luke Bakken
Engineer / CSE
lbakken at basho.com


On Mon, Oct 20, 2014 at 1:56 AM, Lucas Grijander
<lucasgrinjander69 at gmail.com> wrote:
> Hi Luke,
>
> Indeed, once I removed the thousands of requests, memory usage stabilized.
> However, memory consumption is still very high:
>
> riak-admin status |grep memory
> memory_total : 18494760128
> memory_processes : 145363184
> memory_processes_used : 142886424
> memory_system : 18349396944
> memory_atom : 561761
> memory_atom_used : 554496
> memory_binary : 7108243240
> memory_code : 13917820
> memory_ets : 11200328880
>
> I have also tested with Riak 1.4.10 and the behavior is the same.
>
> Is it normal that the "memory_ets" has more than 10GB when we have a
> "ring_size" of 16 and a max_memory_per_vnode = 250MB?
>
> 2014-10-15 20:50 GMT+02:00 Lucas Grijander <lucasgrinjander69 at gmail.com>:
>>
>> Hi Luke.
>>
>> About the first issue:
>>
>> - From the beginning, the servers are all running ntpd. They are Ubuntu
>> 14.04 and the ntpd service is installed and running by default.
>> - Anti-entropy was also disabled from the beginning:
>>
>> {anti_entropy,{off,[]}},
>>
>>
>> About the second issue, I am perplexed because, after 2 restarts of the Riak
>> server, memory consumption is still high but is no longer growing as it did
>> on previous days. The only change was removing this code (it ran
>> thousands of times per second). It was a possible workaround for the earlier
>> TTL problem, but it is now useless because the TTL is working
>> fine with this single node:
>>
>> self.db.delete(key)
>> self.db.get(key, r=1)
>>
>>
>> # riak-admin status|grep memory
>> memory_total : 18617871264
>> memory_processes : 224480232
>> memory_processes_used : 222700176
>> memory_system : 18393391032
>> memory_atom : 561761
>> memory_atom_used : 552862
>> memory_binary : 7135206080
>> memory_code : 13779729
>> memory_ets : 11209256232
>>
>> The problem is that I don't remember whether the code change came before or
>> after the second restart. I am going to restart the Riak server again and
>> will report back on whether the "possible memory leak" reproduces.
>>
>> These are the bucket props:
>>
>> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":1,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":1,"rw":"quorum","small_vclock":50,"w":1,"young_vclock":20}}
>>
>> All the data we put into the bucket follows this schema:
>>
>> KEY: Alphanumeric with a length of 47
>> DATA: Long integer.
>>
>> # riak-admin status|grep puts
>> vnode_puts : 84708
>> vnode_puts_total : 123127430
>> node_puts : 83169
>> node_puts_total : 123128062
>>
>> # riak-admin status|grep gets
>> vnode_gets : 162314
>> vnode_gets_total : 240433213
>> node_gets : 162317
>> node_gets_total : 240433216
>>
>> 2014-10-14 16:26 GMT+02:00 Luke Bakken <lbakken at basho.com>:
>>>
>>> Hi Lucas,
>>>
>>> With regard to the mysterious key deletion / resurrection, please do
>>> the following:
>>>
>>> * Ensure your servers are all running ntpd and have their time
>>> synchronized as closely as possible.
>>> * Disable anti-entropy. I suspect this is causing the strange behavior
>>> you're seeing with keys.
>>>
>>> Your single node cluster memory consumption issue is a bit of a
>>> puzzler. I'm assuming you're using default bucket settings and not
>>> using bucket types based on your previous emails, and that allow_mult
>>> is still false for your ttl_stg bucket. Can you tell me more about the
>>> data you're putting into that bucket for testing? I'll try and
>>> reproduce it with my single node cluster.
>>>
>>> --
>>> Luke Bakken
>>> Engineer / CSE
>>> lbakken at basho.com
>>>
>>>
>>> On Mon, Oct 13, 2014 at 5:02 PM, Lucas Grijander
>>> <lucasgrinjander69 at gmail.com> wrote:
>>> > Hi Luke.
>>> >
>>> > I really appreciate your efforts to reproduce the problem. I think the
>>> > configs are right. I have also been running a lot of tests, and with
>>> > 1 server/node the memory bucket works flawlessly, as in your test. The
>>> > Riak cluster where we have the problem has a multi_backend with 1 memory
>>> > backend, 2 bitcask backends and 2 leveldb backends. In our production
>>> > code I changed only the connection parameter for the memory backend,
>>> > pointing it at a new "cluster" with only 1 node, with the same Riak
>>> > config but with only 1 memory backend under the multi configuration
>>> > and, as I said, all is fine: the problem vanished. I deduce that the
>>> > problem appears only with more than 1 node and with a lot of requests.
>>> >
>>> > In my tests with the production cluster that has the problem (4 nodes), I
>>> > finally realized that the TTL is working but, randomly and suddenly, keys
>>> > that were already deleted reappear, and keys with a valid TTL disappear :-?
>>> > (Maybe something related to some internal ETS table?) This is the moment
>>> > when I can obtain keys that have already expired.
>>> >
>>> > In summary:
>>> >
>>> > - With a cluster of 4 nodes (config below): all OK for a while, and then
>>> > suddenly we lose approximately the last 20 seconds of keys and OLD keys
>>> > appear in the list:
>>> > curl -X GET http://localhost:8098/buckets/ttl_stg/keys?keys=true
>>> >
>>> > buckets.default.last_write_wins = true
>>> > bitcask.io_mode = erlang
>>> > multi_backend.ttl_stg.storage_backend = memory
>>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 25MB
>>> > anti_entropy = passive
>>> > ring_size = 256
>>> >
>>> > - With 1 node: All OK
>>> >
>>> > buckets.default.n_val = 1
>>> > buckets.default.last_write_wins = true
>>> > buckets.default.r = 1
>>> > buckets.default.w = 1
>>> > multi_backend.ttl_stg.storage_backend = memory
>>> > multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> > multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 250MB
>>> > ring_size = 16
>>> >
>>> >
>>> >
>>> > Another note: with this 1 node (32GB RAM) and only the memory backend
>>> > enabled, I have noticed that memory consumption grows without
>>> > control:
>>> >
>>> >
>>> > # riak-admin  status|grep memory
>>> > memory_total : 17323130960
>>> > memory_processes : 235043016
>>> > memory_processes_used : 233078456
>>> > memory_system : 17088087944
>>> > memory_atom : 561761
>>> > memory_atom_used : 561127
>>> > memory_binary : 6737787976
>>> > memory_code : 14370908
>>> > memory_ets : 10295224544
>>> >
>>> > # riak-admin diag -d debug
>>> > [debug] Local RPC: os:getpid([]) [5000]
>>> > [debug] Running shell command: ps -o pmem,rss -p 17521
>>> > [debug] Shell command output:
>>> > %MEM   RSS
>>> > 60.5 19863800
>>> >
>>> > Wow, 18.9GB when max_memory_per_vnode = 250MB. That is far from the
>>> > expected value, 250MB * 16 vnodes = 4000MB. Is that correct?
>>> >
>>> > This is the riak-admin vnode-status output for 1 vnode; the other 15 show
>>> > similar data:
>>> >
>>> > VNode: 1370157784997721485815954530671515330927436759040
>>> > Backend: riak_kv_multi_backend
>>> > Status:
>>> > [{<<"ttl_stg">>,
>>> >   [{mod,riak_kv_memory_backend},
>>> >    {data_table_status,[{compressed,false},
>>> >                        {memory,1156673},
>>> >                        {owner,<8343.9466.104>},
>>> >                        {heir,none},
>>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040},
>>> >                        {size,29656},
>>> >                        {node,'riak at xxxxxxxx'},
>>> >                        {named_table,false},
>>> >                        {type,ordered_set},
>>> >                        {keypos,1},
>>> >                        {protection,protected}]},
>>> >    {index_table_status,[{compressed,false},
>>> >                         {memory,89},
>>> >                         {owner,<8343.9466.104>},
>>> >                         {heir,none},
>>> >                         {name,riak_kv_1370157784997721485815954530671515330927436759040_i},
>>> >                         {size,0},
>>> >                         {node,'riak at xxxxxxxxx'},
>>> >                         {named_table,false},
>>> >                         {type,ordered_set},
>>> >                         {keypos,1},
>>> >                         {protection,protected}]},
>>> >    {time_table_status,[{compressed,false},
>>> >                        {memory,75968936},
>>> >                        {owner,<8343.9466.104>},
>>> >                        {heir,none},
>>> >                        {name,riak_kv_1370157784997721485815954530671515330927436759040_t},
>>> >                        {size,2813661},
>>> >                        {node,'riak at xxxxxxxxx'},
>>> >                        {named_table,false},
>>> >                        {type,ordered_set},
>>> >                        {keypos,1},
>>> >                        {protection,protected}]}]}]
>>> >
>>> > Thanks!
>>> >
>>> > 2014-10-13 22:30 GMT+02:00 Luke Bakken <lbakken at basho.com>:
>>> >>
>>> >> Hi Lucas,
>>> >>
>>> >> I've tried reproducing this using a local Riak 2.0.1 node, however TTL
>>> >> is working as expected.
>>> >>
>>> >> Here is the configuration I have in /etc/riak/riak.conf:
>>> >>
>>> >> storage_backend = multi
>>> >> multi_backend.default = bc_default
>>> >>
>>> >> multi_backend.ttl_stg.storage_backend = memory
>>> >> multi_backend.ttl_stg.memory_backend.ttl = 90s
>>> >> multi_backend.ttl_stg.memory_backend.max_memory_per_vnode = 4MB
>>> >>
>>> >> multi_backend.bc_default.storage_backend = bitcask
>>> >> multi_backend.bc_default.bitcask.data_root = /var/lib/riak/bc_default
>>> >> multi_backend.bc_default.bitcask.io_mode = erlang
>>> >>
>>> >> This translates to the following in
>>> >> /var/lib/riak/generated.configs/app.2014.10.13.13.13.29.config:
>>> >>
>>> >> {multi_backend_default,<<"bc_default">>},
>>> >> {multi_backend,
>>> >>     [{<<"ttl_stg">>,riak_kv_memory_backend,[{ttl,90},{max_memory,4}]},
>>> >>     {<<"bc_default">>,riak_kv_bitcask_backend,
>>> >>     [{io_mode,erlang},
>>> >>         {expiry_grace_time,0},
>>> >>         {small_file_threshold,10485760},
>>> >>         {dead_bytes_threshold,134217728},
>>> >>         {frag_threshold,40},
>>> >>         {dead_bytes_merge_trigger,536870912},
>>> >>         {frag_merge_trigger,60},
>>> >>         {max_file_size,2147483648},
>>> >>         {open_timeout,4},
>>> >>         {data_root,"/var/lib/riak/bc_default"},
>>> >>         {sync_strategy,none},
>>> >>         {merge_window,always},
>>> >>         {max_fold_age,-1},
>>> >>         {max_fold_puts,0},
>>> >>         {expiry_secs,-1},
>>> >>         {require_hint_crc,true}]}]}]},
>>> >>
>>> >> I set the bucket properties to use the ttl_stg backend:
>>> >>
>>> >> root at UBUNTU-12-1:~# cat ttl_stg-props.json
>>> >> {"props":{"name":"ttl_stg","backend":"ttl_stg"}}
>>> >>
>>> >> root at UBUNTU-12-1:~# curl -XPUT -H'Content-type: application/json'
>>> >> localhost:8098/buckets/ttl_stg/props --data-ascii @ttl_stg-props.json
>>> >>
>>> >> root at UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/props
>>> >>
>>> >>
>>> >> {"props":{"allow_mult":false,"backend":"ttl_stg","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dvv_enabled":false,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"ttl_stg","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","small_vclock":50,"w":"quorum","young_vclock":20}}
>>> >>
>>> >>
>>> >> And used the following statement to PUT test data:
>>> >>
>>> >> curl -XPUT localhost:8098/buckets/ttl_stg/keys/1 -d "TEST $(date)"
>>> >>
>>> >> After 90 seconds, this is the response I get from Riak:
>>> >>
>>> >> root at UBUNTU-12-1:~# curl -XGET localhost:8098/buckets/ttl_stg/keys/1
>>> >> not found
>>> >>
>>> >> I would carefully check all of the app.config / riak.conf files in
>>> >> your cluster, the output of "riak config effective" and the bucket
>>> >> properties for those buckets you expect to be using the memory backend
>>> >> with TTL. I also recommend using the localhost:8098/buckets/ endpoint
>>> >> instead of the deprecated riak/ endpoint.
>>> >>
>>> >> Please let me know if you have additional questions.
>>> >> --
>>> >> Luke Bakken
>>> >> Engineer / CSE
>>> >> lbakken at basho.com
>>> >>
>>> >>
>>> >> On Fri, Oct 3, 2014 at 11:32 AM, Lucas Grijander
>>> >> <lucasgrinjander69 at gmail.com> wrote:
>>> >> > Hello,
>>> >> >
>>> >> > I have a memory backend in production with Riak 2.0.1, 4 servers and 256
>>> >> > vnodes. The servers have the same date and time.
>>> >> >
>>> >> > I have seen odd behavior with the TTL.
>>> >> >
>>> >> > This is the config:
>>> >> >
>>> >> >            {<<"ttl_stg">>,riak_kv_memory_backend,
>>> >> >             [{ttl,90},{max_memory,25}]},
>>> >> >
>>> >> > For example, see this GET response in one of the riak servers:
>>> >> >
>>> >> > < HTTP/1.1 200 OK
>>> >> > < X-Riak-Vclock: a85hYGBgzGDKBVIc4otdfgR/7bfIYEpkzGNlKI1efJYvCwA=
>>> >> > < Vary: Accept-Encoding
>>> >> > * Server MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained) is not blacklisted
>>> >> > < Server: MochiWeb/1.1 WebMachine/1.10.5 (jokes are better explained)
>>> >> > < Link: </riak/ttl_stg>; rel="up"
>>> >> > < Last-Modified: Fri, 03 Oct 2014 17:40:05 GMT
>>> >> > < ETag: "3c8bGoifWcOCSVn0otD5nI"
>>> >> > < Date: Fri, 03 Oct 2014 17:47:50 GMT
>>> >> > < Content-Type: application/json
>>> >> > < Content-Length: 17
>>> >> >
>>> >> > If the TTL is 90 seconds, why doesn't the GET return "not found" when the
>>> >> > difference between "Last-Modified" and "Date" (of the curl request) is
>>> >> > greater than the TTL?
>>> >> >
>>> >> > Thanks in advance!
>>> >> >
>>> >> >
>>> >> > _______________________________________________
>>> >> > riak-users mailing list
>>> >> > riak-users at lists.basho.com
>>> >> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>> >> >
>>> >
>>> >
>>> >
>>> >
>>
>>
>
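
The check raised in the original question of this thread (is the gap between "Last-Modified" and "Date" greater than the TTL?) can be sketched as follows. This is only an illustration using the two header values quoted in that message and the configured TTL of 90 seconds; the function name age_seconds is hypothetical, and the sketch says nothing about how Riak itself evaluates expiry.

```python
from email.utils import parsedate_to_datetime

TTL_SECONDS = 90  # ttl configured for the ttl_stg memory backend in the thread


def age_seconds(last_modified: str, date: str) -> float:
    """Seconds elapsed between the Last-Modified and Date HTTP headers."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    return delta.total_seconds()


# Header values taken from the GET response quoted in the thread.
age = age_seconds("Fri, 03 Oct 2014 17:40:05 GMT",
                  "Fri, 03 Oct 2014 17:47:50 GMT")
print(age, age > TTL_SECONDS)
```

For the quoted response the object is 465 seconds old, more than five times the TTL, which is why a "not found" was expected.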
