[Basho Riak] Fail To Update Document Repeatedly With Cluster of 5 Nodes

Russell Brown russell.brown at mac.com
Tue Feb 7 03:43:09 EST 2017


On 7 Feb 2017, at 08:17, my hue <tranmyhue.grackle at gmail.com> wrote:

> Dear John and Russell Brown,
> 
> * How fast is your turnaround time between an update and a fetch?  
> 
> The turnaround time between an update and a fetch is about 1 second. 
> While my team and I were debugging, we adjusted haproxy with the following scenarios:
> 
> Scenario 1: round-robin across the 5 nodes of the cluster.
> 
> We hit the issue in scenario 1 and suspected that a timeout between nodes
> could leave us reading stale data, so we moved to scenario 2.
> 
> Scenario 2: disable round-robin and route every request to node 1 only. The cluster is still 5 nodes.
> In this case every update and fetch goes to and from node 1,
> and the issue still occurs.
> 
> At the time of the failure I hoped to find an error in the Riak node logs that would give me some information,
> but the logs show nothing; everything looks ok.
> 
> * What operation are you performing? 
> 
> I used :
> 
> riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key, riakc_map:to_op(Map), []).
> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []). 

What operation are you performing? What is the update you perform? Do you set a register value, add a register, remove a register?
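If it helps, here is roughly what a context-preserving register update looks like with the Erlang client. This is a sketch only, untested here: it assumes a connected `Pid` and uses placeholder `BucketType`/`Bucket`/`Key` variables, and it updates a `status_id` register purely as an illustration.

```erlang
%% Sketch: fetch the map (the fetched value carries the causal
%% context), mutate a register locally, then write back with
%% riakc_map:to_op/1 so that context is included in the update.
{ok, Fetched} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []),
Updated = riakc_map:update({<<"status_id">>, register},
                           fun(R) -> riakc_register:set(<<"show">>, R) end,
                           Fetched),
ok = riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
                                 riakc_map:to_op(Updated), []),

%% Alternatively, modify_type/5 performs the fetch/modify/update
%% round trip for you in one call:
ok = riakc_pb_socket:modify_type(Pid,
        fun(M) ->
            riakc_map:update({<<"status_id">>, register},
                             fun(R) -> riakc_register:set(<<"show">>, R) end,
                             M)
        end,
        {BucketType, Bucket}, Key, [create]).
```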
> 
> * It looks like the map is a single level map of last-write-wins registers. Is there a chance that the time on the node handling the update is behind the value in the lww-register? 
> 
> => I am not sure about the internal conflict-resolution logic between Riak nodes. The issue never happens when I use a single node. 
> My bucket properties as follow :
> 
> {"props":{"name":"menu","active":true,"allow_mult":true,"backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"claimant":"riak-node1 at 64.137.190.244","datatype":"map","dvv_enabled":true,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","search_index":"menu_idx","small_vclock":50,"w":"quorum","young_vclock":20}}
> 
> Note :   
> + "datatype":"map" 
> + "last_write_wins": false
> + "dvv_enabled": true
> + "allow_mult": true 
> 
> 
> * Have you tried using the `modify_type` operation in riakc_pb_socket which does the fetch/update operation in sequence for you?
> 
> => I have not used it yet, but my flow is already that sequence: fetch and then update. I will try modify_type and see. 
> 
> * Anything in the error logs on any of the nodes?
> 
> => From the node logs, no error is reported at the time of the failure. 
> 
> * Is the opaque context identical from the fetch and then later after the update? 
> 
> => The context is taken from the fetch, and that context is used with the update.  
> And during our debugging, across the sequence fetch, update, fetch, update, ..., the context returned by
> each fetch was always the same. 
> 
> Best regards,
> Hue Tran
>  
> 
> 
> On Tue, Feb 7, 2017 at 2:11 AM, John Daily <jdaily at basho.com> wrote:
> Originally I suspected the context which allows Riak to resolve conflicts was not present in your data, but I see it in your map structure. Thanks for supplying such a detailed description.
> 
> How fast is your turnaround time between an update and a fetch? Even if the cluster is healthy it’s not impossible to see a timeout between nodes, which could result in a stale retrieval. Have you verified whether the stale data persists?
> 
> A single node cluster gives an advantage that you’ll never see in a real cluster: a perfectly synchronized clock. It also reduces (but does not completely eliminate) the possibility of an internal timeout between processes.
> 
> -John
> 
>> On Feb 6, 2017, at 1:02 PM, my hue <tranmyhue.grackle at gmail.com> wrote:
>> 
>> Dear Riak Team,
>> 
>> My team and I use Riak as the database for our production system, with a cluster of 5 nodes. 
>> In production we hit a critical bug: updating a document sometimes fails. 
>> My colleagues and I debugged it and reproduced the issue with the following scenario: 
>> 
>> +  fetch document  
>> +  change value of document 
>> +  update document
>> 
>> Repeat this about 10 times and it will fail. When a document is updated continually, 
>> the update sometimes fails.
>> 
>> Initially the 5 nodes of the cluster ran Riak version 2.1.1.  
>> After hitting the bug above we upgraded to Riak version 2.2.0, and the issue still occurs.
>> 
>> After many test runs, debugging with tcpdump on a Riak node:
>> 
>> tcpdump -A -ttt  -i {interface} src host {host} and dst port {port} 
>> 
>> And together with the command: 
>> 
>> riak-admin status | grep "node_puts_map\|node_puts_map_total\|node_puts_total\|vnode_map_update_total\|vnode_puts_total"
>> 
>> we confirmed that the Riak server does receive the update request. 
>> However, we do not know why the Riak backend fails to update the document.  
>> At the time of the failure, the Riak server log shows everything is ok. 
>> 
>> We then removed the cluster and used a single Riak server, and the bug above never happens there.
>>  
>> For that reason, we think it only happens when running as a cluster. We researched the Basho Riak documentation, and our Riak configuration 
>> seems to follow its suggestions. We are totally blocked on this issue and hope we can get support from you, 
>> so that we can get stable behavior from Riak for our production system. 
>> Thank you so much. We hope to get your reply soon.
>> 
>> 
>> * The following is our riak node information : 
>> 
>> Riak version:  2.2.0
>> OS :  CentOS Linux release 7.2.1511
>> Cpu :  4 core
>> Memory : 4G  
>> Riak configure : the attached file "riak.conf"
>> 
>> Note : 
>> 
>> - We mostly use the default Riak configuration, except that the storage backend is multi:  
>> 
>> storage_backend = multi
>> multi_backend.bitcask_mult.storage_backend = bitcask
>> multi_backend.bitcask_mult.bitcask.data_root = /var/lib/riak/bitcask_mult
>> multi_backend.default = bitcask_mult
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> - Bucket type created with the following command:
>> 
>> riak-admin bucket-type create dev_restor '{"props":{"backend":"bitcask_mult","datatype":"map"}}'
>> riak-admin bucket-type activate dev_restor
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> - Bucket Type Status :
>> 
>> >> riak-admin bucket-type status dev_restor
>> 
>> dev_restor is active
>> young_vclock: 20
>> w: quorum
>> small_vclock: 50
>> rw: quorum
>> r: quorum
>> pw: 0
>> precommit: []
>> pr: 0
>> postcommit: []
>> old_vclock: 86400
>> notfound_ok: true
>> n_val: 3
>> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
>> last_write_wins: false
>> dw: quorum
>> dvv_enabled: true
>> chash_keyfun: {riak_core_util,chash_std_keyfun}
>> big_vclock: 50
>> basic_quorum: false
>> backend: <<"bitcask_mult">>
>> allow_mult: true
>> datatype: map
>> active: true
>> claimant: 'riak-node1 at 64.137.190.244'
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> - Bucket Property :
>> 
>> {"props":{"name":"menu","active":true,"allow_mult":true,"backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"claimant":"riak-node1 at 64.137.190.244","datatype":"map","dvv_enabled":true,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","search_index":"menu_idx","small_vclock":50,"w":"quorum","young_vclock":20}}
>> 
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> - Member status :
>> 
>> >> riak-admin member-status
>> 
>> ================================= Membership ==================================
>> Status     Ring    Pending    Node
>> -------------------------------------------------------------------------------
>> valid      18.8%      --      'riak-node1 at 64.137.190.244'
>> valid      18.8%      --      'riak-node2 at 64.137.247.82'
>> valid      18.8%      --      'riak-node3 at 64.137.162.64'
>> valid      25.0%      --      'riak-node4 at 64.137.161.229'
>> valid      18.8%      --      'riak-node5 at 64.137.217.73'
>> -------------------------------------------------------------------------------
>> Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>> 
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> - Ring 
>> 
>> >> riak-admin status | grep ring
>> 
>> ring_creation_size : 64
>> ring_members : ['riak-node1 at 64.137.190.244','riak-node2 at 64.137.247.82', 'riak-node3 at 64.137.162.64','riak-node4 at 64.137.161.229', 'riak-node5 at 64.137.217.73']
>> ring_num_partitions : 64
>> ring_ownership : <<"[{'riak-node2 at 64.137.247.82',12},\n {'riak-node5 at 64.137.217.73',12},\n {'riak-node1 at 64.137.190.244',12},\n {'riak-node3 at 64.137.162.64',12},\n {'riak-node4 at 64.137.161.229',16}]">>
>> rings_reconciled : 0
>> rings_reconciled_total : 31
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> * The riak client :
>> 
>> + riak-erlang-client:  https://github.com/basho/riak-erlang-client 
>> + release :   2.4.2 
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> * Riak client API used:  
>> 
>> + Insert/Update: 
>> 
>> riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key, riakc_map:to_op(Map), []).
>> 
>> + Fetch :
>> 
>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []). 
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> * Step to perform an  update :
>> 
>> - Fetch document 
>> - Update document 
>> 
>> -----------------------------------------------------------------------------------------------------------------------------
>> 
>> *  Data got from fetch_type: 
>> 
>> {map,  [{{<<"account_id">>,register}, <<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>> {{<<"created_by_id">>,register}, <<"accounta25a424b8484181e8ba1bec25bf7c491">>}, {{<<"created_time_dt">>,register},<<"2017-01-27T03:34:04Z">>}, {{<<"currency">>,register},<<"cad">>}, {{<<"end_time">>,register},<<"dont_use">>}, {{<<"id">>,register},<<"menufe89488afa948875cab6b0b18d579f21">>}, {{<<"maintain_mode_b">>,register},<<"false">>}, {{<<"menu_category_revision_id">>,register}, <<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}, {{<<"name">>,register},<<"fullmenu">>}, {{<<"order_i">>,register},<<"0">>}, {{<<"rest_location_p">>,register}, <<"10.844117421366443,106.63982392275398">>}, {{<<"restaurant_id">>,register}, <<"rest848e042b3a0488640981c8a6dc4a8281">>}, {{<<"restaurant_status_id">>,register},<<"inactive">>}, {{<<"start_time">>,register},<<"dont_use">>}, {{<<"status_id">>,register},<<"hide">>}, {{<<"updated_by_id">>,register}, <<"accounta25a424b8484181e8ba1bec25bf7c491">>}, {{<<"updated_time_dt">>,register},<<"2017-02-06T17:22:39Z">>}],
>>  [],
>>  [], <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,40,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,54,106>>}
>> 
>> 
>> *  Update with update_type
>> 
>> Below is the Map data before calling riakc_map:to_op(Map): 
>> 
>> {map, [] ,  
>>  [{{<<"account_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_time_dt">>,register},{register,<<>>,<<"2017-01-27T03:34:04Z">>}},{{<<"currency">>,register},{register,<<>>,<<"cad">>}},{{<<"end_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"id">>,register},{register,<<>>,<<"menufe89488afa948875cab6b0b18d579f21">>}},{{<<"maintain_mode_b">>,register},{register,<<>>,<<"false">>}},{{<<"menu_category_revision_id">>,register},{register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},{{<<"name">>,register},{register,<<>>,<<"fullmenu">>}},{{<<"order_i">>,register},{register,<<>>,<<"0">>}},{{<<"rest_location_p">>,register},{register,<<>>,<<"10.844117421366443,106.63982392275398">>}},{{<<"restaurant_id">>,register},{register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},{{<<"restaurant_status_id">>,register},{register,<<>>,<<"inactive">>}},{{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"status_id">>,register},{register,<<>>,<<"show">>}},{{<<"updated_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"updated_time_dt">>,register},{register,<<>>,<<"2017-02-06T17:22:39Z">>}}], 
>>  [] ,  <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,39,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,53,106>>
>> }
>> 
>> 
>> 
>> 
>> 
>> Best regards,
>> Hue Tran
>> <riak.conf>_______________________________________________
>> riak-users mailing list
>> riak-users at lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 




