[Basho Riak] Fail To Update Document Repeatedly With Cluster of 5 Nodes

John Daily jdaily at basho.com
Fri Feb 10 03:18:47 EST 2017


The questions about your IP addresses are good ones: you’re likely to run into more trouble when a Riak cluster is spread across multiple networks, and from a security standpoint I would recommend against exposing Riak KV to an untrusted network, even if its security features are enabled.

Would you please try setting pw=3 and pr=3 on your writes and reads? I would not be surprised to find that the objects which consistently succeed will report success, while the objects that consistently have trouble will report an error.

Decreasing those values to 2 would also be interesting: if a CRDT update succeeds with pw=2 and a fetch succeeds with pr=2 and the data is still stale, that would be of particular concern.
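
With the Erlang client, those can be passed as per-request options; a minimal sketch, assuming the same Pid, BucketType, Bucket, and Key as in your tests:

{ok, Map} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key,
                                       [{pr, 3}]),
ok = riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
                                 riakc_map:to_op(Map), [{pw, 3}]).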

And it should not matter for this use case, but please check to see whether all your server clocks are synchronized.
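
If the nodes run ntpd, a quick check on each host is something like:

ntpq -p    # peer offsets, in milliseconds
date -u    # fast eyeball comparison across nodes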

-John

> On Feb 9, 2017, at 3:21 PM, DeadZen <deadzen at deadzen.com> wrote:
> 
> Why are they public?
> 
> On Thu, Feb 9, 2017 at 3:11 PM, Alexander Sicular <siculars at gmail.com> wrote:
>> Speaking of timings:
>> 
>> ring_members : ['riak-node1 at 64.137.190.244','riak-node2 at 64.137.247.82',
>> 'riak-node3 at 64.137.162.64','riak-node4 at 64.137.161.229',
>> 'riak-node5 at 64.137.217.73']
>> 
>> Are these nodes in the same local area network?
>> 
>> On Thu, Feb 9, 2017 at 12:49 PM, my hue <tranmyhue.grackle at gmail.com> wrote:
>>> Dear Russell,
>>> 
>>> I tried the simplest possible thing: a new document, using modify_type to
>>> update a single register.
>>> I still see failed updates some of the time.
>>> 
>>> My steps were as follows:
>>> 
>>> Step 1: Initialize a new document map.
>>> Step 2: Create the new map with: riakc_pb_socket:update_type(Pid,
>>> {BucketType, Bucket}, Key, riakc_map:to_op(Map), []).
>>> Step 3: Fetch to check the result:
>>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key).
>>> Step 4: Create funs as input to modify_type, each updating only one field
>>> of the map:
>>> 
>>> Fun1 = fun(OldMap) ->
>>>            riakc_map:update({<<"status_id">>, register},
>>>                             fun(R) -> riakc_register:set(<<"show">>, R) end,
>>>                             OldMap)
>>>        end.
>>> 
>>> Fun2 = fun(OldMap) ->
>>>            riakc_map:update({<<"status_id">>, register},
>>>                             fun(R) -> riakc_register:set(<<"hide">>, R) end,
>>>                             OldMap)
>>>        end.
>>> 
>>> Step 5: Update:
>>> 
>>> riakc_pb_socket:modify_type(Pid, Fun1, {BucketType, Bucket}, Key, []).
>>> 
>>> Fetch to check:
>>> 
>>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key).
>>> 
>>> Step 6: Update:
>>> 
>>> riakc_pb_socket:modify_type(Pid, Fun2, {BucketType, Bucket}, Key, []).
>>> 
>>> Fetch to check:
>>> 
>>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key).
>>> 
>>> 
>>> For my debugging and testing, I repeated steps 5 and 6 on one document
>>> about 20 times, as in the sketch below.
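>>> 
>>> (A sketch of that loop, roughly — with Pid, BucketType, Bucket, Key,
>>> Fun1, and Fun2 bound as above:)
>>> 
>>> lists:foreach(fun(N) ->
>>>     Fun = case N rem 2 of
>>>               1 -> Fun1;  %% sets status_id to <<"show">>
>>>               0 -> Fun2   %% sets status_id to <<"hide">>
>>>           end,
>>>     ok = riakc_pb_socket:modify_type(Pid, Fun, {BucketType, Bucket}, Key, []),
>>>     {ok, M} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key),
>>>     Status = proplists:get_value({<<"status_id">>, register},
>>>                                  riakc_map:value(M)),
>>>     io:format("iteration ~p: status_id = ~p~n", [N, Status])
>>> end, lists:seq(1, 20)).
>>> 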
>>> Across many documents I see weird behaviour: some documents repeatedly
>>> fail to update, while other documents never fail to update.
>>> At first I thought the cause was the network, or a timeout between nodes,
>>> and that the failures were random. So I deleted the documents with:
>>> 
>>> riakc_pb_socket:delete(Pid, {BucketType,Bucket}, Key, []).
>>> 
>>> Then I retested each document from the first test. Amazingly, the
>>> documents that failed in the first test still failed in the second test,
>>> and the documents that passed in the first test still passed. I deleted
>>> everything again, retested, and of course got the same result.
>>> 
>>> After that I made another test case: I took one failing document and kept
>>> all of its fields, changing only the key, to get different documents for
>>> debugging. Very surprisingly, I still got some failures and some
>>> successes, although the documents have the same fields and values and
>>> differ only in the key. Deleting and retesting gives the same result:
>>> documents that succeeded always succeed, and documents that failed always
>>> fail. I still do not understand the root cause, and I hope I can get
>>> support from the Riak developers. I can say that my system mostly fails
>>> this way when running as a cluster.
>>> 
>>> Below are some of the map documents I used in the test. I have also
>>> attached the extracted log from each node at one of the failure times.
>>> I cannot really interpret the Riak logs, but I hope they help the Riak
>>> developers spot something.
>>> 
>>> 
>>> 
>>> * New Document which fails with my steps:
>>> 
>>> {map,[],
>>>     [{{<<"account_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}},
>>>      {{<<"currency">>,register}, {register,<<>>,<<"usd">>}},
>>> 
>>> {{<<"id">>,register},{register,<<>>,<<"menu1234567812345678123456789">>}},
>>>      {{<<"maintain_mode_b">>,register}, {register,<<>>,<<"false">>}},
>>>      {{<<"menu_category_revision_id">>,register},
>>> {register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},
>>>      {{<<"name">>,register},{register,<<>>,<<"menutest">>}},
>>>      {{<<"order_id">>,register},{register,<<>>,<<"0">>}},
>>>      {{<<"rest_location_p">>,register},
>>> {register,<<>>,<<"10.844117421366443,106.63982392275398">>}},
>>>      {{<<"restaurant_id">>,register},
>>> {register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},
>>>      {{<<"restaurant_status_id">>,register}, {register,<<>>,<<"active">>}},
>>>      {{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},
>>>      {{<<"status_id">>,register},{register,<<>>,<<"show">>}},
>>>      {{<<"updated_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"updated_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}}],
>>>     [],undefined}.
>>> 
>>> Key = <<"menu1234567812345678123456789">>
>>> 
>>> * New Document which always succeeds with my steps:
>>> 
>>> {map,[],
>>>     [{{<<"account_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}},
>>>      {{<<"currency">>,register},{register,<<>>,<<"usd">>}},
>>> 
>>> {{<<"id">>,register},{register,<<>>,<<"menub497380c19be4fd3a3b51c85d4e9f246">>}},
>>>      {{<<"maintain_mode_b">>,register}, {register,<<>>,<<"false">>}},
>>>      {{<<"menu_category_revision_id">>,register},
>>> {register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},
>>>      {{<<"name">>,register},{register,<<>>,<<"menutest">>}},
>>>      {{<<"order_id">>,register},{register,<<>>,<<"0">>}},
>>>      {{<<"rest_location_p">>,register},
>>> {register,<<>>,<<"10.844117421366443,106.63982392275398">>}},
>>>      {{<<"restaurant_id">>,register},
>>> {register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},
>>>      {{<<"restaurant_status_id">>,register}, {register,<<>>,<<"active">>}},
>>>      {{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},
>>>      {{<<"status_id">>,register},{register,<<>>,<<"show">>}},
>>>      {{<<"updated_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"updated_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}}],
>>>     [], undefined}.
>>> 
>>> Key = <<"menub497380c19be4fd3a3b51c85d4e9f246">>
>>> 
>>> * New Document which fails with my steps:
>>> 
>>> {map,[],
>>>     [{{<<"account_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"created_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}},
>>>      {{<<"currency">>,register},{register,<<>>,<<"usd">>}},
>>> 
>>> {{<<"id">>,register},{register,<<>>,<<"menufe89488afa948875cab6b0b18d579f22">>}},
>>>      {{<<"maintain_mode_b">>,register},  {register,<<>>,<<"false">>}},
>>>      {{<<"menu_category_revision_id">>,register},
>>> {register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},
>>>      {{<<"name">>,register},{register,<<>>,<<"menutest">>}},
>>>      {{<<"order_id">>,register},{register,<<>>,<<"0">>}},
>>>      {{<<"rest_location_p">>,register},
>>> {register,<<>>,<<"10.844117421366443,106.63982392275398">>}},
>>>      {{<<"restaurant_id">>,register},
>>> {register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},
>>>      {{<<"restaurant_status_id">>,register}, {register,<<>>,<<"active">>}},
>>>      {{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},
>>>      {{<<"status_id">>,register},{register,<<>>,<<"show">>}},
>>>      {{<<"updated_by_id">>,register},
>>> {register,<<>>,<<"accountqweraccountqweraccountqwer">>}},
>>>      {{<<"updated_time_dt">>,register},
>>> {register,<<>>,<<"2017-02-7T23:49:04Z">>}}],
>>>     [],undefined}.
>>> 
>>> Key = <<"menufe89488afa948875cab6b0b18d579f22">>.
>>> 
>>> Note: all documents are mostly identical apart from the key, and all were
>>> tested with the same bucket type and bucket. The bucket type and bucket
>>> have the properties I reported in my first email. As a reminder, below is
>>> a description of the bucket type, bucket, and cluster:
>>> 
>>> * Bucket Type :
>>> 
>>> - Bucket type created with the following command:
>>> 
>>> riak-admin bucket-type create bucket_type_name
>>> '{"props":{"backend":"bitcask_mult","datatype":"map"}}'
>>> 
>>> riak-admin bucket-type activate bucket_type_name
>>> 
>>> 
>>> * Bucket Property:
>>> 
>>> {"props":{"name":"bucket_name","active":true,"allow_mult":true,"backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"claimant":"riak-node1 at 64.137.190.244","datatype":"map","dvv_enabled":true,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"bucket_name","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","search_index":"menu_idx","small_vclock":50,"w":"quorum","young_vclock":20}}
>>> 
>>> Note :
>>> + "datatype":"map"
>>> + "last_write_wins": false
>>> + "dvv_enabled": true
>>> + "allow_mult": true
>>> 
>>> 
>>> * Cluster Info:
>>> 
>>> - Member status :
>>> 
>>>>> riak-admin member-status
>>> 
>>> ================================= Membership
>>> ==================================
>>> Status     Ring    Pending    Node
>>> -------------------------------------------------------------------------------
>>> valid      18.8%      --      'riak-node1 at 64.137.190.244'
>>> valid      18.8%      --      'riak-node2 at 64.137.247.82'
>>> valid      18.8%      --      'riak-node3 at 64.137.162.64'
>>> valid      25.0%      --      'riak-node4 at 64.137.161.229'
>>> valid      18.8%      --      'riak-node5 at 64.137.217.73'
>>> -------------------------------------------------------------------------------
>>> Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>>> 
>>> -----------------------------------------------------------------------------------------------------------------------------
>>> 
>>> - Ring
>>> 
>>>>> riak-admin status | grep ring
>>> 
>>> ring_creation_size : 64
>>> ring_members : ['riak-node1 at 64.137.190.244','riak-node2 at 64.137.247.82',
>>> 'riak-node3 at 64.137.162.64','riak-node4 at 64.137.161.229',
>>> 'riak-node5 at 64.137.217.73']
>>> ring_num_partitions : 64
>>> ring_ownership : <<"[{'riak-node2 at 64.137.247.82',12},\n
>>> {'riak-node5 at 64.137.217.73',12},\n {'riak-node1 at 64.137.190.244',12},\n
>>> {'riak-node3 at 64.137.162.64',12},\n {'riak-node4 at 64.137.161.229',16}]">>
>>> rings_reconciled : 0
>>> rings_reconciled_total : 31
>>> 
>>> 
>>> On Tue, Feb 7, 2017 at 5:37 PM, Russell Brown <russell.brown at mac.com> wrote:
>>>> 
>>>> 
>>>> On 7 Feb 2017, at 10:27, my hue <tranmyhue.grackle at gmail.com> wrote:
>>>> 
>>>>> Dear Russell,
>>>>> 
>>>>> Yes, I updated all the registers in one go.
>>>>> I have not yet tried updating a single register at a time; let me try
>>>>> and see. But I wonder: does updating everything in one go have any
>>>>> effect on conflict resolution in the Riak cluster?
>>>>> 
>>>> 
>>>> Just trying to make the search space as small as possible. I don’t think
>>>> _any_ of this should fail. The maps code is very well tested and well used,
>>>> so it’s all kind of odd.
>>>> 
>>>> Without hands on it’s hard to debug, and email back and forth is slow, so
>>>> if you try the simplest possible thing and that still fails, it helps.
>>>> 
>>>> IMO the simplest possible thing is to start with a new, empty key and use
>>>> modify_type to update a single register.
>>>> 
>>>> Many thanks
>>>> 
>>>> Russell
>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Feb 7, 2017 at 5:18 PM, Russell Brown <russell.brown at mac.com>
>>>>> wrote:
>>>>> So you’re updating all those registers in one go? Out of interest,
>>>>> what happens if you update a single register at a time?
>>>>> 
>>>>> On 7 Feb 2017, at 10:02, my hue <tranmyhue.grackle at gmail.com> wrote:
>>>>> 
>>>>>> Dear Russell,
>>>>>> 
>>>>>>> Can you run riakc_map:to_op(Map). and show me the output of that,
>>>>>>> please?
>>>>>> 
>>>>>> The following is output of riakc_map:to_op(Map) :
>>>>>> 
>>>>>> {map, {update, [{update,
>>>>>> {<<"updated_time_dt">>,register},{assign,<<"2017-02-06T17:22:39Z">>}},
>>>>>> {update,{<<"updated_by_id">>,register},
>>>>>> {assign,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{update,{<<"status_id">>,register},{assign,<<"show">>}},{update,{<<"start_time">>,register},{assign,<<"dont_use">>}},{update,{<<"restaurant_status_id">>,register},
>>>>>> {assign,<<"inactive">>}}, {update,{<<"restaurant_id">>,register},
>>>>>> {assign,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},{update,{<<"rest_location_p">>,register},
>>>>>> {assign,<<"10.844117421366443,106.63982392275398">>}},
>>>>>> {update,{<<"order_i">>,register},{assign,<<"0">>}},
>>>>>> {update,{<<"name">>,register},{assign,<<"fullmenu">>}},
>>>>>> {update,{<<"menu_category_revision_id">>,register},
>>>>>> {assign,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},
>>>>>> {update,{<<"maintain_mode_b">>,register},{assign,<<"false">>}},
>>>>>> {update,{<<"id">>,register},
>>>>>> {assign,<<"menufe89488afa948875cab6b0b18d579f21">>}},
>>>>>> {update,{<<"end_time">>,register},{assign,<<"dont_use">>}},{update,{<<"currency">>,register},{assign,<<"cad">>}},
>>>>>> {update,{<<"created_time_dt">>,register},
>>>>>> {assign,<<"2017-01-27T03:34:04Z">>}},
>>>>>> {update,{<<"created_by_id">>,register},
>>>>>> {assign,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},
>>>>>> {update,{<<"account_id">>,register},
>>>>>> {assign,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}}]},
>>>>>> <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,39,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,53,106>>}
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Feb 7, 2017 at 4:36 PM, Russell Brown <russell.brown at mac.com>
>>>>>> wrote:
>>>>>> 
>>>>>> On 7 Feb 2017, at 09:34, my hue <tranmyhue.grackle at gmail.com> wrote:
>>>>>> 
>>>>>>> Dear Russell,
>>>>>>> 
>>>>>>>> What operation are you performing? What is the update you perform?
>>>>>>>> Do you set a register value, add a register, remove a register?
>>>>>>> 
>>>>>>> I used riakc_map:update to update values in the map. I follow these
>>>>>>> steps:
>>>>>>> 
>>>>>>> - Get the FetchData map with fetch_type
>>>>>>> - Extract the key, value, and context from FetchData
>>>>>>> - Obtain UpdateData by:
>>>>>>> 
>>>>>>> + Initializing a map with the context
>>>>>> 
>>>>>> I don’t understand this step
>>>>>> 
>>>>>>> + Using:
>>>>>>> 
>>>>>>>   riakc_map:update({K, register}, fun(R) ->
>>>>>>>       riakc_register:set(V, R) end, InitMap)
>>>>>>> 
>>>>>>> to obtain UpdateData
>>>>>>> 
>>>>>>> Note:
>>>>>>> K: the field key
>>>>>>> V: the new value
>>>>>>> 
>>>>>>> - Then update UpdateData with update_type, as in the sketch below.
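>>>>>>> 
>>>>>>> (Roughly, the whole sequence is the following sketch, error handling
>>>>>>> omitted; the map returned by fetch_type already carries the causal
>>>>>>> context, so updating it in place keeps that context attached:)
>>>>>>> 
>>>>>>> {ok, Fetched} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key),
>>>>>>> Updated = riakc_map:update({K, register},
>>>>>>>                            fun(R) -> riakc_register:set(V, R) end,
>>>>>>>                            Fetched),
>>>>>>> ok = riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
>>>>>>>                                  riakc_map:to_op(Updated), []).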
>>>>>>> 
>>>>>> 
>>>>>> Can you run riakc_map:to_op(Map). and show me the output of that,
>>>>>> please?
>>>>>> 
>>>>>>> The following is sample about Update data :
>>>>>>> 
>>>>>>> {map, [] ,
>>>>>>> 
>>>>>>> [{{<<"account_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_time_dt">>,register},{register,<<>>,<<"2017-01-27T03:34:04Z">>}},{{<<"currency">>,register},{register,<<>>,<<"cad">>}},{{<<"end_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"id">>,register},{register,<<>>,<<"menufe89488afa948875cab6b0b18d579f21">>}},{{<<"maintain_mode_b">>,register},{register,<<>>,<<"false">>}},{{<<"menu_category_revision_id">>,register},{register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},{{<<"name">>,register},{register,<<>>,<<"fullmenu">>}},{{<<"order_i">>,register},{register,<<>>,<<"0">>}},{{<<"rest_location_p">>,register},{register,<<>>,<<"10.844117421366443,106.63982392275398">>}},{{<<"restaurant_id">>,register},{register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},{{<<"restaurant_status_id">>,register},{register,<<>>,<<"inactive">>}},{{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"status_id">>,register},{register,<<>>,<<"show">>}},{{<<"updated_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"updated_time_dt">>,register},{register,<<>>,<<"2017-02-06T17:22:39Z">>}}],
>>>>>>> [] ,
>>>>>>> <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,39,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,53,106>>
>>>>>>> }
>>>>>>> 
>>>>>>> 
>>>>>>> On Tue, Feb 7, 2017 at 3:43 PM, Russell Brown
>>>>>>> <russell.brown at mac.com> wrote:
>>>>>>> 
>>>>>>> On 7 Feb 2017, at 08:17, my hue <tranmyhue.grackle at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Dear John and Russell Brown,
>>>>>>>> 
>>>>>>>> * How fast is your turnaround time between an update and a fetch?
>>>>>>>> 
>>>>>>>> The turnaround time between an update and a fetch is about 1 second.
>>>>>>>> While my team and I were debugging, we adjusted haproxy through the
>>>>>>>> following scenarios:
>>>>>>>> 
>>>>>>>> Scenario 1: round robin across the 5 nodes of the cluster.
>>>>>>>> 
>>>>>>>> We hit the issue in scenario 1, and we were afraid that timeouts
>>>>>>>> between nodes might be why we still got stale data. So we moved on
>>>>>>>> to scenario 2.
>>>>>>>> 
>>>>>>>> Scenario 2: disable round robin and route every request to node 1
>>>>>>>> only. The cluster is still 5 nodes.
>>>>>>>> In this case we ensure that update and fetch requests always go to
>>>>>>>> and from node 1. The issue still occurs.
>>>>>>>> 
>>>>>>>> At the time of failure, I hoped to get some error log from the Riak
>>>>>>>> nodes to give me information, but the Riak logs show nothing;
>>>>>>>> everything looks OK.
>>>>>>>> 
>>>>>>>> * What operation are you performing?
>>>>>>>> 
>>>>>>>> I used:
>>>>>>>> 
>>>>>>>> riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
>>>>>>>> riakc_map:to_op(Map), []).
>>>>>>>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []).
>>>>>>> 
>>>>>>> What operation are you performing? What is the update you perform?
>>>>>>> Do you set a register value, add a register, remove a register?
>>>>>>>> 
>>>>>>>> * It looks like the map is a single level map of last-write-wins
>>>>>>>> registers. Is there a chance that the time on the node handling the update
>>>>>>>> is behind the value in the lww-register?
>>>>>>>> 
>>>>>>>> => I am not sure about the internal conflict-resolution logic of the
>>>>>>>> Riak nodes. The issue never happens if I use a single node.
>>>>>>>> My bucket properties as follow :
>>>>>>>> 
>>>>>>>> 
>>>>>>>> {"props":{"name":"menu","active":true,"allow_mult":true,"backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"claimant":"riak-node1 at 64.137.190.244","datatype":"map","dvv_enabled":true,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","search_index":"menu_idx","small_vclock":50,"w":"quorum","young_vclock":20}}
>>>>>>>> 
>>>>>>>> Note :
>>>>>>>> + "datatype":"map"
>>>>>>>> + "last_write_wins": false
>>>>>>>> + "dvv_enabled": true
>>>>>>>> + "allow_mult": true
>>>>>>>> 
>>>>>>>> 
>>>>>>>> * Have you tried using the `modify_type` operation in
>>>>>>>> riakc_pb_socket which does the fetch/update operation in sequence for you?
>>>>>>>> 
>>>>>>>> => I have not used it yet, but my sequence is a fetch followed by an
>>>>>>>> update. I might try modify_type to see.
>>>>>>>> 
>>>>>>>> * Anything in the error logs on any of the nodes?
>>>>>>>> 
>>>>>>>> => From the node logs, there is no error report at the time of failure.
>>>>>>>> 
>>>>>>>> * Is the opaque context identical from the fetch and then later
>>>>>>>> after the update?
>>>>>>>> 
>>>>>>>> => The context is obtained from the fetch, and that context is used
>>>>>>>> in the update.
>>>>>>>> During our debugging with a sequence of fetch, update, fetch,
>>>>>>>> update, ... the context I saw in the fetched data was always the
>>>>>>>> same.
>>>>>>>> 
>>>>>>>> Best regards,
>>>>>>>> Hue Tran
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Tue, Feb 7, 2017 at 2:11 AM, John Daily <jdaily at basho.com>
>>>>>>>> wrote:
>>>>>>>> Originally I suspected the context which allows Riak to resolve
>>>>>>>> conflicts was not present in your data, but I see it in your map structure.
>>>>>>>> Thanks for supplying such a detailed description.
>>>>>>>> 
>>>>>>>> How fast is your turnaround time between an update and a fetch?
>>>>>>>> Even if the cluster is healthy it’s not impossible to see a timeout between
>>>>>>>> nodes, which could result in a stale retrieval. Have you verified whether
>>>>>>>> the stale data persists?
>>>>>>>> 
>>>>>>>> A single node cluster gives an advantage that you’ll never see in
>>>>>>>> a real cluster: a perfectly synchronized clock. It also reduces (but does
>>>>>>>> not completely eliminate) the possibility of an internal timeout between
>>>>>>>> processes.
>>>>>>>> 
>>>>>>>> -John
>>>>>>>> 
>>>>>>>>> On Feb 6, 2017, at 1:02 PM, my hue <tranmyhue.grackle at gmail.com>
>>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> Dear Riak Team,
>>>>>>>>> 
>>>>>>>>> My team and I use Riak as the database for our production system,
>>>>>>>>> with a cluster of 5 nodes.
>>>>>>>>> While running in production, we hit a critical bug: updating a
>>>>>>>>> document sometimes fails.
>>>>>>>>> My colleagues and I debugged it and reproduced the issue with the
>>>>>>>>> following scenario:
>>>>>>>>> 
>>>>>>>>> +  fetch document
>>>>>>>>> +  change value of document
>>>>>>>>> +  update document
>>>>>>>>> 
>>>>>>>>> Repeat about 10 times and an update will fail. When a document is
>>>>>>>>> updated continually, updates sometimes fail.
>>>>>>>>> 
>>>>>>>>> Initially the 5 cluster nodes ran Riak version 2.1.1.
>>>>>>>>> After hitting the bug above, we upgraded to Riak version 2.2.0, and
>>>>>>>>> the issue still occurs.
>>>>>>>>> 
>>>>>>>>> Across many test runs, we debugged using tcpdump on the Riak nodes:
>>>>>>>>> 
>>>>>>>>> tcpdump -A -ttt -i {interface} src host {host} and dst port {port}
>>>>>>>>> 
>>>>>>>>> together with the command:
>>>>>>>>> 
>>>>>>>>> riak-admin status | grep "node_puts_map\|node_puts_map_total\|node_puts_total\|vnode_map_update_total\|vnode_puts_total"
>>>>>>>>> 
>>>>>>>>> we saw that the Riak server does receive the update request.
>>>>>>>>> However, we do not know why the Riak backend fails to update the
>>>>>>>>> document. At the time of the failure, the Riak server logs look fine.
>>>>>>>>> 
>>>>>>>>> We then dismantled the cluster and used a single Riak server, and
>>>>>>>>> saw that the bug above never happens.
>>>>>>>>> 
>>>>>>>>> For that reason, we think it only happens when running as a cluster.
>>>>>>>>> We researched the Basho Riak documentation, and our Riak
>>>>>>>>> configuration seems to follow its suggestions. We are totally
>>>>>>>>> blocked on this issue and hope we can get support from you, so that
>>>>>>>>> our production system can rely on a stable Riak database.
>>>>>>>>> Thank you so much. I hope to get your reply soon.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> * The following is our riak node information :
>>>>>>>>> 
>>>>>>>>> Riak version:  2.2.0
>>>>>>>>> OS :  CentOS Linux release 7.2.1511
>>>>>>>>> Cpu :  4 core
>>>>>>>>> Memory : 4G
>>>>>>>>> Riak configure : the attached file "riak.conf"
>>>>>>>>> 
>>>>>>>>> Note :
>>>>>>>>> 
>>>>>>>>> - We mostly use the default Riak configuration, except that the
>>>>>>>>> storage backend is multi:
>>>>>>>>> 
>>>>>>>>> storage_backend = multi
>>>>>>>>> multi_backend.bitcask_mult.storage_backend = bitcask
>>>>>>>>> multi_backend.bitcask_mult.bitcask.data_root = /var/lib/riak/bitcask_mult
>>>>>>>>> multi_backend.default = bitcask_mult
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> - Bucket type created with the following command:
>>>>>>>>> 
>>>>>>>>> riak-admin bucket-type create dev_restor
>>>>>>>>> '{"props":{"backend":"bitcask_mult","datatype":"map"}}'
>>>>>>>>> riak-admin bucket-type activate dev_restor
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> - Bucket Type Status :
>>>>>>>>> 
>>>>>>>>>>> riak-admin bucket-type status dev_restor
>>>>>>>>> 
>>>>>>>>> dev_restor is active
>>>>>>>>> young_vclock: 20
>>>>>>>>> w: quorum
>>>>>>>>> small_vclock: 50
>>>>>>>>> rw: quorum
>>>>>>>>> r: quorum
>>>>>>>>> pw: 0
>>>>>>>>> precommit: []
>>>>>>>>> pr: 0
>>>>>>>>> postcommit: []
>>>>>>>>> old_vclock: 86400
>>>>>>>>> notfound_ok: true
>>>>>>>>> n_val: 3
>>>>>>>>> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
>>>>>>>>> last_write_wins: false
>>>>>>>>> dw: quorum
>>>>>>>>> dvv_enabled: true
>>>>>>>>> chash_keyfun: {riak_core_util,chash_std_keyfun}
>>>>>>>>> big_vclock: 50
>>>>>>>>> basic_quorum: false
>>>>>>>>> backend: <<"bitcask_mult">>
>>>>>>>>> allow_mult: true
>>>>>>>>> datatype: map
>>>>>>>>> active: true
>>>>>>>>> claimant: 'riak-node1 at 64.137.190.244'
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> - Bucket Property :
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> {"props":{"name":"menu","active":true,"allow_mult":true,"backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"claimant":"riak-node1 at 64.137.190.244","datatype":"map","dvv_enabled":true,"dw":"quorum","last_write_wins":false,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum","search_index":"menu_idx","small_vclock":50,"w":"quorum","young_vclock":20}}
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> - Member status :
>>>>>>>>> 
>>>>>>>>>>> riak-admin member-status
>>>>>>>>> 
>>>>>>>>> ================================= Membership
>>>>>>>>> ==================================
>>>>>>>>> Status     Ring    Pending    Node
>>>>>>>>> 
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>> valid      18.8%      --      'riak-node1 at 64.137.190.244'
>>>>>>>>> valid      18.8%      --      'riak-node2 at 64.137.247.82'
>>>>>>>>> valid      18.8%      --      'riak-node3 at 64.137.162.64'
>>>>>>>>> valid      25.0%      --      'riak-node4 at 64.137.161.229'
>>>>>>>>> valid      18.8%      --      'riak-node5 at 64.137.217.73'
>>>>>>>>> 
>>>>>>>>> -------------------------------------------------------------------------------
>>>>>>>>> Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> - Ring
>>>>>>>>> 
>>>>>>>>>>> riak-admin status | grep ring
>>>>>>>>> 
>>>>>>>>> ring_creation_size : 64
>>>>>>>>> ring_members :
>>>>>>>>> ['riak-node1 at 64.137.190.244','riak-node2 at 64.137.247.82',
>>>>>>>>> 'riak-node3 at 64.137.162.64','riak-node4 at 64.137.161.229',
>>>>>>>>> 'riak-node5 at 64.137.217.73']
>>>>>>>>> ring_num_partitions : 64
>>>>>>>>> ring_ownership : <<"[{'riak-node2 at 64.137.247.82',12},\n
>>>>>>>>> {'riak-node5 at 64.137.217.73',12},\n {'riak-node1 at 64.137.190.244',12},\n
>>>>>>>>> {'riak-node3 at 64.137.162.64',12},\n {'riak-node4 at 64.137.161.229',16}]">>
>>>>>>>>> rings_reconciled : 0
>>>>>>>>> rings_reconciled_total : 31
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> * The riak client :
>>>>>>>>> 
>>>>>>>>> + riak-erlang-client:
>>>>>>>>> https://github.com/basho/riak-erlang-client
>>>>>>>>> + release :   2.4.2
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> * Riak client API used:
>>>>>>>>> 
>>>>>>>>> + Insert/Update:
>>>>>>>>> 
>>>>>>>>> riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
>>>>>>>>> riakc_map:to_op(Map), []).
>>>>>>>>> 
>>>>>>>>> + Fetch :
>>>>>>>>> 
>>>>>>>>> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> * Steps to perform an update:
>>>>>>>>> 
>>>>>>>>> - Fetch document
>>>>>>>>> - Update document
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -----------------------------------------------------------------------------------------------------------------------------
>>>>>>>>> 
>>>>>>>>> * Data obtained from fetch_type:
>>>>>>>>> 
>>>>>>>>> {map,  [{{<<"account_id">>,register},
>>>>>>>>> <<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>>>>>>>>> {{<<"created_by_id">>,register},
>>>>>>>>> <<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>>>>>>>>> {{<<"created_time_dt">>,register},<<"2017-01-27T03:34:04Z">>},
>>>>>>>>> {{<<"currency">>,register},<<"cad">>},
>>>>>>>>> {{<<"end_time">>,register},<<"dont_use">>},
>>>>>>>>> {{<<"id">>,register},<<"menufe89488afa948875cab6b0b18d579f21">>},
>>>>>>>>> {{<<"maintain_mode_b">>,register},<<"false">>},
>>>>>>>>> {{<<"menu_category_revision_id">>,register},
>>>>>>>>> <<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>},
>>>>>>>>> {{<<"name">>,register},<<"fullmenu">>}, {{<<"order_i">>,register},<<"0">>},
>>>>>>>>> {{<<"rest_location_p">>,register},
>>>>>>>>> <<"10.844117421366443,106.63982392275398">>},
>>>>>>>>> {{<<"restaurant_id">>,register},
>>>>>>>>> <<"rest848e042b3a0488640981c8a6dc4a8281">>},
>>>>>>>>> {{<<"restaurant_status_id">>,register},<<"inactive">>},
>>>>>>>>> {{<<"start_time">>,register},<<"dont_use">>},
>>>>>>>>> {{<<"status_id">>,register},<<"hide">>}, {{<<"updated_by_id">>,register},
>>>>>>>>> <<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>>>>>>>>> {{<<"updated_time_dt">>,register},<<"2017-02-06T17:22:39Z">>}],
>>>>>>>>> [],
>>>>>>>>> [],
>>>>>>>>> <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,40,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,54,106>>}
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> *  Update with update_type
>>>>>>>>> 
>>>>>>>>> Below is Map data before using riakc_map:to_op(Map) :
>>>>>>>>> 
>>>>>>>>> {map, [] ,
>>>>>>>>> 
>>>>>>>>> [{{<<"account_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"created_time_dt">>,register},{register,<<>>,<<"2017-01-27T03:34:04Z">>}},{{<<"currency">>,register},{register,<<>>,<<"cad">>}},{{<<"end_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"id">>,register},{register,<<>>,<<"menufe89488afa948875cab6b0b18d579f21">>}},{{<<"maintain_mode_b">>,register},{register,<<>>,<<"false">>}},{{<<"menu_category_revision_id">>,register},{register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},{{<<"name">>,register},{register,<<>>,<<"fullmenu">>}},{{<<"order_i">>,register},{register,<<>>,<<"0">>}},{{<<"rest_location_p">>,register},{register,<<>>,<<"10.844117421366443,106.63982392275398">>}},{{<<"restaurant_id">>,register},{register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},{{<<"restaurant_status_id">>,register},{register,<<>>,<<"inactive">>}},{{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},{{<<"status_id">>,register},{register,<<>>,<<"show">>}},{{<<"updated_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},{{<<"updated_time_dt">>,register},{register,<<>>,<<"2017-02-06T17:22:39Z">>}}],
>>>>>>>>> [] ,
>>>>>>>>> <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,103,5,182,208,0,0,118,2,97,39,104,2,109,0,0,0,12,137,252,139,186,176,202,25,96,0,0,195,164,97,53,106>>
>>>>>>>>> }
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> Hue Tran
>>>>>>>>> <riak.conf>
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
> 