[Basho Riak] Fail To Update Document Repeatedly With Cluster of 5 Nodes

my hue tranmyhue.grackle at gmail.com
Tue Feb 7 03:17:37 EST 2017


Dear John and Russell Brown,

* How fast is your turnaround time between an update and a fetch?

The turnaround time between an update and a fetch is about 1 second.
While my team and I were debugging, we adjusted HAProxy to test the
following scenarios:

Scenario 1: round-robin across the 5 nodes of the cluster.

We hit the issue in scenario 1, and we were afraid that a timeout between
nodes could be the reason we still got stale data. So we moved on to
scenario 2.

Scenario 2: disable round-robin and route every request to node 1 only.
The cluster is still 5 nodes.
This way we ensure that the update and fetch requests always go to and
come from node 1.
The issue still occurs.

When the failure happened, I hoped to get an error from the Riak node
logs to give me more information. But the logs showed nothing; as far as
Riak was concerned, everything was ok.

* What operation are you performing?

I used:

riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
                            riakc_map:to_op(Map), []).
riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []).

* It looks like the map is a single level map of last-write-wins registers. Is
there a chance that the time on the node handling the update is behind the
value in the lww-register?

=> I am not sure about the internal conflict-resolution logic between
Riak nodes. But the issue never happens when I use a single node.
My bucket properties are as follows:

{"props":{"name":"menu","active":true,"allow_mult":true,
 "backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,
 "chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},
 "claimant":"riak-node1 at 64.137.190.244","datatype":"map",
 "dvv_enabled":true,"dw":"quorum","last_write_wins":false,
 "linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},
 "n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,
 "postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum",
 "search_index":"menu_idx","small_vclock":50,"w":"quorum",
 "young_vclock":20}}

Note :
+ "datatype":"map"
+ "last_write_wins": false
+ "dvv_enabled": true
+ "allow_mult": true


* Have you tried using the `modify_type` operation in riakc_pb_socket which
does the fetch/update operation in sequence for you?

=> I have not used it yet, but my sequence is the same: fetch and then
update. I will try modify_type and see.
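
For reference, a minimal sketch of what that could look like (again
assuming a connected Pid; the status_id update is an illustrative
placeholder based on the data shown below):

%% Sketch: modify_type/5 fetches the map, applies the fun, and writes it
%% back with the fetched context in a single client call.
ok = riakc_pb_socket:modify_type(
       Pid,
       fun(Map) ->
           riakc_map:update({<<"status_id">>, register},
                            fun(R) -> riakc_register:set(<<"show">>, R) end,
                            Map)
       end,
       {BucketType, Bucket}, Key,
       [create]).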

* Anything in the error logs on any of the nodes?

=> No. From the node logs, there is no error report at the time of failure.

* Is the opaque context identical from the fetch and then later after the
update?

=> Yes, the context used with the update is the one obtained from the
fetch. And during our debugging, across the sequence fetch, update,
fetch, update, ..., the context I saw on each fetch was always the same.
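
One way we could check this more directly is the debug hack below, which
compares the opaque context binaries from two consecutive fetches. It
relies on the tuple layout visible in the dumps quoted below, where the
context binary is the last element of the map record, so it pokes at
client internals and is only a diagnostic sketch:

%% Debug sketch (relies on riakc_map internals): if an update was
%% applied between the two fetches, the contexts should normally differ.
{ok, M1} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []),
Ctx1 = element(tuple_size(M1), M1),
%% ... perform the update here ...
{ok, M2} = riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []),
Ctx2 = element(tuple_size(M2), M2),
io:format("context changed after update: ~p~n", [Ctx1 =/= Ctx2]).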

Best regards,
Hue Tran



On Tue, Feb 7, 2017 at 2:11 AM, John Daily <jdaily at basho.com> wrote:

> Originally I suspected the context which allows Riak to resolve conflicts
> was not present in your data, but I see it in your map structure. Thanks
> for supplying such a detailed description.
>
> How fast is your turnaround time between an update and a fetch? Even if
> the cluster is healthy it’s not impossible to see a timeout between nodes,
> which could result in a stale retrieval. Have you verified whether the
> stale data persists?
>
> A single node cluster gives an advantage that you’ll never see in a real
> cluster: a perfectly synchronized clock. It also reduces (but does not
> completely eliminate) the possibility of an internal timeout between
> processes.
>
> -John
>
> On Feb 6, 2017, at 1:02 PM, my hue <tranmyhue.grackle at gmail.com> wrote:
>
> Dear Riak Team,
>
> My team and I use Riak as the database for our production system, with
> a cluster of 5 nodes.
> While production was running, we hit a critical bug: updating a
> document sometimes fails.
> My colleagues and I debugged it and reproduced the issue with the
> following scenario:
>
> +  fetch document
> +  change value of document
> +  update document
>
> Repeat this about 10 times and you will hit a failure. When a document
> is updated continually, the update sometimes fails.
>
> Initially the 5 cluster nodes ran Riak version 2.1.1.
> After hitting the bug above, we upgraded to Riak version 2.2.0, and the
> issue still occurs.
>
> After many test runs, debugging with tcpdump on a Riak node:
>
> tcpdump -A -ttt -i {interface} src host {host} and dst port {port}
>
> and together with the command:
>
> riak-admin status | grep "node_puts_map\|node_puts_map_total\|node_puts_total\|vnode_map_update_total\|vnode_puts_total"
>
> we confirmed that the Riak server does receive the update request.
> However, we do not know why the Riak backend fails to update the
> document. At the time of failure, the Riak server logs say everything
> is ok.
>
> Then we dismantled the cluster and used a single Riak server, and the
> bug above never happened.
>
> For that reason, we think it only happens when a cluster is involved.
> We researched the Basho Riak documentation, and our Riak configuration
> seems to follow its suggestions. We are totally blocked on this issue
> and hope we can get support from you, so that we can obtain stable
> behaviour from Riak for our production system.
> Thank you so much. We hope to hear from you soon.
>
>
> * The following is our Riak node information:
>
> Riak version: 2.2.0
> OS: CentOS Linux release 7.2.1511
> CPU: 4 cores
> Memory: 4 GB
> Riak configuration: the attached file "riak.conf"
>
> Note:
>
> - We mostly use the default Riak configuration, except that the storage
> backend is multi:
>
> storage_backend = multi
> multi_backend.bitcask_mult.storage_backend = bitcask
> multi_backend.bitcask_mult.bitcask.data_root = /var/lib/riak/bitcask_mult
> multi_backend.default = bitcask_mult
>
> ----------------------------------------------------------------------
>
> - Bucket type created with the following command:
>
> riak-admin bucket-type create dev_restor '{"props":{"backend":"bitcask_mult","datatype":"map"}}'
> riak-admin bucket-type activate dev_restor
>
> ----------------------------------------------------------------------
>
> - Bucket Type Status :
>
> >> riak-admin bucket-type status dev_restor
>
> dev_restor is active
> young_vclock: 20
> w: quorum
> small_vclock: 50
> rw: quorum
> r: quorum
> pw: 0
> precommit: []
> pr: 0
> postcommit: []
> old_vclock: 86400
> notfound_ok: true
> n_val: 3
> linkfun: {modfun,riak_kv_wm_link_walker,mapreduce_linkfun}
> last_write_wins: false
> dw: quorum
> dvv_enabled: true
> chash_keyfun: {riak_core_util,chash_std_keyfun}
> big_vclock: 50
> basic_quorum: false
> backend: <<"bitcask_mult">>
> allow_mult: true
> datatype: map
> active: true
> claimant: 'riak-node1 at 64.137.190.244'
>
> ----------------------------------------------------------------------
>
> - Bucket Property :
>
> {"props":{"name":"menu","active":true,"allow_mult":true,
>  "backend":"bitcask_mult","basic_quorum":false,"big_vclock":50,
>  "chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},
>  "claimant":"riak-node1 at 64.137.190.244","datatype":"map",
>  "dvv_enabled":true,"dw":"quorum","last_write_wins":false,
>  "linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},
>  "n_val":3,"name":"menu","notfound_ok":true,"old_vclock":86400,
>  "postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"quorum","rw":"quorum",
>  "search_index":"menu_idx","small_vclock":50,"w":"quorum",
>  "young_vclock":20}}
>
>
> ----------------------------------------------------------------------
>
> - Member status :
>
> >> riak-admin member-status
>
> ================================= Membership ==================================
> Status     Ring    Pending    Node
> -------------------------------------------------------------------------------
> valid      18.8%      --      'riak-node1 at 64.137.190.244'
> valid      18.8%      --      'riak-node2 at 64.137.247.82'
> valid      18.8%      --      'riak-node3 at 64.137.162.64'
> valid      25.0%      --      'riak-node4 at 64.137.161.229'
> valid      18.8%      --      'riak-node5 at 64.137.217.73'
> -------------------------------------------------------------------------------
> Valid:5 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
>
>
> ----------------------------------------------------------------------
>
> - Ring
>
> >> riak-admin status | grep ring
>
> ring_creation_size : 64
> ring_members : ['riak-node1 at 64.137.190.244','riak-node2 at 64.137.247.82',
>                 'riak-node3 at 64.137.162.64','riak-node4 at 64.137.161.229',
>                 'riak-node5 at 64.137.217.73']
> ring_num_partitions : 64
> ring_ownership : <<"[{'riak-node2 at 64.137.247.82',12},\n {'riak-node5 at 64.137.217.73',12},\n {'riak-node1 at 64.137.190.244',12},\n {'riak-node3 at 64.137.162.64',12},\n {'riak-node4 at 64.137.161.229',16}]">>
> rings_reconciled : 0
> rings_reconciled_total : 31
>
> ----------------------------------------------------------------------
>
> * The Riak client:
>
> + riak-erlang-client: https://github.com/basho/riak-erlang-client
> + release: 2.4.2
>
> ----------------------------------------------------------------------
>
> * Riak client API used:
>
> + Insert/Update:
>
> riakc_pb_socket:update_type(Pid, {BucketType, Bucket}, Key,
>                             riakc_map:to_op(Map), []).
>
> + Fetch:
>
> riakc_pb_socket:fetch_type(Pid, {BucketType, Bucket}, Key, []).
>
> ----------------------------------------------------------------------
>
> * Steps to perform an update:
>
> - Fetch the document
> - Update the document
>
> ----------------------------------------------------------------------
>
> * Data obtained from fetch_type:
>
> {map,
>  [{{<<"account_id">>,register},<<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>   {{<<"created_by_id">>,register},<<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>   {{<<"created_time_dt">>,register},<<"2017-01-27T03:34:04Z">>},
>   {{<<"currency">>,register},<<"cad">>},
>   {{<<"end_time">>,register},<<"dont_use">>},
>   {{<<"id">>,register},<<"menufe89488afa948875cab6b0b18d579f21">>},
>   {{<<"maintain_mode_b">>,register},<<"false">>},
>   {{<<"menu_category_revision_id">>,register},<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>},
>   {{<<"name">>,register},<<"fullmenu">>},
>   {{<<"order_i">>,register},<<"0">>},
>   {{<<"rest_location_p">>,register},<<"10.844117421366443,106.63982392275398">>},
>   {{<<"restaurant_id">>,register},<<"rest848e042b3a0488640981c8a6dc4a8281">>},
>   {{<<"restaurant_status_id">>,register},<<"inactive">>},
>   {{<<"start_time">>,register},<<"dont_use">>},
>   {{<<"status_id">>,register},<<"hide">>},
>   {{<<"updated_by_id">>,register},<<"accounta25a424b8484181e8ba1bec25bf7c491">>},
>   {{<<"updated_time_dt">>,register},<<"2017-02-06T17:22:39Z">>}],
>  [],
>  [],
>  <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,57,
>    233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,226,
>    103,5,182,208,0,0,118,2,97,40,104,2,109,0,0,0,12,137,252,
>    139,186,176,202,25,96,0,0,195,164,97,54,106>>}
>
>
> * Update with update_type
>
> Below is the map data before calling riakc_map:to_op(Map):
>
> {map,
>  [],
>  [{{<<"account_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},
>   {{<<"created_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},
>   {{<<"created_time_dt">>,register},{register,<<>>,<<"2017-01-27T03:34:04Z">>}},
>   {{<<"currency">>,register},{register,<<>>,<<"cad">>}},
>   {{<<"end_time">>,register},{register,<<>>,<<"dont_use">>}},
>   {{<<"id">>,register},{register,<<>>,<<"menufe89488afa948875cab6b0b18d579f21">>}},
>   {{<<"maintain_mode_b">>,register},{register,<<>>,<<"false">>}},
>   {{<<"menu_category_revision_id">>,register},{register,<<>>,<<"0-634736bc14e0bd3ed7e3fe0f1ee64443">>}},
>   {{<<"name">>,register},{register,<<>>,<<"fullmenu">>}},
>   {{<<"order_i">>,register},{register,<<>>,<<"0">>}},
>   {{<<"rest_location_p">>,register},{register,<<>>,<<"10.844117421366443,106.63982392275398">>}},
>   {{<<"restaurant_id">>,register},{register,<<>>,<<"rest848e042b3a0488640981c8a6dc4a8281">>}},
>   {{<<"restaurant_status_id">>,register},{register,<<>>,<<"inactive">>}},
>   {{<<"start_time">>,register},{register,<<>>,<<"dont_use">>}},
>   {{<<"status_id">>,register},{register,<<>>,<<"show">>}},
>   {{<<"updated_by_id">>,register},{register,<<>>,<<"accounta25a424b8484181e8ba1bec25bf7c491">>}},
>   {{<<"updated_time_dt">>,register},{register,<<>>,<<"2017-02-06T17:22:39Z">>}}],
>  [],
>  <<131,108,0,0,0,3,104,2,109,0,0,0,12,39,21,84,209,219,42,
>    57,233,0,0,156,252,97,34,104,2,109,0,0,0,12,132,107,248,
>    226,103,5,182,208,0,0,118,2,97,39,104,2,109,0,0,0,12,137,
>    252,139,186,176,202,25,96,0,0,195,164,97,53,106>>}
>
>
>
>
>
> Best regards,
> Hue Tran
> <riak.conf>_______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
>
