Riak CS/Stanchion troubleshooting (Retrieval of user record)
kaz at basho.com
Sun Nov 15 21:01:17 EST 2015
HAProxy's timeout settings often cause disconnected errors in a Riak
CS deployment under high workload. The termination_state field in the
tcplog output (option tcplog) tells you whether a timeout happened.
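As a rough illustration of reading that field: in HAProxy's default `option tcplog` format, the termination state is the two-character field right after the bytes-read count. A first character of `c` or `s` means the client-side or server-side timeout expired. On the HAProxy that sits between Riak CS and Riak, Riak CS is the client and Riak is the server. This is only a sketch, assuming the default tcplog layout (lines produced by a custom `log-format` will not match). `classify` and `count_timeouts` are hypothetical helper names, not part of HAProxy or Riak CS.

```python
import re
from collections import Counter

# Sketch only: assumes HAProxy's default "option tcplog" line layout,
#   ... [accept_date] frontend backend/server Tw/Tc/Tt bytes_read termination_state ...
# Lines written with a custom log-format will not match.
TCPLOG_RE = re.compile(
    r"\[(?P<accept_date>[^\]]+)\]\s+"
    r"(?P<frontend>\S+)\s+"
    r"(?P<backend>[^/\s]+)/(?P<server>\S+)\s+"
    r"(?P<timers>[-+\d]+/[-+\d]+/[-+\d]+)\s+"
    r"(?P<bytes_read>\+?\d+)\s+"
    r"(?P<state>\S\S)\s"
)

# First character of the termination state that indicates an expired timeout.
TIMEOUT_CAUSES = {
    "c": "client-side timeout",
    "s": "server-side timeout",
}


def classify(line):
    """Return (backend/server, termination_state, cause), or None if the line does not match."""
    m = TCPLOG_RE.search(line)
    if m is None:
        return None
    state = m.group("state")
    cause = TIMEOUT_CAUSES.get(state[0], "not a timeout")
    return f"{m.group('backend')}/{m.group('server')}", state, cause


def count_timeouts(lines):
    """Count timeout-terminated sessions per backend/server (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        result = classify(line)
        if result is not None and result[1][0] in TIMEOUT_CAUSES:
            counts[result[0]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Feb  6 12:12:56 localhost haproxy[14387]: 10.0.1.2:33313 "
        "[06/Feb/2009:12:12:51.443] fnt bck/srv1 0/0/5007 212 -- 0/0/0/0/3 0/0",
        "Feb  6 12:13:56 localhost haproxy[14387]: 10.0.1.2:33314 "
        "[06/Feb/2009:12:13:06.001] fnt bck/srv2 0/0/50001 0 sD 0/0/0/0/0 0/0",
    ]
    for line in sample:
        print(classify(line))
    print(count_timeouts(sample))
```

Running something like `count_timeouts` over the HAProxy log for the window of a load spike would show which backends had sessions cut by `timeout client` or `timeout server`.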
> 2015-11-13 13:13:09.514 [error] <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user record for s3 failed. Reason: disconnected
This means Riak CS failed to read the user record from Riak for
authentication because of a disconnected error.
> Riak CS adds, removes, and gets properties through the Stanchion service. Am I right? I can't figure out exactly where my bottleneck is: Riak, Riak CS, or Stanchion.
Stanchion is mainly used only to update or delete user and bucket
data. To inspect a node, Riak S2/CS 2.1 introduced new metrics,
including various latencies and counters, which help to identify
> When we need authenticated access for reading an object from a bucket, do we need Stanchion? If not, I can't understand why I had a lot of errors while getting objects from Riak CS.
Authenticated access is always required, but the request that reads
user data for authentication goes from Riak CS directly to Riak, not through
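One way to see why Stanchion is not on the path of the GET errors is to trace the hops each kind of request takes through the topology described in the original message. The sketch below is a simplified, hypothetical model: the operation names are illustrative labels rather than Riak CS API names, and it assumes Riak CS reaches Stanchion directly, which the thread does not state.

```python
# Hypothetical model of the request paths in the topology from this thread:
#   WAN -> HAProxy -> Riak CS nodes -> HAProxy -> Riak nodes
#   Stanchion -> HAProxy -> Riak
# ASSUMPTION: Riak CS talks to Stanchion directly; the thread does not say how.
# Operation names below are illustrative labels, not Riak CS API names.

FRONT = ["WAN", "HAProxy (front)", "Riak CS"]
CS_TO_RIAK = ["HAProxy (Riak)", "Riak"]
VIA_STANCHION = ["Stanchion", "HAProxy (Riak)", "Riak"]

# Per this thread, Stanchion is mainly used to update/delete user and bucket data.
STANCHION_OPS = {"update_user", "update_bucket", "delete_bucket"}


def request_path(op):
    """Return the ordered hops the main request of an operation passes through.

    Simplified: the user-record read for authentication that precedes a
    Stanchion operation is not shown; that read goes Riak CS -> Riak.
    """
    if op in STANCHION_OPS:
        return FRONT + VIA_STANCHION
    # Everything else, including the user-record read done for authentication,
    # goes from Riak CS straight to Riak.
    return FRONT + CS_TO_RIAK


def suspects(op):
    """Components (excluding the WAN itself) that could fail this operation."""
    return [hop for hop in request_path(op) if hop != "WAN"]


if __name__ == "__main__":
    for op in ("get_object_authenticated", "delete_bucket"):
        print(op, "->", " -> ".join(request_path(op)))
```

Under this model, errors on authenticated object reads point at Riak CS, the HAProxy in front of Riak, or Riak itself, rather than at Stanchion.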
> P.S. Sometimes, when there are issues with Riak CS - Stanchion connectivity, I need to restart Riak CS.
Riak CS 1.5.0 has a connection pool leak problem. You might be hitting that issue...
On Sat, Nov 14, 2015 at 2:04 AM, Vladyslav Zakhozhai
<v.zakhozhai at smartweb.com.ua> wrote:
> I have a Riak CS cluster with 18 nodes. Each node runs the Riak CS and Riak
> services, and there is one Stanchion node.
> Riak 1.4.12
> Riak CS 1.5.0
> Stanchion 1.5.0
> Riak CS and Riak are placed behind HAProxy load balancers:
> WAN -> HAProxy -> Riak CS nodes -> HAProxy -> Riak nodes.
> Stanchion -> HAProxy -> Riak
> Today, due to a traffic spike (about 1000 rps) on the cluster, 50% of
> Riak CS returned HTTP 500 and 503 (querying the /riak-cs/ping resource also
> failed).
> In the Riak CS logs I've seen the following messages:
> 2015-11-13 13:13:09.514 [error] <0.11264.1387>@riak_cs_wm_common:maybe_create_user:222 Retrieval of user record for s3 failed. Reason: disconnected
> In the Riak CS logs I also see the following:
> 2015-11-13 17:31:52.995 [error] <0.11254.6534> Lager event handler error_logger_lager_h exited with reason
> I suspect that there was a problem between Riak CS and Stanchion, or between
> Stanchion and Riak. I have no clear idea how to troubleshoot Stanchion, mainly
> for the following reason: Stanchion works fine and the service is up (it answers
> the ping command), but it is very laconic: there is almost nothing in the console
> and error logs (even with the debug log level).
> Riak CS adds, removes, and gets properties through the Stanchion service. Am I
> right? I can't figure out exactly where my bottleneck is: Riak, Riak CS, or
> Stanchion.
> When we need authenticated access for reading an object from a bucket, do we need
> Stanchion? If not, I can't understand why I had a lot of errors while getting
> objects from Riak CS.
> Thank you in advance.
> P.S. Sometimes, when there are issues with Riak CS - Stanchion
> connectivity, I need to restart Riak CS.
> riak-users mailing list
> riak-users at lists.basho.com
Kazuhiro Suzuki | Basho Japan KK