Riak connection pool silent disconnection

Anthony Molinaro anthonym at alumni.caltech.edu
Fri Oct 12 19:43:39 EDT 2012


One note about the riak-erlang-client.

Timeouts will cause the client to disconnect.

So if you set a low timeout, the client will get disconnected whenever a request exceeds it.

The behavior on disconnects can be configured with a couple of options
passed through at create time.

{ auto_reconnect, true } will reconnect if it gets disconnected, but this
can lead to some { error, disconnected } returns depending on the pooler
used.

{ queue_if_disconnected, true } will queue up requests if a disconnect happens.

So you might try setting both of these and seeing if the disconnect errors turn
into timeouts.  Or try increasing the timeout; you may just have it set too
low.
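For reference, a minimal sketch of creating a connection with both options set. It assumes the riakc_pb_socket:start_link/3 entry point (host, port, option list); the host and port are placeholders (8087 is Riak's default protocol buffers port), and how the option list reaches each worker depends on the pooler's own start arguments.

```erlang
%% Sketch only: host and port are placeholders for your Riak node.
%% auto_reconnect: reconnect after the socket is dropped.
%% queue_if_disconnected: queue requests while disconnected instead of
%% returning {error, disconnected} right away.
{ok, Pid} = riakc_pb_socket:start_link("127.0.0.1", 8087,
                                       [{auto_reconnect, true},
                                        {queue_if_disconnected, true}]).
```

With both set, a request made during a disconnect waits in the queue and should fail with a timeout rather than {error, disconnected} if the reconnect does not happen in time.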

Now, why a timeout causes a disconnect, I don't know. Does anyone from Basho who
works on the riak-erlang-client know?

-Anthony

On Fri, Oct 12, 2012 at 07:54:05AM +0400, Mikhail Kuznetsov wrote:
> Yes, I return the connection to the pool after each query. The pool has 30 workers and most operations are single puts. The system ulimit is 2048. We have just started; we store about 200-300 MB of data across the whole cluster and have about a terabyte of free space on the server disks. The servers mostly stand by, waiting for clients. Heavy load will come later.
> 
> I don't understand what that parameter {checkout,false,5000} is for. There is no checkout parameter in the Erlang PB client, and I didn't find one in the server config either.
> -- 
> Sincerely yours,
> Mikhail Kuznetsov
> 
> When best practices meet everyday life and lead to perfection...
> 
> Oct 11, 2012, at 21:26, Mike Oxford wrote:
> 
> > I think the key may lie here: ",{checkout,false,5000}"
> > 
> > Are you releasing your connections back to the pool? Is your throughput greater than the system can handle due to limited connection pool sizes? What is your ulimit set to (ulimit -n)? Maybe you're running out of FDs?
> > 
> > -mox
> > 
> > On Wed, Oct 10, 2012 at 10:31 PM, Mikhail Kuznetsov <kuznetsov.m.yu at gmail.com> wrote:
> > I have a nasty problem in production. We built a connection pool with the official Erlang PB client, and at first everything works fine. To organize the pool we use hottub (we tried several, but it is the simplest). Each connection is used at least once every 3-5 minutes (production is not fully loaded yet).
> > 
> > After several days, the Riak server disconnects us. But the socket process doesn't die; it answers every request with {error, disconnected}. So I wrote a pool worker checker: if is_connected(Pid) does not return true, we kill the worker and the pool creates a new one. I run it every ten minutes, but it didn't help. is_connected returns true, but when I make a request I get {error, disconnected}. The only solution that works so far is fully reinitializing the pool whenever some worker returns {error, disconnected}. That is very barbaric and may crash the whole app.
> > 
> > When I checked the server logs, I found many errors like these two:
> > 
> > 2012-09-20 00:10:10.976 [error] <0.803.0>@riak_core_vnode:handle_info:510 296867520082839655260123481645494988367611297792 riak_kv_vnode worker pool crashed {timeout,{gen_server,call,[<0.819.0>,{work,<0.806.0>,{fold,#Fun,#Fun},{raw,59205031,<0.28969.11>}}]}}
> > 
> > 2012-09-20 00:10:10.976 [error] <0.862.0>@riak_core_vnode:handle_info:510 365375409332725729550921208179070754913983135744 riak_kv_vnode worker pool crashed {timeout,{gen_fsm,sync_send_event,[<0.866.0>,{checkout,false,5000},5000]}}
> > 
> > I guess that is the real problem, but I think the client connection should at least log something, return a list of connection failures, or die. Instead I get is_connected(Pid) = true.
> > 
> > How do you organize connection pools that work 24/7? How do you check or refresh pool workers?
> > 
> > 
> > 
> > 
> > _______________________________________________
> > riak-users mailing list
> > riak-users at lists.basho.com
> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> > 
> > 
> 



-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym at alumni.caltech.edu>



