Connection Pool with Erlang PB Client Necessary?
bob at redivi.com
Tue Jul 26 14:55:16 EDT 2011
In our case, we have a client id per connection but the connections
are re-used. For a given application request, a connection is checked
out, some work is done, and then it is checked back in to the pool at
the end of the request. Choosing a random client id for every request
would make bigger vector clocks, but it's a reasonable design.
Interleaving operations with the same client id is going to cause
integrity problems (e.g. the single gen_server + connection approach);
you really want to treat a client id like a transaction id.
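To make the check-out/check-in pattern above concrete, here is a rough sketch. It is written in Python only so that it is self-contained; it is not the Erlang PB client and not our actual code. FakeConnection, Pool, checkout and handle_request are hypothetical names made up for illustration. The two points it shows are that each connection gets its own random client id once, at creation, and that a connection is held by exactly one request at a time.

```python
import base64
import os
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a Riak PB connection (not a real client API)."""

    def __init__(self):
        # One random client id per connection, chosen once at creation
        # and kept for as long as the connection lives in the pool.
        self.client_id = base64.b64encode(os.urandom(4)).decode("ascii")
        self.log = []

    def put(self, key, value):
        # Record which client id performed the write.
        self.log.append((self.client_id, key, value))


class Pool:
    """Fixed-size pool; each connection is held by one request at a time."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def checkout(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Check the connection back in at the end of the request,
            # even if the request raised.
            self._idle.put(conn)


def handle_request(pool, key, value):
    # All work for one request goes through a single checked-out
    # connection, so operations under one client id never interleave.
    with pool.checkout() as conn:
        conn.put(key, value)
        return conn.client_id
```

With a pool of N connections, at most N distinct client ids ever show up in vector clocks, no matter how many requests are served.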
On Tue, Jul 26, 2011 at 11:35 AM, Andrew Berman <rexxe98 at gmail.com> wrote:
> Thanks for the reply Bryan. This all makes sense. I am fairly new to
> Erlang and wasn't sure if using a gen_server solved some of the issues
> with connections. From what I've seen a lot of people simply make
> calls to Riak directly from a resource and so I thought having a
> gen_server in front of Riak would help to manage things better.
> Apparently it doesn't.
> So, then, two more questions. I have used connection pools in Java
> like C3P0 and they can ramp up connections and then cull connections
> when there is a period of inactivity. The only pooler I've found that
> does this is: https://github.com/seth/pooler . Do you have any other
> recommendations on connection poolers?
> Second, I'm still a little confused about client ID. I thought client ID
> represented an actual client, not a connection. So, in my case, the
> gen_server is one client which makes multiple connections. After
> seeing what you wrote and reading a bit more on it, it seems like
> client ID should just be some random string (base64 encoded) that
> should be generated on creating a connection. Is that right?
> Thanks for your help!
> On Tue, Jul 26, 2011 at 9:39 AM, Bryan O'Sullivan <bos at mailrank.com> wrote:
>> On Mon, Jul 25, 2011 at 4:03 PM, Andrew Berman <rexxe98 at gmail.com> wrote:
>>> I know that this subject has been brought up before, but I'm still
>>> wondering what the value of a connection pool is with Riak.
>> It's a big deal:
>> * It amortises TCP and PBC connection setup overhead over a number of
>>   requests, thereby reducing average query latency.
>> * It greatly reduces the likelihood that very busy clients and servers will
>>   run out of limited resources that are effectively invisible, e.g. closed TCP
>>   connections stuck in TIME_WAIT.
>> Each of the above is a pretty big deal. Of course, connection pooling isn't
>> free:
>> * If you have many clients talking to a server sporadically, you may end up
>>   with large numbers of open-and-idle connections on a server, which will both
>>   consume resources and increase latency for all other clients. This is
>>   usually only a problem with a very large number (many thousands) of clients
>>   per server, and it usually only arises with poorly written and tuned
>>   connection pooling libraries. But ...
>> * ... Most connection pooling libraries are poorly written and tuned, so
>>   they'll behave pathologically just when you need them not to.
>> * Since you don't set up a connection per request, the requests where you *do*
>>   need to set up a connection are going to be more expensive than those where
>>   you don't, so you'll see jitter in your latency profile. About 99.9% of
>>   users will never, ever care about this.
>>> Since Erlang processes are so small and fast to
>>> create, is there really any overhead in having the gen_server create a
>>> new connection (with the same client id) each time it needs to access
>>> Riak?
>> Of course. The overhead of Erlang processes has nothing to do with the cost
>> of setting up a connection.
>> Also, you really don't want to be using the same client ID repeatedly across
>> different connections. That's an awesome way to cause bugs with vclock
>> resolution that end up being very very hard to diagnose.