Riak JAVA Client Performance

Brian Roach roach at basho.com
Sun Oct 14 00:48:14 EDT 2012


Some points about how the Java client works:

You use a single instance of the client and share it across threads.
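
A minimal sketch of that sharing pattern, using only the JDK so it runs on its own. FakeClient is a hypothetical stand-in for a thread-safe client (it is not the Riak client API), and the thread count of 50 is taken from the original question further down the thread:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a thread-safe client; NOT the Riak client API.
class FakeClient {
    private final ConcurrentHashMap<String, String> data = new ConcurrentHashMap<>();

    void store(String key, String value) {
        data.put(key, value);
    }

    int size() {
        return data.size();
    }
}

class SharedClientSketch {
    // Creates ONE client and hands the same instance to every worker thread.
    // Returns the number of distinct keys stored once all workers are done.
    static int runWorkers(int threads) {
        final FakeClient client = new FakeClient();
        ExecutorService workers = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final String key = "key-" + i;
            workers.execute(() -> client.store(key, "value"));
        }
        workers.shutdown();
        try {
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return client.size();
    }

    public static void main(String[] args) {
        System.out.println("keys stored: " + runWorkers(50));
    }
}
```

The point of the sketch is only the shape: the client is constructed once, outside the workers, rather than created and shut down per request.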

The client holds a connection pool. It grows as necessary. You can
specify the starting size of the pool and the max size (default is
unlimited). There is an idle reaper thread in that connection pool
that evicts connections that are idle for 1 second by default; this is
also something you can change in the config.
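
To make the pool behaviour concrete, here is a rough JDK-only sketch of a pool with a starting size, a maximum size and an idle timeout. It is an illustration under assumptions, not the client's actual code: PoolSketch and its method names are made up, connections are plain integer ids, time is passed in explicitly so the behaviour is easy to follow, and acquire() fails immediately when the pool is exhausted instead of waiting for a connection:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical pool sketch; names and behaviour are assumptions, NOT the Riak client's code.
class PoolSketch {
    private static final class Idle {
        final int id;
        final long idleSinceMillis;

        Idle(int id, long idleSinceMillis) {
            this.id = id;
            this.idleSinceMillis = idleSinceMillis;
        }
    }

    private final Deque<Idle> idle = new ArrayDeque<>();
    private final int maxSize;          // 0 means unlimited
    private final long idleTtlMillis;   // e.g. 1000 for a 1 second idle timeout
    private int alive = 0;              // idle + leased connections
    private int nextId = 0;

    PoolSketch(int initialSize, int maxSize, long idleTtlMillis, long nowMillis) {
        this.maxSize = maxSize;
        this.idleTtlMillis = idleTtlMillis;
        for (int i = 0; i < initialSize; i++) {
            idle.push(new Idle(nextId++, nowMillis));
            alive++;
        }
    }

    // Reuses an idle connection if there is one, otherwise grows the pool up to maxSize.
    synchronized int acquire() {
        Idle entry = idle.poll();
        if (entry != null) {
            return entry.id;
        }
        if (maxSize > 0 && alive >= maxSize) {
            throw new IllegalStateException("pool exhausted");
        }
        alive++;
        return nextId++;
    }

    // Returns a leased connection to the pool and records when it became idle.
    synchronized void release(int id, long nowMillis) {
        idle.push(new Idle(id, nowMillis));
    }

    // What a reaper thread would call periodically: evicts connections idle for >= the TTL.
    synchronized int reapIdle(long nowMillis) {
        int evicted = 0;
        for (Iterator<Idle> it = idle.iterator(); it.hasNext();) {
            if (nowMillis - it.next().idleSinceMillis >= idleTtlMillis) {
                it.remove();
                alive--;
                evicted++;
            }
        }
        return evicted;
    }

    synchronized int alive() {
        return alive;
    }

    public static void main(String[] args) {
        PoolSketch pool = new PoolSketch(2, 4, 1000L, 0L);
        int id = pool.acquire();
        pool.release(id, 100L);
        System.out.println("evicted: " + pool.reapIdle(5000L));
    }
}
```

One consequence worth noticing: with a short idle timeout, a pool that sees bursts of traffic separated by quiet periods will keep tearing down and re-creating connections, which is why the timeout is worth tuning.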

As has been mentioned, the best way to deal with a Riak cluster is by
using HAProxy. There is a ClusterClient available in the Java client
that you can instantiate and use that will round-robin requests to
different nodes. That said, unfortunately there are currently a number
of issues that make HAProxy a superior solution. This is something we
plan to address as we further develop the Java client.
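
For anyone unfamiliar with the term, round-robin here just means handing each request to the next node in a fixed list and wrapping around at the end. A small JDK-only sketch of the idea follows; RoundRobinSketch and the node names are hypothetical, and this is not how ClusterClient is implemented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector; NOT the ClusterClient implementation.
class RoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> nodes) {
        if (nodes.isEmpty()) {
            throw new IllegalArgumentException("need at least one node");
        }
        this.nodes = new ArrayList<>(nodes);
    }

    // Thread-safe: each call takes the next node, wrapping around the list.
    // floorMod keeps the index non-negative even if the counter overflows.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), nodes.size());
        return nodes.get(index);
    }

    public static void main(String[] args) {
        RoundRobinSketch selector =
                new RoundRobinSketch(Arrays.asList("node-a", "node-b", "node-c"));
        for (int i = 0; i < 4; i++) {
            System.out.println(selector.next());
        }
    }
}
```

Note that plain rotation like this knows nothing about node health: a node that is down still gets its turn, whereas a proxy with health checks can take it out of rotation.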

Thanks,
Brian Roach


On Thu, Oct 11, 2012 at 3:39 AM, Guido Medina <guido.medina at temetra.com> wrote:
> Hi Pavel,
>
>   I'm not an expert on pool sizing, but you can tune it to your needs
> depending on your average key size and number of nodes. As for the client, a
> single shared client instance will suffice. There is a retrier parameter
> that sets how many times the client will retry your operation before throwing
> an exception (3 by default), and there is a timeout on acquiring a
> connection. This is an example config:
>
> The pool size here is for a 4-node cluster, guessing at 8 Erlang threads per
> node (4 x 8 = 32) so that the Riak nodes can do other things too; remember
> they have to sync their data between the nodes:
>
> host = your balancer host
> port = your balancer port
>
> final PBClientConfig clientConfig = new PBClientConfig.Builder()
>     .withHost(host).withPort(port)
>     .withPoolSize(32).withConnectionTimeoutMillis(5000)
>     .build();
> final IRiakClient riakClient = RiakFactory.newClient(clientConfig);
>
> We have that running with no issues. The pool size depends on your needs
> and data size; you could run with a pool size of 50 to 100 if your keys
> are really small. You will have to try your own values.
>
> Regards,
>
> Guido.
>
>
> On 11/10/12 08:40, Pavel Kogan wrote:
>
> Thanks Guido, Pawel,
>
> I will try using HAProxy + holding N concurrent connections on the client
> side.
> I want to clarify some points about concurrent connections for myself:
> 1) What is a reasonable limit on concurrent connections?
> 2) Concurrent connections = separately created pbc clients or a single shared
> pbc client?
> 3) Will a connection time out if no requests are made for some period?
>
> Pavel
>
> On Wed, Oct 10, 2012 at 8:57 PM, Guido Medina <guido.medina at temetra.com>
> wrote:
>>
>> From that perspective, for now it is better to treat the client as you
>> would treat a JDBC DataSource pool. The tidying up comes when connecting the
>> client, to either one node or many: the client will behave better if it has
>> no knowledge of what's going on at the cluster side. Of course, that's as of
>> 1.0.6, so that might change.
>>
>> He could try connecting to one node with a pool of 8 to 16 concurrent
>> connections and start from there. Then, when talking to a cluster, he needs
>> the balancer in the middle. The main reason is that Riak expects you to
>> connect to all nodes (it will simply behave better); otherwise one node will
>> be overloaded and give you IOExceptions from time to time.
>>
>> Hope that helps,
>>
>> Guido.
>>
>>
>> On 10/10/12 19:24, kamiseq wrote:
>>>
>>> OK, you have a point here. On the other hand, I think Pavel is looking
>>> for some guidance on how to improve performance on the client side, so he
>>> can be sure he is not wasting time on something. This may be
>>> premature optimization, but it may also be a good opportunity to understand
>>> the library and enter the new world of Riak.
>>>
>>> Regards,
>>> Paweł Kamiński
>>>
>>> kamiseq at gmail.com
>>> pkaminski.prv at gmail.com
>>> ______________________
>>>
>>>
>>> On 10 October 2012 17:30, Guido Medina <guido.medina at temetra.com> wrote:
>>>>
>>>> In fact, with more nodes, you might be surprised that it might be
>>>> faster... see my point? Riak is a lot of things. First you have to be
>>>> aware of the hashing, hashmap, how a key gets copied to different nodes,
>>>> how one or more nodes are responsible for a key, etc., so it is not that
>>>> simple.
>>>>
>>>>
>>>> On 10/10/12 16:28, Guido Medina wrote:
>>>>
>>>> That's why I keep pushing one answer: Riak is not meant to be run as a
>>>> one-node cluster. You are removing the external factors and CAP settings
>>>> you will be using, and it won't be linear; you could get the same results
>>>> with RW=2 with 3, 4 and 5 nodes. There are several factors that will
>>>> influence your benchmark. I would start with 3 nodes and go up to 5 while
>>>> altering those numbers; then you could end up with a formula which, I
>>>> assure you, won't be linear.
>>>>
>>>> Regards,
>>>>
>>>> Guido.
>>>>
>>>> On 10/10/12 16:19, Pavel Kogan wrote:
>>>>
>>>> I understand that load balancing is the final solution, but I want to
>>>> benchmark a single node.
>>>> If I knew that I could load a single node with N requests/sec, I could
>>>> assume that after load balancing over 5 nodes my throughput limit would
>>>> increase linearly.
>>>>
>>>> Pavel
>>>>
>>>> On Wed, Oct 10, 2012 at 2:51 PM, Guido Medina <guido.medina at temetra.com>
>>>> wrote:
>>>>>
>>>>> The answer is there: create a client config with N pooled connections to
>>>>> your load balancer, whatever you are using. I know HAProxy supports the
>>>>> PBC config (TCP based), which is faster than the HTTP client, hence my
>>>>> recommendation.
>>>>>
>>>>> Say, a non-clustered client config with N connections to balancer_host at
>>>>> 8087, and your balancer_host connected to EACH node: that's the way to go.
>>>>> The rest is about the CAP level you want to support, which will impact
>>>>> your performance vs. integrity. Up to you.
>>>>>
>>>>> CAP doc:
>>>>>
>>>>> http://docs.basho.com/riak/latest/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/
>>>>>
>>>>> Guido.
>>>>>
>>>>>
>>>>> On 10/10/12 13:33, Pavel Kogan wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> The node is OK and not down.
>>>>> I have a way to do load balancing externally to the Java client.
>>>>> I am evaluating Riak for use in my company and want to measure the
>>>>> maximum throughput against a single node.
>>>>>
>>>>> Thanks,
>>>>>     Pavel
>>>>>
>>>>> On Wed, Oct 10, 2012 at 2:13 PM, Guido Medina
>>>>> <guido.medina at temetra.com>
>>>>> wrote:
>>>>>>
>>>>>> That question has been answered a few times; here is my old answer:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>    It is the Java client which, to be honest, doesn't handle one node
>>>>>> going down well. So, for example, in my company we use HAProxy for that;
>>>>>> here is a starting configuration: https://gist.github.com/1507077
>>>>>>
>>>>>>    Once we switched to HAProxy we just used a simple client without a
>>>>>> cluster config, so the Java client doesn't know anything about the load
>>>>>> balancing going on. It works well; I can upgrade and restart servers
>>>>>> without our Java application complaining.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Guido.
>>>>>>
>>>>>>
>>>>>> On 10/10/12 12:58, Pavel Kogan wrote:
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> I will try this solution.
>>>>>>
>>>>>> Pavel
>>>>>>
>>>>>> On Wed, Oct 10, 2012 at 1:51 PM, kamiseq <kamiseq at gmail.com> wrote:
>>>>>>>
>>>>>>> Well, I asked the same question a few days ago (maybe 2 weeks ago) and
>>>>>>> the answer was that yes, sharing the client is thread safe, and all you
>>>>>>> should do is create a new bucket instance on every request.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Paweł Kamiński
>>>>>>>
>>>>>>> kamiseq at gmail.com
>>>>>>> pkaminski.prv at gmail.com
>>>>>>> ______________________
>>>>>>>
>>>>>>>
>>>>>>> On 10 October 2012 09:25, Pavel Kogan <pavel.kogan at cortica.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> 1) Is it OK to share a single pbc client object between 50 threads?
>>>>>>>> Should it be protected by a lock?
>>>>>>>> 2) I haven't done load balancing between nodes yet, because I want to
>>>>>>>> better understand the throughput limit. I am planning to do it for much
>>>>>>>> higher throughput.
>>>>>>>>
>>>>>>>> Pavel
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Oct 10, 2012 at 9:21 AM, kamiseq <kamiseq at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Maybe a good start is to share the pbclient object and only create a
>>>>>>>>> bucket per request; you will save a few steps of client configuration.
>>>>>>>>> Have you tried balancing requests to the cluster and distributing them
>>>>>>>>> over all nodes?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Paweł Kamiński
>>>>>>>>>
>>>>>>>>> kamiseq at gmail.com
>>>>>>>>> pkaminski.prv at gmail.com
>>>>>>>>> ______________________
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 10 October 2012 06:18, Pavel Kogan <pavel.kogan at cortica.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have a Riak cluster consisting of 5 nodes that contains about 30
>>>>>>>>>> million keys (35% of capacity according to Riak Control).
>>>>>>>>>> Currently we have a single Java client reading and writing records to
>>>>>>>>>> the same node. I need some tips on how to use the client efficiently
>>>>>>>>>> to reach maximum throughput. I would like to be able to read/write up
>>>>>>>>>> to 100 records/sec on a 1 Gbit network. Currently I get a lot
>>>>>>>>>> of Java socket exceptions after a while (even at the much slower rate
>>>>>>>>>> of 10 records/sec), after which I need to restart the client and node.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>     Pavel
>>>>>>>>>>
>>>>>>>>>> P.S.: My client uses 50 threads, and a pbc client is created and shut
>>>>>>>>>> down per request.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> riak-users mailing list
>>>>>>>>>> riak-users at lists.basho.com
>>>>>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>>>>>>