Store() Performance

Kev Burns kevburnsjr at gmail.com
Fri Apr 8 14:11:25 EDT 2011


Gui,

I was talking to Andy Gross at a drinkup last month and he told me about a
config var which might help speed up your import.
http://wiki.basho.com/Configuration-Files.html#disable_http_nagle

> disable_http_nagle
>
> When set to true, this option will disable the Nagle buffering algorithm
> for HTTP traffic. This is equivalent to setting the TCP_NODELAY option on
> the HTTP socket. The setting defaults to false. If you experience
> consistent minimum latencies in multiples of 20 milliseconds, setting this
> option to true may reduce latency.
>
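For reference, here's roughly what that would look like in app.config — note the section placement is my guess (double-check against the wiki page above); only the option name and value come from the docs:

```
%% Hypothetical app.config fragment -- the riak_core section is an
%% assumption on my part; the option itself is documented above.
{riak_core, [
    %% Disable Nagle buffering (TCP_NODELAY) on HTTP sockets
    {disable_http_nagle, true}
]}
```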
So the way I understand it, Nagle's algorithm makes the TCP stack hold on to
small outgoing packets for a short while in the hope of coalescing them with
the next write into one larger packet. That's good for bulk throughput, but it
adds latency to small request/response round trips like individual PUTs. Given
that you're running 1 import script, you've only got 1 process throwing things
into Riak one request at a time, so disabling this flag may speed you up
considerably.
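If it helps to see what's going on under the hood, here's a quick local Python sketch (nothing Riak-specific) of the TCP_NODELAY socket option the docs mention:

```python
# Local illustration of what disable_http_nagle maps to: setting the
# TCP_NODELAY socket option so small writes go out immediately instead
# of being buffered by Nagle's algorithm.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Nagle is on by default, i.e. TCP_NODELAY is off (usually reads as 0).
before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

# The socket-level equivalent of disable_http_nagle = true:
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

print("TCP_NODELAY before=%d after=%d" % (before, after))
sock.close()
```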

You might also consider breaking the file into chunks and running multiple
import scripts concurrently.
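Something like this rough Python sketch — chunk(), store_record(), and the worker count are all hypothetical names/values, and a real store_record() would PUT each record to Riak over a keep-alive connection:

```python
# Sketch of splitting an import file into N chunks and importing them
# concurrently with a pool of workers.
from concurrent.futures import ThreadPoolExecutor

def chunk(lines, n):
    """Split lines into n roughly equal chunks."""
    size = (len(lines) + n - 1) // n
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def store_record(line):
    # Placeholder: a real import would PUT this record to Riak here,
    # reusing one keep-alive HTTP connection per worker.
    return len(line)

def import_chunk(lines):
    return sum(store_record(l) for l in lines)

records = ["record-%d" % i for i in range(1000)]
chunks = chunk(records, 4)

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(import_chunk, chunks))

print("imported %d chunks" % len(chunks))
```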

And lastly, if you don't already have HAProxy or something similar to
distribute reads/writes evenly across your cluster, that's definitely
something you'll want to set up.
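A minimal haproxy.cfg sketch of that setup — the node addresses, port, and health-check URL here are examples, not your actual cluster:

```
# Hypothetical haproxy.cfg fragment: round-robin reads/writes across
# three Riak nodes' HTTP interfaces.
listen riak
    bind *:8098
    mode http
    balance roundrobin
    option httpchk GET /ping
    server riak1 192.168.1.10:8098 check
    server riak2 192.168.1.11:8098 check
    server riak3 192.168.1.12:8098 check
```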

- Kev
c: +001 (650) 521-7791


On Fri, Apr 8, 2011 at 9:32 AM, Gui Pinto <gpinto at chitika.com> wrote:

> Awesome..
> I'll quickly hack out what I need for the quick import, and hope you can
> release your re-write.
>
>
> Gui Pinto
> Software Engineer at Chitika
>
>
> On Fri, Apr 8, 2011 at 12:26 PM, Mark Steele <msteele at beringmedia.com> wrote:
>
>> I've rewritten the PHP library to use keep-alives (and various other
>> tweaks). Let me see what I can do about releasing the code.
>>
>> The current PHP library simply instantiates a new curl instance for each
>> request, making it less than optimal.
>>
>> Mark Steele
>> Bering Media Inc.
>>
>>
>> On Fri, Apr 8, 2011 at 12:23 PM, Gui Pinto <gpinto at chitika.com> wrote:
>>
>>> Hey Everyone, thanks for all of the recommendations.
>>>
>>> I've tried importing using the example load_data script <http://wiki.basho.com/Loading-Data-and-Running-MapReduce-Queries.html> available on the Fast Track, and most recently tried the PHP library.
>>>
>>> Both of these execute a straightforward curl -X PUT request.. which makes
>>> me think Mark might have just guessed it..
>>> Keep-alive not being used definitely explains the 200-writes/second cap.
>>>
>>> I'm going to take a look into the PHP library and test this theory.
>>>
>>> Gui Pinto
>>> Software Engineer at Chitika
>>>
>>>
>>>
>>> On Fri, Apr 8, 2011 at 10:01 AM, Mark Steele <msteele at beringmedia.com> wrote:
>>>
>>>> If using HTTP, make sure you're using keep-alives. That will be a
>>>> gigantic speed boost.
>>>>
>>>> The protocol buffer API is much faster if your client language
>>>> supports it.
>>>>
>>>>
>>>> Mark Steele
>>>> Bering Media Inc.
>>>>
>>>>
>>>>
>>>> On Thu, Apr 7, 2011 at 10:58 PM, matthew hawthorne <
>>>> mhawthorne at gmail.com> wrote:
>>>>
>>>>> Hi Gui,
>>>>>
>>>>> I recently pushed 70 million records of size 1K each into a 5-node
>>>>> Riak cluster (which was replicating to another 5-node cluster) at
>>>>> around 1000 writes/second using basho_bench and the REST interface.  I
>>>>> probably could have pushed it further, but I wanted to confirm that it
>>>>> could maintain the load for the entire data set, which it did.
>>>>>
>>>>> My point being that your speed-limit of 200 writes/second is likely
>>>>> specific to your configuration.
>>>>>
>>>>> I wonder:
>>>>> 1) what's your average write latency?
>>>>> 2) how big is your connection pool?
>>>>>
>>>>> Because it's possible that you don't have enough connections available
>>>>> to handle your desired load.
>>>>>
>>>>> -matt
>>>>>
>>>>>
>>>>> On Thu, Apr 7, 2011 at 6:01 PM, Gui Pinto <gpinto at chitika.com> wrote:
>>>>> > Hey guys,
>>>>> > I'm attempting to import 300M+ objects into a Riak cluster, but have
>>>>> > quickly reached the REST API's speed limit at 200 store()'s per second..
>>>>> > At the rate of 200/s, I'm looking at 20 days to import this data set!
>>>>> > That can't be the fastest method to do this..
>>>>> >
>>>>> > Any recommendations?
>>>>> >
>>>>> > Thanks!
>>>>> > Gui Pinto
>>>>> >
>>>>> > _______________________________________________
>>>>> > riak-users mailing list
>>>>> > riak-users at lists.basho.com
>>>>> > http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>> >
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>
>
>