Problems writing objects to a half-full bucket

Marco Monteiro marco at textovirtual.com
Wed Mar 7 15:31:50 EST 2012


Having the keys prefixed with the seconds since epoch solved the problem.

Thanks,
Marco

On 6 March 2012 15:47, Marco Monteiro <marco at textovirtual.com> wrote:

> It makes sense, David. I'm going to give it a try.
> Hopefully this will make it usable for the next month
> until the issue is addressed.
>
> I'll let you know how it goes.
>
> Thanks,
> Marco
>
>
> On 6 March 2012 15:19, David Smith <dizzyd at basho.com> wrote:
>
>> On Mon, Mar 5, 2012 at 9:55 PM, Marco Monteiro <marco at textovirtual.com> wrote:
>>
>> > I'm using riak-js and the error I get is:
>> >
>> > { [Error: socket hang up] code: 'ECONNRESET' }
>>
>> That is a strange error -- are there any corresponding errors in
>> server logs? I would have expected a timeout or some such...
>>
>> >
>> > UUIDs. They are created by Riak. All my queries use 2i. The 2i are
>> > integers (representing seconds) and random strings (length 16) used
>> > as identifiers for user sessions and similar.
>>
>> So, this explains why the problem goes away when you switch to an
>> empty bucket. A bit of background...
>>
>> If you're using the functionality in Riak that automatically generates
>> a UUID on PUT, you're going to get a uniformly distributed 160-bit
>> number (since the implementation SHA-1 hashes the input). This sort of
>> distribution is great for uniqueness, since there is only a 1 in 2^160
>> chance (roughly) that any two IDs will collide. It can be very bad
>> from a caching perspective, however, if you have a cache that uses
>> pages of information for locality purposes. In a scheme such as this
>> (which is what LevelDB uses), the system will wind up churning the
>> cache constantly, since the odds are quite low that the next UUID to
>> be accessed will already be in memory (remember, uniform distribution
>> of keys).
>>
>> LevelDB also makes this pathological case a bit worse by not having
>> bloom filters -- when inserting a new UUID, you will potentially have
>> to do 7 disk seeks just to determine that the UUID is not already
>> present. The Google team is working to address this problem, but I'm
>> guessing it'll be a month or so before that's done, and then we have
>> to integrate it with Riak -- so we can't count on that just yet.
>>
>> Now, all is not lost. :)
>>
>> If you craft your keys so that there is some temporal locality _and_
>> the access pattern of your keys has some sort of exponential-ish
>> decay, you can still get very good performance out of LevelDB. One
>> simple way to do this is to prefix the UUID with the current
>> date-time, like so:
>>
>> 201203060806-<uuid> (YMDhm-UUID)
>>
>> You could also use seconds since the epoch, etc. This has the effect
>> of keeping recently accessed/hot UUIDs on (close to) the same cache
>> page, which lets you avoid a lot of cache churn and typically
>> improves LevelDB performance dramatically.
>>
>> Does this help/make sense?
>>
>> D.
>> --
>> Dave Smith
>> VP, Engineering
>> Basho Technologies, Inc.
>> dizzyd at basho.com
>>
>
>
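
For reference, the key scheme discussed in this thread could look roughly
like the sketch below. It is only an illustration: the function names are
made up, the UUID is generated on the client (rather than by Riak on PUT),
and nothing here calls Riak or riak-js. The resulting string would simply
be supplied as the object key when storing.

```python
import time
import uuid
from datetime import datetime, timezone


def epoch_prefixed_key(now=None):
    """Build '<seconds since epoch>-<uuid hex>' (hypothetical helper)."""
    seconds = int(time.time() if now is None else now)
    # Zero-pad to 10 digits so string order matches time order.
    return f"{seconds:010d}-{uuid.uuid4().hex}"


def datetime_prefixed_key(now=None):
    """Build '<YYYYMMDDhhmm>-<uuid hex>' in UTC (hypothetical helper)."""
    if now is None:
        moment = datetime.now(timezone.utc)
    else:
        moment = datetime.fromtimestamp(now, timezone.utc)
    # Assumption: UTC is used so all clients agree on the prefix.
    return f"{moment.strftime('%Y%m%d%H%M')}-{uuid.uuid4().hex}"


if __name__ == "__main__":
    print(epoch_prefixed_key())
    print(datetime_prefixed_key())
```

Keys created close together in time share a prefix, so they sort next to
each other instead of being scattered uniformly across the key space. The
random UUID suffix still keeps keys created in the same second (or minute)
distinct.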