Problems writing objects to a half-full bucket

Marco Monteiro marco at textovirtual.com
Tue Mar 6 10:47:41 EST 2012


It makes sense, David. I'm going to give it a try.
Hopefully this will make it usable for the next month
until the issue is addressed.

I'll let you know how it goes.

Thanks,
Marco

On 6 March 2012 15:19, David Smith <dizzyd at basho.com> wrote:

> On Mon, Mar 5, 2012 at 9:55 PM, Marco Monteiro <marco at textovirtual.com>
> wrote:
>
> > I'm using riak-js and the error I get is:
> >
> > { [Error: socket hang up] code: 'ECONNRESET' }
>
> That is a strange error -- are there any corresponding errors in
> server logs? I would have expected a timeout or some such...
>
> >
> > UUIDs. They are created by Riak. All my queries use 2i. The 2i are
> integers
> > (representing seconds) and random strings (length 16) used as identifiers
> > for user sessions and similar.
>
> So, this explains why the problem goes away when you switch to an
> empty bucket. A bit of background...
>
> If you're using the functionality in Riak that automatically generates
> a UUID on PUT, you're going to get a uniformly distributed 160-bit
> number (since the implementation SHA-1 hashes the input). This sort of
> distribution is great for uniqueness, since there is roughly a 1 in
> 2^160 chance of generating an ID that collides with an existing one.
> It can be very bad from a caching perspective, however, if you have a
> cache that uses pages of information for locality purposes. In a
> scheme such as this (which is what LevelDB uses), the system will wind
> up churning the cache constantly, since the odds are quite low that
> the next UUID to be accessed will already be in memory (remember, the
> keys are uniformly distributed).
>
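> For illustration, here's a minimal sketch (TypeScript/Node.js; the
> function name and the random input are just illustrative -- Riak
> generates its keys server-side) of why a SHA-1-derived key is
> uniformly distributed:
>
>     import { createHash, randomBytes } from "crypto";
>
>     // A SHA-1 digest of (effectively) random input is a uniformly
>     // distributed 160-bit value: great for uniqueness, but consecutive
>     // keys share no prefix, so they land on unrelated cache pages.
>     function uniformKey(): string {
>       return createHash("sha1").update(randomBytes(16)).digest("hex");
>     }
>
>     console.log(uniformKey()); // e.g. "9f3c1a..." -- unrelated to the previous key
>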
> LevelDB also makes this pathological case a bit worse by not having
> bloom filters -- when inserting a new UUID, you will potentially have
> to do 7 disk seeks just to determine if the UUID is not present. The
> Google team is working to address this problem, but I'm guessing it'll
> be a month or so before that's done, and then we'll have to integrate
> it into Riak -- so we can't count on that just yet.
>
> Now, all is not lost. :)
>
> If you craft your keys so that there is some temporal locality _and_
> the access pattern of your keys has some sort of exponential-ish
> decay, you can still get very good performance out of LevelDB. One
> simple way to do this is to prefix the UUID with the current
> date-time, like so:
>
> 201203060806-<uuid> (YMDhm-UUID)
>
> You could also use seconds since the epoch, etc. This keeps recently
> accessed/hot UUIDs on (or close to) the same cache page, which avoids
> a lot of cache churn and typically improves LevelDB performance
> dramatically.
>
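> A minimal sketch of building such keys (TypeScript/Node.js; the
> function names are just illustrative -- you'd pass the result as the
> object's key when saving from riak-js):
>
>     // Prefix keys with a coarse timestamp so keys written around the
>     // same time sort next to each other in LevelDB's keyspace.
>     function timePrefixedKey(uuid: string, now: Date = new Date()): string {
>       const pad = (n: number) => String(n).padStart(2, "0");
>       const prefix =
>         now.getUTCFullYear().toString() +
>         pad(now.getUTCMonth() + 1) +
>         pad(now.getUTCDate()) +
>         pad(now.getUTCHours()) +
>         pad(now.getUTCMinutes());
>       return `${prefix}-${uuid}`; // e.g. "201203060806-<uuid>"
>     }
>
>     // Or use seconds since the epoch as the prefix instead:
>     function epochPrefixedKey(uuid: string): string {
>       return `${Math.floor(Date.now() / 1000)}-${uuid}`;
>     }
>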
> Does this help/make sense?
>
> D.
> --
> Dave Smith
> VP, Engineering
> Basho Technologies, Inc.
> dizzyd at basho.com
>

