Search precommit hook bug?

Elias Levy fearsome.lucidity at gmail.com
Tue Sep 20 14:41:43 EDT 2011


On Tue, Sep 20, 2011 at 7:53 AM, Ryan Zezeski <rzezeski at basho.com> wrote:

> Elias,
>
> It's hard to say from just this one stacktrace but it seems that the
> vnode/leveldb backend might be failing under load causing the R value to go
> unmet.  The Search hook has to perform a read of a special object it stores
> in the backend and that's what is failing here.  However, the root cause
> seems to be vnodes failing.  I say this because of the presence of the
> `{r_val_unsatisfied,2,1}` msg.  Could you check the error and crash log
> files and see if you can't find other traces that might shed more light on
> this?
>

Alas, I upgraded to 1.0.0pre4 and no longer observe the behavior.  Before
that I verified that the problem also occurred when using Bitcask, so it
seemed not related to the backend in use.

That said, now I am seeing a different error in 1.0.0pre4. I am using the
same setup (3 nodes, 1 client with 12 concurrent PB connections spread
across the nodes, inserting data as fast as it can).

This error is a lot rarer.  I have to insert several hundred or million
objects before it manifests itself, although I've seen it happen once soon
after starting the load script.

The error occurs within one of the nodes and causes the node to go into a
tight loop.  The node will not respond to a "riak stop" command.  I usually
have to kill the riak processes.

The node loops generating the following error:

2011-09-20 01:06:45.819 [error] <0.107.0> {mochiweb_socket_server,310,{acceptor_error,{error,accept_failed}}}
2011-09-20 01:06:45.820 [error] <0.13978.273> application: mochiweb, "Accept failed error", "{error,emfile}


The long preamble of errors leading to the loop can be seen at
http://pastebin.com/4Eu2UMYf

I particularly found the following puzzling:

2011-09-20 00:43:39.144 [error] <0.13420.273> CRASH REPORT Process [] with 0 neighbours crashed with reason: {error,{badmatch,{error,emfile}}}

Notice that the process list is empty.

Now, from what I've been able to find, {error,emfile} usually means you are
out of file descriptors.  Yes?

If so, the system is running with an fd ulimit of 4096.  Is that not
considered sufficient?  Again, this is a single client with only 12
concurrent connections.  I am using the leveldb backend, if that makes a
difference.

Could there be an fd leak somewhere within Riak? Maybe the new eleveldb
backend?

Is there some command to show how many file descriptors are in use while the
node is running?
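
Absent a built-in command, one rough way to check on Linux is to count the
entries under /proc/<pid>/fd for the Erlang VM process and compare that
against the ulimit. Below is a minimal sketch of that idea, not anything
Riak provides. It assumes Linux with /proc mounted and that it runs as the
same user as the node (or as root). The helper names count_open_fds and
fd_headroom are hypothetical, and the 4096 is just the ulimit mentioned
above.

```python
import os


def count_open_fds(pid):
    """Count open file descriptors of a process by listing /proc/<pid>/fd.

    Linux only. Listing another user's process requires root.
    When pid is our own, the count includes the descriptor that
    os.listdir itself holds open while reading the directory.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom(pid, limit):
    """Return how many descriptors remain before `limit` is reached."""
    return limit - count_open_fds(pid)


if __name__ == "__main__":
    # Demo against this script's own PID; substitute the PID of the
    # node's Erlang VM process to inspect a running node instead.
    pid = os.getpid()
    limit = 4096  # assumed ulimit, taken from the message above
    print("open fds:", count_open_fds(pid))
    print("headroom:", fd_headroom(pid, limit))
```

Sampling this periodically while the load script runs would show whether
the count climbs steadily toward the limit (which would suggest a leak) or
levels off.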