Random but frequent crashes

Michael Jakl development at semanticlabs.at
Fri Nov 18 15:36:17 EST 2011


So, after a few hours we got another crash (around 20:00). I
couldn't find anything in the logfiles, but I've attached them
anyway. error.log and crash.log were empty, though.
Has anyone got an idea how I could possibly get rid of these crashes?
Cheers,
Michael

On Fri, Nov 18, 2011 at 12:27 PM, Michael Jakl
<development at semanticlabs.at> wrote:
> Hi,
> I'm test-driving Riak (1.0.1) using Bitcask and a lot of data
> (currently ~25 million documents). I've deployed Riak on three
> machines with an n_val of 3 (actually, I left it at the default).
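>
> For completeness, the storage settings are stock as far as I know,
> roughly this in app.config (paths are the Debian package defaults;
> my data_root may differ):
>
>   {riak_kv, [{storage_backend, riak_kv_bitcask_backend}, ...]},
>   {bitcask, [{data_root, "/var/lib/riak/bitcask"}]},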
>
> Soon after I started an import process, Riak crashed about every 6
> million documents (sometimes more frequently), leaving no obvious
> cause in the logfiles. I've opened a ticket (Bug 1282 [1]), but
> maybe it's better to discuss it here since I don't have much
> information on this. The only node that crashes is the one I'm
> adding the data to; the other two nodes haven't crashed yet. The
> importer and the (crashing) Riak node are on the same machine, and
> I'm currently using the HTTP Java client (before that, I was using
> the PBC Java client).
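>
> In case it matters, the import loop is essentially just sequential
> stores through the default HTTP client, something like this
> (heavily simplified; Document and source are made-up stand-ins for
> my own code):
>
>   import com.basho.riak.client.IRiakClient;
>   import com.basho.riak.client.RiakFactory;
>   import com.basho.riak.client.bucket.Bucket;
>
>   IRiakClient client = RiakFactory.httpClient(); // 127.0.0.1:8098
>   Bucket bucket = client.fetchBucket("documents").execute();
>   for (Document doc : source) {          // ~20k of JSON each
>       bucket.store(doc.key(), doc.json()).execute();
>   }
>   client.shutdown();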
>
> It seems that the crashes occur after a long-running GC alert in
> the logfiles, but that may be unrelated (the memory usage on my
> machine does not go up).
>
> I'm running Riak on machines with 24GB of RAM; the bucket name is
> about 10 chars long and the keys 20 chars. I expect about 200
> million documents of roughly 20k each, but so far I've imported
> only 24 million. The first crash happened after 6 million
> documents.
>
> The nofile limit for Riak is 32000, on Debian 6 (Linux) with all
> updates installed. The capacity planning page tells me that I have
> enough RAM (recommendation: 3 nodes with 14GB of RAM). The bitcask
> directory has about 180GB of data and contains 344 files.
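>
> My back-of-envelope keydir math, assuming I read the capacity
> planning page correctly (roughly 40 bytes of RAM per key plus the
> bucket and key names; with n_val = 3 on 3 nodes, every node holds
> every key):
>
>   200,000,000 keys * (40 + 10 + 20) bytes ~= 14GB keydir per node
>    25,000,000 keys * (40 + 10 + 20) bytes ~= 1.75GB right now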
>
> I've tried switching to eleveldb, thinking it might be a memory
> issue, but that used up more disk space than I have available. My
> migration plan was to install Riak on another machine, set it up
> with leveldb, and join it to the node running bitcask; during the
> process, disk consumption went from 200G to over a terabyte.
>
> I've upgraded to Riak 1.0.2, but the changelog doesn't mention
> anything related to these crashes.
>
> What could I do to identify the problem? Are there any debugging
> switches I could turn on (I've recently activated the
> sasl_error_logger)? I'm thinking of activating the Heartbeat
> management in vm.args, but that wouldn't fix the root cause. I've
> just restarted Riak on 1.0.2 and cleaned all logfiles. Up until
> now, crashes have been frequent enough that I should be able to
> provide a set of logfiles on Monday, but are there any obvious
> things I might have forgotten?
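>
> Concretely, these are the two knobs I mean; the first I've already
> turned on, the second is still commented out in the stock vm.args
> (paths again from the Debian package defaults):
>
>   %% app.config, sasl section
>   {sasl_error_logger, {file, "/var/log/riak/sasl-error.log"}}
>
>   ## vm.args -- heartbeat restarts the VM if it dies
>   -heart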
>
> Cheers,
> Michael
>
>  1: https://issues.basho.com/show_bug.cgi?id=1282
>
-------------- next part --------------
Attachments (non-text, scrubbed to URLs by the list archive):
  console.log (38726 bytes):
    <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20111118/e320aa77/attachment.log>
  erlang.log.1 (33168 bytes):
    <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20111118/e320aa77/attachment.1>
  run_erl.log (255 bytes):
    <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20111118/e320aa77/attachment-0001.log>
  sasl.log (30447 bytes):
    <http://lists.basho.com/pipermail/riak-users_lists.basho.com/attachments/20111118/e320aa77/attachment-0002.log>

