Random but frequent crashes

Sean Cribbs sean at basho.com
Fri Nov 18 16:01:50 EST 2011


Michael,

The first thing that jumps out at me is the line that says: alarm_handler:
{set,{system_memory_high_watermark,[]}}, followed by a lot of "long_gc"
messages. My guess is that you have too much memory pressure (which can be
a problem with Bitcask when you have lots of keys), and the Erlang VM
exits when it runs out of memory. On your next node startup, monitor the
VSZ of the "beam.smp" process with the ps command and make sure it doesn't
reach the maximum RAM in your system.
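A minimal sketch of such a check (assuming a Linux ps that supports -C; the
process name "beam.smp" is just the default from the advice above, and the
25165824 KB figure is simply 24GB expressed in KB):

```python
import subprocess

def beam_vsz_kb(process="beam.smp"):
    """Return the VSZ (in KB) of each matching process, as reported by ps."""
    out = subprocess.run(
        ["ps", "-C", process, "-o", "vsz="],
        capture_output=True, text=True,
    ).stdout.split()
    return [int(v) for v in out]

# Run this periodically (e.g. under `watch -n 60`) and compare against the
# machine's physical RAM -- 24GB here is roughly 25165824 KB.
print(beam_vsz_kb())
```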

Hope that helps,

On Fri, Nov 18, 2011 at 3:36 PM, Michael Jakl
<development at semanticlabs.at> wrote:

> So, after a few hours we've got another crash (around 20:00).
> I could not find anything in the logfiles, but I've attached them
> anyway. error.log and crash.log were empty, though.
> Has anyone got an idea how I could get rid of these crashes?
> Cheers,
> Michael
>
> On Fri, Nov 18, 2011 at 12:27 PM, Michael Jakl
> <development at semanticlabs.at> wrote:
> > Hi,
> > I'm test-driving Riak (1.0.1) using Bitcask and a lot of data
> > (currently ~25 million documents). I've deployed Riak on three
> > machines with an n_val of 3 (actually, I left it at the default).
> >
> > Soon after I started an import process, Riak crashed about every 6
> > million documents (sometimes more frequently), leaving no obvious cause
> > in the logfiles. I've opened a ticket (Bug 1282 [1]), but maybe it's
> > better to discuss it here since I don't have much information on
> > this. Only the node I'm adding the data to crashes; the other two
> > nodes haven't crashed yet. The importer and the (crashing) Riak node
> > are on the same machine, and I'm currently using the HTTP Java client
> > (before that, I was using the PBC Java client).
> >
> > The crashes seem to occur after a long-running GC alert in the
> > logfiles, but that may be unrelated (the memory usage on my machine
> > does not go up).
> >
> > I'm running Riak on machines with 24GB of RAM; the bucket name is
> > about 10 characters long and the keys 20 characters. I expect about
> > 200 million documents of roughly 20k each, but so far I've imported
> > only 24 million. The first crash happened after 6 million documents.
> >
> > The nofile limit for Riak is 32000 on Debian Linux 6 with all updates
> > installed. The capacity-planning page tells me I have enough RAM
> > (recommendation: 3 nodes with 14GB of RAM). The Bitcask directory
> > holds about 180GB of data in 344 files.
> >
> > I've tried switching to eleveldb, thinking it might be a memory issue,
> > but that used up more disk space than I have available. My migration
> > plan was to install Riak on another machine, set up LevelDB there, and
> > join it to the node running Bitcask; disk consumption went from 200GB
> > to over a terabyte during the process.
> >
> > I've upgraded to Riak 1.0.2, but the changelog does not mention
> > anything related to this problem.
> >
> > What could I do to identify the problem? Are there any debugging
> > switches I could turn on (I've recently activated the
> > sasl_error_logger)? I'm thinking of activating the heartbeat
> > management in vm.args, but that wouldn't fix the root cause. I've
> > just restarted Riak using 1.0.2 and cleaned all logfiles. Up until
> > now, crashes have been frequent enough that I should be able to
> > provide a set of logfiles on Monday, but are there any obvious things
> > I might have forgotten?
> >
> > Cheers,
> > Michael
> >
> >  1: https://issues.basho.com/show_bug.cgi?id=1282
> >
>
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
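As a side note, the keydir RAM math behind the capacity-planning
recommendation quoted above can be sketched like this (the ~40 bytes of
fixed per-key overhead is an assumption for illustration; Basho's Bitcask
capacity calculator has the authoritative figures):

```python
# Back-of-envelope Bitcask keydir RAM estimate for the numbers in the post.
# ASSUMPTION: ~40 bytes of fixed in-memory overhead per key; the exact
# figure varies by Bitcask version, so treat this as order-of-magnitude only.
FIXED_OVERHEAD = 40           # bytes per key (assumed)
bucket_len = 10               # bucket name length, from the post
key_len = 20                  # key length, from the post
n_val = 3                     # replicas
nodes = 3                     # cluster size
total_docs = 200_000_000      # expected document count

per_key = FIXED_OVERHEAD + bucket_len + key_len
ram_per_node_bytes = total_docs * per_key * n_val / nodes
print(f"~{ram_per_node_bytes / 2**30:.1f} GiB of keydir RAM per node")
# → ~13.0 GiB of keydir RAM per node
```

That lines up with the "3 nodes with 14GB of RAM" recommendation Michael
mentions, which is why key count (rather than value size) is the thing to
watch with Bitcask.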


-- 
Sean Cribbs <sean at basho.com>
Developer Advocate
Basho Technologies, Inc.
http://www.basho.com/