Unexpected Riak 1.3 crash

thefosk marco at mashape.com
Thu Apr 11 22:24:44 EDT 2013


OS process not running. I think the whole cluster crashes because the other
nodes suddenly see increased traffic, which makes them crash as well.

This also happened on 1.3.1, but everything has seemed stable for some days
now. I guess the main reason this was happening, and may happen again, is
that Riak takes too much memory from the system. This is the usage I see on
a random machine in my cluster when no M/R jobs are running:

Cpu(s):  0.2%us,  0.1%sy,  0.0%ni, 99.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   7118944k total,  6619652k used,   499292k free,    11304k buffers
Swap:        0k total,        0k used,        0k free,  3173148k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                                                   
11392 riak      20   0 8917m 3.2g  40m S  2.7 46.5   2080:41 beam.smp 
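
As a rough check on the numbers in the top output above: in this top format
the "used" figure includes buffers and the page cache, so the memory really
held by processes is smaller than it looks. The sketch below only redoes that
arithmetic with the values copied from the output; the variable and function
names are made up for illustration.

```python
# Figures copied from the top output above (all in KiB).
MEM_TOTAL_K = 7118944
MEM_USED_K = 6619652
BUFFERS_K = 11304
CACHED_K = 3173148


def used_excluding_cache(used_k, buffers_k, cached_k):
    """Memory in use once reclaimable buffers and page cache are subtracted."""
    return used_k - buffers_k - cached_k


app_used_k = used_excluding_cache(MEM_USED_K, BUFFERS_K, CACHED_K)
app_used_pct = 100.0 * app_used_k / MEM_TOTAL_K

print(f"used excluding buffers/cache: {app_used_k} KiB "
      f"({app_used_k / 1024**2:.2f} GiB, {app_used_pct:.1f}% of total)")
```

That comes out close to the 3.2g RES (46.5%) reported for beam.smp, which
suggests that nearly all of the non-cache memory on this machine is held by
Riak itself, and that roughly 3 GiB of the "used" total is page cache.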

When M/R jobs are running, almost all of the memory is in use. I wonder if
there is a way to tell Riak to use less memory, at the cost of slower
queries. By the way, I also think my cluster is over-provisioned. As stated
in the GitHub issue:

The cluster has 4 machines, 64 partitions, and n_val=2. Each server stores
60GB of data on average. The machines are EC2 High-CPU Extra Large instances
(c1.xlarge), so each one has:

7 GiB of memory
20 EC2 Compute Units (8 virtual cores with 2.5 EC2 Compute Units each)
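
To put the cluster figures above in perspective, here is a back-of-the-envelope
sketch. It assumes the 64 partitions are spread evenly over the 4 machines,
that the 60GB per server is on-disk data including replicas, and that the
partitions of a failed node are picked up evenly by the remaining ones. None
of these assumptions has been checked against the real ring, and the names
are only illustrative.

```python
# Figures from the cluster description above.
NODES = 4
RING_SIZE = 64          # partitions in the ring
N_VAL = 2               # replicas per object
DATA_PER_NODE_GB = 60   # average on-disk data per server

# Assumption: partitions are distributed evenly across the nodes.
vnodes_per_node = RING_SIZE // NODES

# Assumption: the 60GB per node already includes replicas.
total_stored_gb = NODES * DATA_PER_NODE_GB
unique_data_gb = total_stored_gb / N_VAL
data_per_vnode_gb = DATA_PER_NODE_GB / vnodes_per_node

# Assumption: when one node goes down, the survivors share its partitions evenly.
vnodes_per_survivor = RING_SIZE / (NODES - 1)
extra_load_pct = 100.0 * (vnodes_per_survivor / vnodes_per_node - 1)

print(f"vnodes per node:            {vnodes_per_node}")
print(f"total stored / unique data: {total_stored_gb} GB / {unique_data_gb:.0f} GB")
print(f"data per vnode:             {data_per_vnode_gb:.2f} GB")
print(f"after one node fails:       {vnodes_per_survivor:.1f} vnodes per node "
      f"(+{extra_load_pct:.0f}%)")
```

Under these assumptions each machine runs 16 vnodes, and losing one machine
pushes about a third more partitions onto each of the other three, which fits
the picture of the remaining nodes seeing a sudden jump in traffic.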




