Riak performance problems when LevelDB database grows beyond 16GB

Evan Vigil-McClanahan emcclanahan at basho.com
Thu Oct 11 16:36:51 EDT 2012


Can you attach the eleveldb portion of your app.config file?
Configuration problems, especially max_open_files being too low, can
often cause issues like this.

If it isn't sensitive, the whole app.config and vm.args files are also
often helpful.
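
For reference, the section I mean usually looks something like the
sketch below. Treat the values as placeholders, not recommendations:
data_root is taken from the path in your message, and the
max_open_files number is purely illustrative, since the right value
depends on how many vnodes each node runs and how much RAM you can
spare.

```erlang
%% Sketch of the eleveldb section of app.config (Erlang terms).
%% Values are illustrative assumptions, not tuning advice.
{eleveldb, [
    %% Directory holding the per-vnode LevelDB databases
    %% (path assumed from the file listing in the original message).
    {data_root, "/home/riak/leveldb"},

    %% Per-vnode limit on open LevelDB files. The number here is a
    %% hypothetical example only.
    {max_open_files, 150}
]},
```

Note that the limit applies to each vnode separately, so the total
for the node is roughly this number times the vnodes it hosts.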

On Thu, Oct 11, 2012 at 9:12 AM,  <Jan.Evangelista at seznam.cz> wrote:
> Hello,
>
> I am writing a new application and I am testing it on a cluster with 4 Riak
> nodes (16 GB RAM, 2 x i3 3.4GHz - 2 cores).
>
> The application is tested with the expected load of 1000 requests/second;
> 90% of the requests cause a Riak read and a write of a new key. The problem
> is that performance starts falling after 18-20 hours, and one of the Riak
> nodes stops responding after 23-25 hours.
>
> (Each key is about 61 bytes long, the data is just 3 timestamps converted
> to binary, and there is a secondary key containing an expiration time. There
> should be a mapred job to delete keys older than 24 hours, but it is turned
> off while researching the performance problem.)
>
> Logs on the other nodes show that the problematic node cannot be contacted:
>
> 2012-10-11 11:33:57.473 [error] <0.908.0> ** Node 'riak@172.16.0.2' not
> responding **
> ** Removing (timedout) connection **
>
> The problematic node itself does not respond to "/usr/sbin/riak ping", but
> beam.smp is running and ALIVE messages are written regularly to the Erlang
> log. There is nothing suspicious in the logs on the node; its error log is
> empty.
>
> The beam.smp process consumes 20% of memory and 50-100% of 1 CPU (the other
> 3 CPUs sit idle), and it has 267 open LevelDB files.
>
> The database sizes are:
>
> node1: 16249M, 281 files in 21 dirs (with 4 additional files like
> /home/riak/leveldb/0/lost/BLOCKS.bad); this is the problematic node
> node2: 16183M, 264 files in 16 dirs
> node3: 16664M, 264 files in 16 dirs
> node4: 16205M, 265 files in 16 dirs
>
> I tried to attach to the beam.smp process with Erlang, but it does not
> respond to net_adm:ping/1.
>
> I attached gdb to the process, and gdb shows that most of its 93 threads are
> idle (in ethr_event_wait), but 2 threads are in LevelDB code:
>
> Thread 24 (Thread 0x7f1a8ecc0700 (LWP 3912)):
> #0  0x00007f1a91f74d84 in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #1  0x00007f1a0ee0ae9d in leveldb::port::CondVar::Wait() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2  0x00007f1a0ede3841 in leveldb::DBImpl::MakeRoomForWrite(bool) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ede91ad in leveldb::DBImpl::Write(leveldb::WriteOptions
> const&, leveldb::WriteBatch*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0eddeca4 in eleveldb_write () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x0000000000534c16 in process_main ()
> #6  0x00000000004987e3 in ?? ()
> #7  0x0000000000595320 in ?? ()
> #8  0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #9  0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #10 0x0000000000000000 in ?? ()
>
> Thread 20 (Thread 0x7f19fc727700 (LWP 3967)):
> #0  0x00007f1a0ee05a67 in leveldb::crc32c::Extend(unsigned int, char const*,
> unsigned long) () from /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1  0x00007f1a0ee012b9 in
> leveldb::TableBuilder::WriteRawBlock(leveldb::Slice const&,
> leveldb::CompressionType, leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2  0x00007f1a0ee01444 in
> leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*,
> leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ee015e4 in leveldb::TableBuilder::Flush() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0ee0178b in leveldb::TableBuilder::Add(leveldb::Slice const&,
> leveldb::Slice const&) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x00007f1a0ede7cad in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6  0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7  0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8  0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9  0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #11 0x0000000000000000 in ?? ()
>
> When I looked at thread 20 again, the stack showed Snappy compression,
> and many later inspections showed a call to fdatasync(2), followed by
> more compaction work. Thread 24 is still waiting in
> leveldb::DBImpl::MakeRoomForWrite.
>
> Thread 20 samples:
> #0  0x00007f1a0ee0ed6d in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1  0x00007f1a0ee0edb3 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2  0x00007f1a0ee0f9dc in snappy::internal::CompressFragment(char const*,
> unsigned long, char*, unsigned short*, int) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ee10dc1 in snappy::Compress(snappy::Source*, snappy::Sink*)
> () from /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0ee1115a in snappy::RawCompress(char const*, unsigned long,
> char*, unsigned long*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x00007f1a0ee014eb in
> leveldb::TableBuilder::WriteBlock(leveldb::BlockBuilder*,
> leveldb::BlockHandle*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6  0x00007f1a0ee015e4 in leveldb::TableBuilder::Flush() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7  0x00007f1a0ee0178b in leveldb::TableBuilder::Add(leveldb::Slice const&,
> leveldb::Slice const&) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8  0x00007f1a0ede7cad in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9  0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #10 0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #11 0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #12 0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #13 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #14 0x0000000000000000 in ?? ()
>
> #0  0x00007f1a91a8fa5d in fdatasync () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f1a0ee08d64 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2  0x00007f1a0ede3357 in
> leveldb::DBImpl::FinishCompactionOutputFile(leveldb::DBImpl::CompactionState*,
> leveldb::Iterator*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ede7e6e in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6  0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7  0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #8  0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #9  0x0000000000000000 in ?? ()
>
> #0  0x00007f1a0ee05765 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #1  0x00007f1a0ee0b6da in
> leveldb::InternalKeyComparator::Compare(leveldb::Slice const&,
> leveldb::Slice const&) const () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #2  0x00007f1a0ee00218 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ee006aa in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0ede7ccd in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6  0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7  0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8  0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #9  0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #10 0x0000000000000000 in ?? ()
>
> #0  0x00007f1a0eb61dcb in std::string::_M_mutate(unsigned long, unsigned
> long, unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #1  0x00007f1a0eb61e1c in std::string::_M_replace_safe(unsigned long,
> unsigned long, char const*, unsigned long) () from
> /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #2  0x00007f1a0ee03559 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #3  0x00007f1a0ee037bd in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #4  0x00007f1a0ee00680 in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #5  0x00007f1a0ede7ccd in
> leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #6  0x00007f1a0ede8456 in leveldb::DBImpl::BackgroundCompaction() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #7  0x00007f1a0ede9038 in leveldb::DBImpl::BackgroundCall() () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #8  0x00007f1a0ee06c1e in ?? () from
> /usr/lib/riak/lib/eleveldb-1.2.2p5/priv/eleveldb.so
> #9  0x00007f1a91f70e9a in start_thread () from
> /lib/x86_64-linux-gnu/libpthread.so.0
> #10 0x00007f1a91a964bd in clone () from /lib/x86_64-linux-gnu/libc.so.6
> #11 0x0000000000000000 in ?? ()
>
> Software used:
>
> OS: Ubuntu 12.04 LTS, amd64
> Riak: riak_1.2.1rc2, installed from the Basho-provided deb package
> client accesses Riak via riak-erlang-client 1.2.1
>
> Any hints?
>
> Thanks, Jan
>
>



