Corrupted Erlang binary term inside LevelDB

Vladimir Shabanov vshabanoff at
Wed Jul 24 21:39:56 EDT 2013


Recently I started expanding my Riak cluster and found that handoffs were
being retried continuously for one partition.

Here are the logs from the two nodes.

The most interesting parts of the logs are:
"Handoff receiver for partition ... exited abnormally after processing
2860338 objects: {{badarg,[{erlang,binary_to_term,..."
"bad argument in call to erlang:binary_to_term(<<131,104,...."
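The failing input starts with <<131,104,...>>. In the Erlang external term format, 131 is the version byte and 104 is SMALL_TUPLE_EXT, so the value at least begins like a valid encoded tuple; the rest is elided in the log. To see where such a binary stops being decodable, a partial decoder can report the byte offset of the first problem. This is a minimal, hypothetical Python sketch (not part of Riak or Erlang/OTP). It assumes the stored value is a plain, uncompressed term_to_binary/1 encoding and handles only a subset of tags: integers, atoms, tuples, lists, strings, and binaries. Any other tag (floats, bignums, pids, refs, funs, maps, compressed terms) is reported as unsupported, which does not by itself mean the data is corrupt.

```python
import struct

VERSION_MAGIC = 131  # first byte of every term_to_binary/1 result


def decode_term(buf: bytes):
    """Decode a subset of the Erlang external term format.

    Raises ValueError naming the byte offset of the first problem found
    (truncation, unknown tag, or trailing garbage).
    """
    if not buf or buf[0] != VERSION_MAGIC:
        raise ValueError("offset 0: missing version byte 131")
    term, end = _decode(buf, 1)
    if end != len(buf):
        raise ValueError(f"offset {end}: {len(buf) - end} trailing bytes")
    return term


def _take(buf: bytes, pos: int, n: int):
    """Return n bytes starting at pos, or fail if the buffer is too short."""
    if pos + n > len(buf):
        raise ValueError(f"offset {pos}: need {n} bytes, only {len(buf) - pos} left")
    return buf[pos:pos + n], pos + n


def _length(buf: bytes, pos: int, width: int):
    """Read an unsigned big-endian length/arity field of the given width."""
    raw, pos = _take(buf, pos, width)
    return int.from_bytes(raw, "big"), pos


def _decode(buf: bytes, pos: int):
    tag_pos = pos
    tag = _take(buf, pos, 1)[0][0]
    pos += 1
    if tag == 97:  # SMALL_INTEGER_EXT: unsigned 8-bit
        return _take(buf, pos, 1)[0][0], pos + 1
    if tag == 98:  # INTEGER_EXT: signed 32-bit big-endian
        raw, pos = _take(buf, pos, 4)
        return struct.unpack(">i", raw)[0], pos
    if tag in (100, 115, 118, 119):  # ATOM_EXT, SMALL_ATOM_EXT, and UTF-8 variants
        n, pos = _length(buf, pos, 2 if tag in (100, 118) else 1)
        name, pos = _take(buf, pos, n)
        return name.decode("utf-8" if tag in (118, 119) else "latin-1"), pos
    if tag in (104, 105):  # SMALL_TUPLE_EXT (1-byte arity), LARGE_TUPLE_EXT (4-byte)
        arity, pos = _length(buf, pos, 1 if tag == 104 else 4)
        items = []
        for _ in range(arity):
            item, pos = _decode(buf, pos)
            items.append(item)
        return tuple(items), pos
    if tag == 106:  # NIL_EXT: the empty list
        return [], pos
    if tag == 107:  # STRING_EXT: list of small integers packed as bytes
        n, pos = _length(buf, pos, 2)
        chars, pos = _take(buf, pos, n)
        return list(chars), pos
    if tag == 108:  # LIST_EXT: 4-byte count, elements, then a tail term
        n, pos = _length(buf, pos, 4)
        items = []
        for _ in range(n):
            item, pos = _decode(buf, pos)
            items.append(item)
        tail, pos = _decode(buf, pos)
        if tail != []:
            raise ValueError(f"offset {tag_pos}: improper list not supported")
        return items, pos
    if tag == 109:  # BINARY_EXT: 4-byte length, then raw bytes
        n, pos = _length(buf, pos, 4)
        data, pos = _take(buf, pos, n)
        return bytes(data), pos
    raise ValueError(f"offset {tag_pos}: unsupported or unknown tag {tag}")
```

For example, the encoding of {ok, <<"hi">>} with a classic ATOM_EXT atom decodes to ("ok", b"hi"), while the same bytes with the last one cut off fail with "offset 13: need 2 bytes, only 1 left". In an Erlang shell, binary_to_term/1 on such input only raises badarg, so an offset like this can help tell truncation apart from a stray byte in the middle of the value.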

Both nodes are running Riak 1.3.2 (the old one previously ran 1.3.1).

When I printed the corrupted binary, I found that it corresponds to a
single value.

When I tried to "get" it, the read succeeded, but the node holding the
corrupted value logged the same binary_to_term error.

When I tried to delete the corrupted value, I got a timeout.

My machines use ECC memory and the ZFS filesystem (which doesn't report
any checksum failures), so I doubt the data was silently corrupted on disk.
The LOG file in the corresponding LevelDB partition doesn't show any errors.
However, there is a lost/BLOCKS.bad file in this partition (7 KB, created
more than a month ago, and it doesn't appear to contain the corrupted value).

For now, I've stopped handoffs with "riak-admin transfer-limit 0".

Why was the value corrupted? Is there any way to remove or fix it?

More information about the riak-users mailing list