Corrupted Erlang binary term inside LevelDB

Matthew Von-Maszewski matthewv at basho.com
Thu Jul 25 14:01:44 EDT 2013


Vladimir,

I can explain what happened, but not how to correct the problem.  The gentleman who can walk you through a repair is tied up on another project, but he intends to respond as soon as he is able.

We recently discovered / realized that Google's leveldb code does not check the CRC of each block rewritten during a compaction.  This means that blocks with bad CRCs get read without being flagged as bad, then rewritten to a new file with a new, valid CRC.  The corruption is now hidden.
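
To make the hiding effect concrete, here is a minimal sketch in Python. It is not leveldb's actual code: the block layout (payload plus a 4-byte trailer), the function names, and the use of zlib.crc32 are all assumptions made for illustration (leveldb itself uses a masked CRC32C and its own block format). It only shows how a rewrite that skips verification stamps a fresh, valid CRC onto already-damaged bytes.

```python
import zlib

def write_block(payload: bytes) -> bytes:
    """Store a block as the payload followed by a 4-byte CRC32 trailer (hypothetical layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "little")

def read_block(block: bytes, verify: bool) -> bytes:
    """Return the payload; raise if verify is set and the stored CRC does not match."""
    payload, stored = block[:-4], int.from_bytes(block[-4:], "little")
    if verify and zlib.crc32(payload) != stored:
        raise ValueError("block CRC mismatch")
    return payload

def compact(block: bytes, verify: bool) -> bytes:
    """Rewrite a block into a 'new file', the way a compaction would."""
    return write_block(read_block(block, verify))

good = write_block(b"erlang term bytes")
# Flip one bit in the payload to simulate corruption of the stored block.
bad = bytes([good[0] ^ 0x01]) + good[1:]

# A compaction that does not verify the CRC copies the bad payload
# and computes a new CRC over it.
hidden = compact(bad, verify=False)

# The rewritten block now passes a CRC check although its payload is wrong.
assert read_block(hidden, verify=True) != b"erlang term bytes"
```

In this sketch, calling compact(bad, verify=True) raises ValueError instead of producing a clean-looking block, which is the behaviour one would want from a verifying compaction.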

A more thorough discussion of the problem is found here:

https://github.com/basho/leveldb/wiki/mv-verify-compactions


We added code to the 1.3.2 and 1.4 Riak releases that checks the block CRC during both read (Get) requests and compaction rewrites.  This prevents corruption from being hidden in the future.  Unfortunately, it does NOTHING for blocks that were already corrupted and rewritten with valid CRCs.  You are encountering this latter condition.  We have a developer advocate / client services person who has walked others through a fix via the Riak data replicas … 

… please hold and the doctor will be with you shortly.

Matthew


On Jul 24, 2013, at 9:39 PM, Vladimir Shabanov <vshabanoff at gmail.com> wrote:

> Hello,
> 
> I recently started expanding my Riak cluster and found that handoffs were being retried continuously for one partition.
> 
> Here are logs from two nodes
> https://gist.github.com/vshabanov/41282e622479fbe81974
> 
> The most interesting parts of logs are
> "Handoff receiver for partition ... exited abnormally after processing 2860338 objects: {{badarg,[{erlang,binary_to_term,..."
> and
> "bad argument in call to erlang:binary_to_term(<<131,104,...."
> 
> Both nodes are running Riak 1.3.2 (old one was running 1.3.1 previously).
> 
> 
> When I printed the corrupted binary string, I found that it corresponds to a single value.
> 
> When I tried to "get" it, the read succeeded, but the node holding the corrupted value logged the same binary_to_term error.
> 
> When I tried to delete the corrupted value, I got a timeout.
> 
> 
> I'm running machines with ECC memory and the ZFS filesystem (which doesn't report any checksum failures), so I doubt the data was silently corrupted on disk.
> 
> The LOG file from the corresponding LevelDB partition doesn't show any errors. But there is a lost/BLOCKS.bad file in this partition (7 KB, created more than a month ago; it doesn't appear to contain the corrupted value).
> 
> At the moment I've stopped handoffs using "riak-admin transfer-limit 0".
> 
> Why was the value corrupted? Is there any way to remove it or fix it?
