Truncated bit-cask files

Matthew Von-Maszewski matthewv at basho.com
Tue Feb 14 11:55:41 EST 2017


Arun,

The AAE code uses leveldb to store its anti-entropy data, regardless of which backend holds the user data. The error below therefore points to corruption within the AAE leveldb files. That is not impossible, but it has become really rare except with bad hardware or full disks.

Before wiping out the AAE directory, copy the LOG file within it. It likely contains more useful error messages. Perhaps put the file in Dropbox, or zip it and attach it to a reply, for us to review.
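A minimal Python sketch of that preservation step follows. It assumes (not confirmed in this thread) that the AAE data lives under `/var/lib/riak/anti_entropy`, with one leveldb directory per partition and each directory holding its own `LOG` file; the real location should be taken from the node's riak.conf (the setting name `anti_entropy.data_dir` is assumed here). The sketch zips every `LOG` and `LOG.old` it finds and, as a rough heuristic only, flags partitions whose logs mention "Corruption". The function name `archive_aae_logs` is hypothetical.

```python
import os
import zipfile


def archive_aae_logs(aae_root, zip_path):
    """Copy each partition's leveldb LOG (and LOG.old, if present) into one zip.

    Assumes the hypothetical layout <aae_root>/<partition>/LOG, i.e. one
    leveldb database directory per partition. Returns the sorted list of
    partition directory names whose log files contain the text "Corruption".
    """
    suspects = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for partition in sorted(os.listdir(aae_root)):
            part_dir = os.path.join(aae_root, partition)
            if not os.path.isdir(part_dir):
                continue  # skip stray files at the top level
            flagged = False
            for name in ("LOG", "LOG.old"):
                path = os.path.join(part_dir, name)
                if not os.path.isfile(path):
                    continue
                # Store as <partition>/LOG so the archive keeps the mapping.
                zf.write(path, arcname=f"{partition}/{name}")
                with open(path, "r", errors="replace") as fh:
                    if "Corruption" in fh.read():
                        flagged = True
            if flagged:
                suspects.append(partition)
    return suspects
```

For example, `archive_aae_logs("/var/lib/riak/anti_entropy", "/tmp/aae-logs.zip")` (both paths assumed) produces a single archive suitable for attaching to a reply, and the returned list is a starting point for deciding which partition directories to look at first.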

Matthew

> On Feb 14, 2017, at 10:42 AM, Magnus Kessler <mkessler at basho.com> wrote:
> 
> On 14 February 2017 at 14:46, Arun Rajagopalan <arun.v.rajagopalan at gmail.com> wrote:
> Hi Magnus
> 
> RIAK crashes on startup when I have a truncated bitcask file
> 
> It also crashes when the AAE files are bad, I think. Example below:
> 
> 2017-02-13 21:18:30 =CRASH REPORT====
>   crasher:
>     initial call: riak_kv_index_hashtree:init/1
>     pid: <0.6037.0>
>     registered_name: []
>     exception exit: {{{badmatch,{error,{db_open,"Corruption: truncated record at end of file"}}},
>                       [{hashtree,new_segment_store,2,[{file,"src/hashtree.erl"},{line,675}]},
>                        {hashtree,new,2,[{file,"src/hashtree.erl"},{line,246}]},
>                        {riak_kv_index_hashtree,do_new_tree,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,610}]},
>                        {lists,foldl,3,[{file,"lists.erl"},{line,1248}]},
>                        {riak_kv_index_hashtree,init_trees,3,[{file,"src/riak_kv_index_hashtree.erl"},{line,474}]},
>                        {riak_kv_index_hashtree,init,1,[{file,"src/riak_kv_index_hashtree.erl"},{line,268}]},
>                        {gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
>                        {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]},
>                      [{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},
>                       {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
>     ancestors: [<0.715.0>,riak_core_vnode_sup,riak_core_sup,<0.160.0>]
>     messages: []
>     links: []
>     dictionary: []
>     trap_exit: false
>     status: running
>     heap_size: 1598
>     stack_size: 27
>     reductions: 889
>   neighbours:
> 
> 
> Regards
> Arun
> 
> 
> Hi Arun,
> 
> The crash log you provided shows that there is a corrupted file in the AAE (anti_entropy) backend. Entries in console.log should have more information about which partition is affected. Please post the output from the affected node at around 2017-02-13T21:18:30. As this is AAE data, it is safe to remove the directory named after the affected partition from the anti_entropy directory before restarting the node. You may find that more than one partition is affected; the next one will only be encountered after the attempted restart. If so, identify the next partition in the same way and remove it too, repeating until the node starts up successfully.
> 
> Is there a reason why the nodes aren't shut down in the regular way?
> 
> Kind Regards,
> 
> Magnus
> 
> 
> 
> -- 
> Magnus Kessler
> Client Services Engineer
> Basho Technologies Limited
> 
> Registered Office - 8 Lincoln’s Inn Fields London WC2A 3BP Reg 07970431
> _______________________________________________
> riak-users mailing list
> riak-users at lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
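Magnus's remove-and-retry procedure can be sketched in Python as two hypothetical helpers. `lines_near` pulls the console.log lines around the crash timestamp, assuming each entry begins with a `YYYY-MM-DD HH:MM:SS` prefix (an assumption about the log format, not taken from this thread). `quarantine_partition` moves a partition's AAE directory aside instead of deleting it, so its files remain available for review. The paths, names, and log format are all assumptions, and the move should only be run while the node is stopped.

```python
import os
import shutil
from datetime import datetime, timedelta

# Assumed console.log entry prefix, e.g. "2017-02-13 21:18:30.123 [error] ...".
# Only the first 19 characters (date and time to the second) are parsed.
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def lines_near(log_lines, center, window=timedelta(seconds=30)):
    """Yield log lines whose leading timestamp lies within `window` of `center`."""
    for line in log_lines:
        try:
            ts = datetime.strptime(line[:19], TS_FORMAT)
        except ValueError:
            continue  # no parsable timestamp (e.g. a continuation line): skip it
        if abs(ts - center) <= window:
            yield line


def quarantine_partition(aae_root, partition, quarantine_root):
    """Move <aae_root>/<partition> into quarantine_root instead of deleting it.

    Intended to run only while the node is stopped. Returns the new path.
    """
    src = os.path.join(aae_root, partition)
    os.makedirs(quarantine_root, exist_ok=True)
    dst = os.path.join(quarantine_root, partition)
    shutil.move(src, dst)
    return dst
```

For example, `list(lines_near(open("console.log"), datetime(2017, 2, 13, 21, 18, 30)))` collects the entries worth posting. Once the affected partition is known, `quarantine_partition` moves its directory aside, the node is restarted, and the cycle repeats for any further partition that fails. Moving instead of deleting costs some disk space but keeps the evidence until the cause is understood.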
