Constant vnode crashes after disk corruption

Nico Meyer nico.meyer at adition.com
Wed Apr 18 06:26:47 EDT 2012


Oh, I forgot to mention:

My workaround was to patch riak_kv_bitcask_backend to map all errors to 
{error, not_found}. This raises the question of whether the 'get/3' 
function of any backend should ever return anything other than 
{ok, Value, State} or {error, not_found, State}, as long as other return 
values are not handled by riak_kv_vnode.
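
A minimal sketch of such a mapping, not the actual patch. The module 
name and raw_get/3 are made up for illustration; raw_get/3 only stands 
in for the real bitcask read and is not part of any Riak or bitcask API:

```erlang
-module(get_error_mapping_sketch).
-export([get/3, demo/0]).

%% Hypothetical stand-in for the real backend read. It only produces the
%% three result shapes discussed in this thread.
raw_get(_Bucket, <<"good">>, State)    -> {ok, <<"value">>, State};
raw_get(_Bucket, <<"corrupt">>, State) -> {error, bad_crc, State};
raw_get(_Bucket, _Key, State)          -> {error, not_found, State}.

%% Wrapper in the spirit of the workaround: every error, including
%% bad_crc, is reported to the caller as not_found.
get(Bucket, Key, State) ->
    case raw_get(Bucket, Key, State) of
        {ok, Value, NewState}      -> {ok, Value, NewState};
        {error, _Reason, NewState} -> {error, not_found, NewState}
    end.

%% Each match below fails with badmatch if the mapping is wrong.
demo() ->
    {ok, <<"value">>, s}  = get(<<"b">>, <<"good">>, s),
    {error, not_found, s} = get(<<"b">>, <<"corrupt">>, s),
    {error, not_found, s} = get(<<"b">>, <<"missing">>, s),
    ok.
```

The obvious downside is that a corrupted key becomes indistinguishable 
from a missing one, so the original reason should probably be logged 
before it is discarded.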

BTW: I think the -spec() for get/3 is wrong in both 
riak_kv_bitcask_backend and riak_kv_eleveldb_backend. It states a 
possible return value of the form '{ok, not_found, state()}' for the 
not_found case, instead of the actually returned form '{error, 
not_found, state()}'.
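
Roughly what I would expect the spec to look like for the not_found 
case. This is only a sketch: the module name and the type definitions 
are placeholders, the stub body exists just so it compiles, and any 
further error alternatives the real spec lists are left out here:

```erlang
-module(get_spec_sketch).
-export([get/3]).

%% Placeholder types for illustration only; the real backends define
%% their own state(), bucket and key types.
-type state()  :: term().
-type bucket() :: binary().
-type key()    :: binary().

%% not_found is declared inside an error tuple, matching the form that
%% is actually returned, instead of {ok, not_found, state()}.
-spec get(bucket(), key(), state()) ->
          {ok, Value :: term(), state()} |
          {error, not_found, state()}.
get(_Bucket, _Key, State) ->
    %% Stub: always reports the key as missing.
    {error, not_found, State}.
```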

Cheers,
Nico

On 18.04.2012 12:18, Nico Meyer wrote:
> Hello,
>
> I just encountered a problem with one of our Riak nodes, which is 
> caused by a bug in either the disk controller or the firmware of our 
> SSD disks.
> Anyway, the obvious symptom is that all writes to the disks suddenly 
> fail, which of course leads to truncated bitcask files. However, this 
> time the files got corrupted in a way that led to CRC errors while 
> fetching keys from bitcask. This in turn crashes the vnode every time 
> such a key is read. So the log is filled with these messages:
>
> 11:55:52.621 [error] CRASH REPORT Process <0.23175.3> with 0 
> neighbours crashed with reason: no case clause matching 
> {error,bad_crc,{state,#Ref<0.0.0.196598>,"262613575457896618114724618378707105094425378816",[{async_folds,true},[{vnode_vclocks,false},{included_applications,[]},{allow_strfun,false},{reduce_js_vm_count,6},{storage_backend,riak_kv_bitcask_backend},{legacy_keylisting,false},{pb_ip,"0.0.0.0"},{hook_js_vm_count,2},{listkeys_backpressure,false},{mapred_name,"mapred"},{stats_urlpath,"stats"},{legacy_stats,true},{js_thread_stack,16},{riak_kv_stat,true},{add_paths,[]},{http_url_encoding,on},{map_js_vm_count,...},...],...],...}} 
> in riak_kv_vnode:prepare_put/3
>
> Also those keys cannot be (over)written, since a put without 
> last_write_wins set to true does a get first internally.
> I think the cause of the error should be obvious to anyone familiar 
> with the riak internals. Otherwise I can provide more information.
>
> Cheers,
> Nico
>


-- 
Senior Backend Developer
________________________________________________

ADITION technologies AG
Schwarzwaldstrasse 78b
79117 Freiburg

http://www.adition.com

T +49 / (0)761 / 88147 - 30
F +49 / (0)761 / 88147 - 77
SUPPORT +49  / (0)1805 - ADITION

(Landline rate 14 ct/min; mobile rates at most 42 ct/min)

Registered at the Amtsgericht Düsseldorf (district court) under HRB 54076
Executive board: Andreas Kleiser, Jörg Klekamp, Tihomir Perkovic, Marcus Schlüter
Chairman of the supervisory board: Daniel Raimer, attorney
VAT ID No.: DE 218 858 434




