Constant vnode crashes after disk corruption

Nico Meyer nico.meyer at adition.com
Thu Apr 19 06:30:51 EDT 2012


Hi Bogunov!

Simple truncation of the bitcask files won't trigger this error, since 
bitcask will notice that the last written entry is truncated and ignore 
it. In this case a 'not found' is returned to the layer above bitcask. 
If, on the other hand, an entry (not necessarily the last one written) 
has the right length but does not match the checksum that bitcask 
writes with each entry, this error is returned as such. The layer above 
bitcask (riak_kv_vnode) doesn't handle this case and therefore crashes.
Of course a checksum error in the middle of the file means the file is 
corrupted. But if the only way to resolve the problem is to delete the 
whole file, bitcask might as well pretend the key was not found (and 
maybe delete it internally). That way at least the rest of the file 
might still be usable.
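
To make the distinction between the two cases concrete, here is a minimal 
sketch in Python. It is not bitcask's code, and the entry layout (CRC32, 
key size, value size, key, value) is a simplified assumption for 
illustration only. The names encode_entry and read_entry are hypothetical.

```python
import struct
import zlib

# Hypothetical, simplified entry header: crc32 (4 bytes), key size (2), value size (4).
HEADER = struct.Struct(">IHI")


def encode_entry(key: bytes, value: bytes) -> bytes:
    """Serialize one entry; the CRC covers the sizes, the key and the value."""
    body = struct.pack(">HI", len(key), len(value)) + key + value
    return struct.pack(">I", zlib.crc32(body)) + body


def read_entry(buf: bytes, offset: int = 0):
    """Return ("ok", key, value), ("not_found",) or ("bad_crc",)."""
    if len(buf) - offset < HEADER.size:
        return ("not_found",)  # header itself is truncated
    crc, ksz, vsz = HEADER.unpack_from(buf, offset)
    end = offset + HEADER.size + ksz + vsz
    if end > len(buf):
        return ("not_found",)  # entry is shorter than its header claims
    body = buf[offset + 4:end]
    if zlib.crc32(body) != crc:
        return ("bad_crc",)  # right length, wrong contents
    key = body[6:6 + ksz]
    value = body[6 + ksz:]
    return ("ok", key, value)


entry = encode_entry(b"k", b"hello")
truncated = entry[:-2]
corrupted = entry[:-1] + bytes([entry[-1] ^ 0xFF])
print(read_entry(entry), read_entry(truncated), read_entry(corrupted))
```

In this sketch only the third case produces a result that a caller 
expecting just "ok" or "not found" would not be prepared for.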

I think what happened in my case is that the file had the right length 
to fully contain the last entry, but the data was not fully written. 
This is what you get and rightly deserve for using ext4 as the 
filesystem :-(.

But I still think crashing the vnode when the bitcask files are 
corrupted is always the wrong behaviour. At the very least an error 
should be returned to the node performing the get, to fail fast in the 
case where R is set to N. Otherwise the request hangs until the timeout 
is reached, which is 60 seconds by default.
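
As a rough illustration of the fail-fast idea, the following Python 
sketch shows a get handler with a catch-all branch. It is not the 
riak_kv_vnode code. handle_get, backend_get and the tuple shapes are 
hypothetical names chosen for this example.

```python
def handle_get(backend_get, key):
    """Turn any backend result into a reply instead of crashing."""
    result = backend_get(key)
    if result[0] == "ok":
        return ("ok", result[1])
    if result == ("error", "not_found"):
        return ("error", "not_found")
    # Catch-all: an unexpected result such as ("error", "bad_crc") is
    # reported to the requester right away rather than left unhandled.
    reason = result[1] if len(result) > 1 else None
    return ("error", ("backend_failure", reason))


print(handle_get(lambda key: ("error", "bad_crc"), b"some_key"))
```

With a branch like this the requester gets an immediate error reply and 
does not have to wait for the timeout.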

Cheers,
Nico

On 19.04.2012 11:19, Bogunov wrote:
> Actually you get the same error if you try to copy the bitcask 
> directory while it is being written to, so I assume any incompletely 
> written bitcask file can cause it. The easy fix seems to be dropping 
> the bitcask directory.
>
> On Wed, Apr 18, 2012 at 2:26 PM, Nico Meyer <nico.meyer at adition.com> wrote:
>
>     Oh, I forgot to mention:
>
>     My workaround was to patch riak_kv_bitcask_backend to map all
>     errors to {error, not_found}. This raises the question of whether
>     the 'get/3' function of any backend should ever return anything
>     other than {ok, Value, State} and {error, not_found, State} if it
>     isn't handled by riak_kv_vnode.
>
>     BTW: I think the -spec() for get/3 is wrong both in
>     riak_kv_bitcask_backend and riak_kv_eleveldb_backend. It states a
>     possible return value of the form '{ok, not_found, state()}' for
>     the not_found case, instead of the actually returned form '{error,
>     not_found, state()}'
>
>     Cheers,
>     Nico
>
>     On 18.04.2012 12:18, Nico Meyer wrote:
>
>         Hello,
>
>         I just encountered a problem with one of our Riak nodes, which
>         is caused by a bug in either the disk controller or the
>         firmware of our SSD disks.
>         Anyway, the obvious symptom is that all writes to the disks
>         suddenly fail, which of course leads to truncated bitcask
>         files. However, this time the files got corrupted in a way
>         that led to CRC errors while fetching keys from bitcask. This
>         in turn leads to a crash of the vnode every time such a key is
>         read. So the log is filled with these messages:
>
>         11:55:52.621 [error] CRASH REPORT Process <0.23175.3> with 0
>         neighbours crashed with reason: no case clause matching
>         {error,bad_crc,{state,#Ref<0.0.0.196598>,"262613575457896618114724618378707105094425378816",[{async_folds,true},[{vnode_vclocks,false},{included_applications,[]},{allow_strfun,false},{reduce_js_vm_count,6},{storage_backend,riak_kv_bitcask_backend},{legacy_keylisting,false},{pb_ip,"0.0.0.0"},{hook_js_vm_count,2},{listkeys_backpressure,false},{mapred_name,"mapred"},{stats_urlpath,"stats"},{legacy_stats,true},{js_thread_stack,16},{riak_kv_stat,true},{add_paths,[]},{http_url_encoding,on},{map_js_vm_count,...},...],...],...}}
>         in riak_kv_vnode:prepare_put/3
>
>         Also those keys cannot be (over)written, since a put without
>         last_write_wins set to true does a get first internally.
>         I think the cause of the error should be obvious to anyone
>         familiar with the riak internals. Otherwise I can provide more
>         information.
>
>         Cheers,
>         Nico
>
>
