Urgent help with a down node.

John Caprice jcaprice at basho.com
Mon Jul 8 12:01:29 EDT 2013


Bryan,

Anything that interrupts writing to or closing that file can cause
this: I/O errors related to the file system or the disk, a Riak crash,
a killed Riak process, and so on.  If you see this happen regularly, it
would be an issue worth investigating.

Thanks,

John
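
An interrupted write leaves the data file shorter than the hintfile
expects, which is what the "contains pointer" report quoted below records.
The following is a minimal Python sketch for reading those numbers out of a
pasted log.  It assumes the two numbers after "contains pointer" are the
entry's offset and size in bytes; that reading is an assumption, not
something confirmed in this thread.  The names truncated_entries and
strip_quote_prefix and the dictionary keys are hypothetical helpers, not
part of Riak or Bitcask.

```python
import re

# Pattern for the hintfile error text quoted in this thread. Whitespace is
# matched loosely because the archived log lines are wrapped.
HINT_RE = re.compile(
    r"Hintfile\s+'(?P<path>[^']+)'\s+contains pointer\s+"
    r"(?P<offset>\d+)\s+(?P<size>\d+)\s+"
    r"that is greater than total data size\s+(?P<total>\d+)"
)


def strip_quote_prefix(text):
    """Drop leading '>' quote markers so text pasted from a reply still parses."""
    return re.sub(r"^[ \t]*(?:>[ \t]*)+", "", text, flags=re.MULTILINE)


def truncated_entries(log_text):
    """Return one dict per hintfile error found in log_text.

    Assumes the two numbers after 'contains pointer' are offset and size.
    """
    results = []
    for m in HINT_RE.finditer(strip_quote_prefix(log_text)):
        offset = int(m.group("offset"))
        size = int(m.group("size"))
        total = int(m.group("total"))
        results.append({
            "path": m.group("path"),
            "offset": offset,
            "size": size,
            "total": total,
            # Bytes present in the data file from the offset onward.
            "on_disk": total - offset,
            # How far offset + size runs past the end of the data file.
            "overrun": offset + size - total,
        })
    return results


sample = """\
2013-07-07 12:51:42 =ERROR REPORT====
Hintfile
'./data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
contains pointer 16555635 566 that is greater than total data size 16556032
"""

for entry in truncated_entries(sample):
    print(entry["path"], entry["on_disk"], entry["overrun"])
```

For the first report, 16556032 - 16555635 leaves 397 bytes on disk for an
entry the hintfile sizes at 566.  The matching "discarding(383/566 bytes)"
report is 14 bytes smaller than that, and the same 14-byte gap appears
between the other hintfile and discard reports in the log.  That would be
consistent with a fixed-size record header, though this is an inference
from the numbers only.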


On Mon, Jul 8, 2013 at 8:54 AM, Bryan Hughes <bryan at go-factory.net> wrote:

>  Hi John,
>
> Thank you!   Can you give any insight into the cause of the problem, or
> point me to any Basho documentation detailing this?
>
> Cheers,
> Bryan
>
>
> On 7/8/13 8:22 AM, John Caprice wrote:
>
> Hey Bryan,
>
>  This indicates a problem with the Bitcask data file.  That data file,
> according to the second error report, was truncated.  You more than likely
> did not experience any data loss, as this would affect only a single
> replica of each key, and only keys stored in that data file.  To be safe,
> you can repair the partition by attaching to Riak and running:
>
>  riak_kv_vnode:repair(22835963083295358096932575511191922182123945984).
>
>  after which you can detach from Riak with ctrl-d and monitor the status
> of the repair with riak-admin transfers.  This command will read-repair
> any replicas lost to the data file truncation.
>
>  Thanks,
>
>  John Caprice
>
>
> On Mon, Jul 8, 2013 at 8:11 AM, Bryan Hughes <bryan at go-factory.net> wrote:
>
>>  Andrew,
>>
>> Thanks for the tip on how to use Google.  :)   But that was not my
>> original question.  I wanted to understand in more detail from the Basho
>> folks what
>>
>> 2013-07-07 12:51:42 =ERROR REPORT====
>> Hintfile
>> './data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
>> contains pointer 16555635 566 that is greater than total data size 16556032
>>
>>  and
>>
>>
>> 2013-07-07 12:54:43 =ERROR REPORT====
>> Bad datafile entry, discarding(383/566 bytes)
>>
>>  meant to my system.  For example, did I lose data, and if so, how do I
>> know what data was lost?  More importantly, if data was lost, how did
>> it happen?  I ran fsck on all the disks and checked the health of the
>> system, and all of it looks good.
>>
>> The latter information was included for completeness.
>>
>> Bryan
>>
>>
>> On 7/8/13 12:49 AM, Andrew Berman wrote:
>>
>> Bryan,
>>
>>  What version of Erlang?  You should check this out:
>> https://github.com/basho/riak_kv/issues/411
>>
>>  BTW - Google is your friend, which is how I found the above issue :)
>>
>>  --Andrew
>>
>>
>> On Sun, Jul 7, 2013 at 3:01 PM, Bryan Hughes <bryan at go-factory.net> wrote:
>>
>>>  Hi Mark,
>>>
>>> DOH - sorry for the lack of detail.  Didn't have enough coffee this
>>> morning.
>>>
>>> OS:     CentOS release 6.3 (Final)
>>> Riak:   Riak 1.2.1
>>>
>>> Haven't had a chance to upgrade to 1.3 yet.
>>>
>>> Got the node back up, but I am not entirely sure why, which is a little
>>> concerning.  I have been verifying the data, and everything looks intact.
>>> When I try to run riak-admin status, I get the following (note that I am
>>> not entirely sure this was the case when we first set the node up):
>>>
>>> $ riak-admin status
>>> Status failed, see log for details
>>>
>>> The logs show:
>>>
>>> 2013-07-07 14:55:03.858 [error] <0.12982.0>@riak_kv_console:status:173
>>> Status failed error:function_clause
>>> 2013-07-07 14:55:03.858 [error] emulator Error in process <0.12983.0> on
>>> node 'riak@127.0.0.1' with exit value:
>>> {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer...
>>>
>>>
>>> This is on a dev cluster with an out-of-the-box configuration using
>>> bitcask.
>>>
>>> Thanks!
>>>
>>> Bryan
>>>
>>>
>>> On 7/7/13 2:51 PM, Mark Phillips wrote:
>>>
>>> Hi Bryan,
>>>
>>>  I remember seeing something similar on the list a while ago. I'll dig
>>> through the archives (Riak.markmail.org) if I have a few minutes later
>>> tonight.
>>>
>>>  In the meantime, what version of Riak is this? And what OS?
>>>
>>>  Mark
>>>
>>> On Sunday, July 7, 2013, Bryan Hughes wrote:
>>>
>>>>  Anyone familiar with this error message?
>>>>
>>>> 2013-07-07 12:51:42 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
>>>> contains pointer 16555635 566 that is greater than total data size 16556032
>>>> 2013-07-07 12:51:45 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/114179815416476790484662877555959610910619729920/3.bitcask.hint'
>>>> contains pointer 17817310 567 that is greater than total data size
>>>> 17817600
>>>> 2013-07-07 12:51:46 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/159851741583067506678528028578343455274867621888/3.bitcask.hint'
>>>> contains pointer 7573448 567 that is greater than total data size
>>>> 7573504
>>>> 2013-07-07 12:51:46 =ERROR REPORT====
>>>> Bad datafile entry 1:
>>>> {ok,<<131,104,2,109,0,0,0,9,65,80,73,67,79,85,78,84,83,109,0,0,0,33,55,56,54,57,52,49,56,49,94,103,111,115,101,114,118,105,99,101,95,99>>}
>>>> 2013-07-07 12:51:56 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/730750818665451459101842416358141509827966271488/3.bitcask.hint'
>>>> contains pointer 13229833 581 that is greater than total data size 13230080
>>>> 2013-07-07 12:52:05 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/1187470080331358621040493926581979953470445191168/3.bitcask.hint'
>>>> contains pointer 23465420 578 that is greater than total data size 23465984
>>>> 2013-07-07 12:52:06 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/1210306043414653979137426502093171875652569137152/3.bitcask.hint'
>>>> contains pointer 27733824 578 that is greater than total data size 27734016
>>>> 2013-07-07 12:52:07 =ERROR REPORT====
>>>> Hintfile
>>>> './data/bitcask/1233142006497949337234359077604363797834693083136/3.bitcask.hint'
>>>> contains pointer 15014008 578 that is greater than total data size
>>>> 15014586
>>>> 2013-07-07 12:54:43 =ERROR REPORT====
>>>> Bad datafile entry, discarding(383/566 bytes)
>>>> 2013-07-07 12:54:45 =ERROR REPORT====
>>>> Bad datafile entry, discarding(276/567 bytes)
>>>> 2013-07-07 12:54:46 =ERROR REPORT====
>>>> Bad datafile entry, discarding(42/567 bytes)
>>>> 2013-07-07 12:54:57 =ERROR REPORT====
>>>> Bad datafile entry, discarding(233/581 bytes)
>>>> 2013-07-07 12:55:06 =ERROR REPORT====
>>>> Bad datafile entry, discarding(550/578 bytes)
>>>> 2013-07-07 12:55:07 =ERROR REPORT====
>>>> Bad datafile entry, discarding(178/578 bytes)
>>>> 2013-07-07 12:56:00 =ERROR REPORT====
>>>> Error in process <0.1536.0> on node 'riak@127.0.0.1' with exit value:
>>>> {badarg,[{erlang,system_info,[global_heaps_size],[]},{riak_kv_stat,system_stats,0,[{file,"src/riak_kv_stat.erl"},{line,421}]},{riak_kv_stat,produce_stats,0,[{file,"src/riak_kv_stat.erl"},{line,320}]},{timer,tc,3,[{file,"timer...
>>>>
>>>> --
>>>>
>>>> Bryan Hughes
>>>> *Go Factory*
>>>> http://www.go-factory.net
>>>>
>>>> *"Internet Class, Enterprise Grade"*
>>>>
>>>>
>>>>
>>>  --
>>>
>>> Bryan Hughes
>>> CTO and Founder / *Go Factory*
>>> (415) 515-7916
>>>
>>> http://www.go-factory.net
>>>
>>> *"Internet Class, Enterprise Grade"*
>>>
>>>
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>>
>>
>>
>
>
>
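
John's repair advice above names a single partition, while the original
log reports hintfile errors under several partition directories.  The
following is a minimal Python sketch for collecting the distinct partition
IDs from a pasted log.  It assumes the directory name directly under
"bitcask/" in each hintfile path is the partition ID, which matches the ID
used in the repair call above.  The name affected_partitions is a
hypothetical helper, not a Riak tool.

```python
import re

# Matches the partition directory in Bitcask hintfile paths such as
# './data/bitcask/<partition id>/3.bitcask.hint'.
PARTITION_RE = re.compile(r"/bitcask/(\d+)/\d+\.bitcask\.hint")


def affected_partitions(log_text):
    """Return the distinct partition IDs named in hintfile paths, in log order."""
    ids = (int(p) for p in PARTITION_RE.findall(log_text))
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(ids))


sample = """\
'./data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
'./data/bitcask/114179815416476790484662877555959610910619729920/3.bitcask.hint'
'./data/bitcask/22835963083295358096932575511191922182123945984/3.bitcask.hint'
"""

for partition in affected_partitions(sample):
    print(partition)
```

Each ID printed this way could then be passed to the repair call described
above, one partition at a time.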