crash after single insert

Greg Nelson grourk at dropcam.com
Tue May 10 12:11:31 EDT 2011


Would it work to change /usr/sbin/riak to delete stray .lock files on start?
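
A rough sketch of what such a cleanup step could look like, written in Python only for illustration. It is not how the riak script or bitcask actually work. It assumes the lock file holds "PID path" as in Nico's example below, that the locks are named bitcask.write.lock under one directory per partition, and that a live node shows up as a process whose name starts with "beam". The helper names are made up. Checking the process name, not just whether the PID exists, is what would avoid the PID-reuse case.

```python
from pathlib import Path


def parse_lock(text):
    """Split lock contents of the form 'PID PATH' into (pid, path)."""
    pid_str, _, data_file = text.strip().partition(" ")
    return int(pid_str), data_file


def process_name(pid):
    """Command name of `pid` read from /proc (Linux only), or None if no such process."""
    if not Path("/proc/self").exists():
        raise RuntimeError("this sketch relies on a Linux-style /proc")
    try:
        return Path(f"/proc/{pid}/comm").read_text().strip()
    except (FileNotFoundError, ProcessLookupError):
        return None


def is_stale(lock_text, name_of=process_name, expected_prefix="beam"):
    """A lock is stale if its PID is gone or now belongs to an unrelated process.

    `expected_prefix` is an assumption about how the Erlang VM appears in the
    process list (e.g. beam or beam.smp).
    """
    try:
        pid, _ = parse_lock(lock_text)
    except ValueError:
        return False  # unparseable contents: leave the file alone
    name = name_of(pid)
    return name is None or not name.startswith(expected_prefix)


def remove_stale_locks(root, name_of=process_name):
    """Delete stale bitcask.write.lock files one level below `root`."""
    removed = []
    for lock in sorted(Path(root).glob("*/bitcask.write.lock")):
        if is_stale(lock.read_text(), name_of):
            lock.unlink()
            removed.append(lock)
    return removed
```

Something like remove_stale_locks("/var/lib/riak/bitcask") would run before the node starts. It still cannot tell a second, unrelated Erlang VM apart from the old node, so it only narrows the window.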

Sent from my iPhone

On May 10, 2011, at 4:10 AM, Nico Meyer <nico.meyer at adition.com> wrote:

> Hi again!
> 
> I just encountered this problem again myself, so I was able to check my theory.
> So one of the bitcask.write.lock files contained this:
> 
> 2272 /var/lib/riak/bitcask/1121816686466884466511812771987303177196838846464/1305008752.bitcask.data
> 
> and sure enough 'ps axu' gives me:
> 
> riak      2269  0.0  0.0  10624   396 ?        S    12:46   0:00 inet_gethost 4
> riak      2270  0.0  0.0  10624   432 ?        S    12:46   0:00 inet_gethost 4
> riak      2271  0.0  0.0  10624   432 ?        S    12:46   0:00 inet_gethost 4
> riak      2272  0.0  0.0  10624   384 ?        S    12:46   0:00 inet_gethost 4
> root      3139  0.0  0.0      0     0 ?        S    13:00   0:00 [flush-254:1]
> 
> Cheers,
> Nico
> 
> Am 10.05.2011 03:07, schrieb Gary William Flake:
>> (Removing riak-users.)
>> 
>> This was on an Ubuntu 10.04 box.  Riaksearch was auto started in init.d but we occasionally start/stop the service as part of our application stack.  In this one case, we did a shutdown from an admin web console, which may not have called the proper shutdown procedures in init.d.  On restart, I noticed the issues and found the locked files.  Removing them did the trick.
>> 
>> -- GWF
>> 
>> On May 9, 2011, at 7:10 AM, David Smith wrote:
>> 
>>> Hmm...ok. Will have to ponder how we can fix that.
>>> 
>>> Thanks!
>>> 
>>> D.
>>> 
>>> On Mon, May 9, 2011 at 8:09 AM, Nico Meyer <nico.meyer at adition.com> wrote:
>>>> Hi Dave,
>>>> 
>>>> I believe the problem occurs if there happens to be another process with
>>>> the same PID as the old (now gone) riak node. This can happen if the
>>>> machine was rebooted since the riak node crashed or if the PIDs wrapped;
>>>> they are only two bytes after all.
>>>> os_pid_exists/1 only checks for ANY process with the PID from the
>>>> lockfile
>>>> (https://github.com/basho/bitcask/blob/master/src/bitcask_lockops.erl#L116).
>>>> 
>>>> 
>>>> 
>>>> Am Montag, den 09.05.2011, 07:06 -0600 schrieb David Smith:
>>>>> On Sat, May 7, 2011 at 9:25 AM, Gary William Flake <gary at flake.org> wrote:
>>>>>> That was it, Nico.  Thanks.
>>>>>> 
>>>>>> I know we did a forced shutdown this week, which was probably the cause.  But I would have thought that riak would have taken care of its own lock file bookkeeping on restarting.
>>>>> Bitcask does:
>>>>> 
>>>>> https://github.com/basho/bitcask/blob/master/src/bitcask_lockops.erl#L46
>>>>> 
>>>>> It's curious that the logic didn't handle the case. What platform/OS
>>>>> are you on? Are you using init scripts to restart on boot?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> D.
>>>> 
>>>> 
>>> 
>>> 
>>> -- 
>>> Dave Smith
>>> Director, Engineering
>>> Basho Technologies, Inc.
>>> dizzyd at basho.com
>>> 
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com



