Riak crash on node restarts

Jeremy Raymond jeraymond at gmail.com
Fri Nov 18 06:55:53 EST 2011


Something else I tried, to give the cluster more time to settle, was to wait
between node updates until riak-admin transfers reported no pending
transfers. In some cases the transfers still hadn't completed after a couple
of hours of waiting. What would be a typical amount of time for pending
transfers to complete?
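
For reference, the wait step looks roughly like the sketch below. This is
only an illustration, not the actual deploy script: the function names, the
timeout, and the polling interval are made up for the example, and it assumes
that riak-admin transfers prints a line containing "No transfers active" when
nothing is pending (worth checking against the output of your Riak version).
Any other output, including empty output from a failed command, is treated as
"still pending".

```python
import subprocess
import time


def transfers_pending(output):
    """Return True if the text looks like it reports pending transfers.

    Assumption: an idle cluster prints a line containing
    'No transfers active'; anything else is treated as pending.
    """
    return "No transfers active" not in output


def run_transfers():
    # Runs the real CLI; requires riak-admin on PATH.
    result = subprocess.run(
        ["riak-admin", "transfers"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def wait_for_transfers(run=run_transfers, timeout=3600.0, interval=10.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until no transfers are pending or the timeout expires.

    Returns True if the cluster settled, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if not transfers_pending(run()):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The run, sleep, and clock parameters exist only so the loop can be exercised
without a live cluster; in a deploy script, calling wait_for_transfers() with
a suitable timeout between node updates would be enough.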

--
Jeremy


On Fri, Nov 18, 2011 at 6:48 AM, Jeremy Raymond <jeraymond at gmail.com> wrote:

> Hello,
>
> I'll set up my deploy script to capture this information and send you the
> info off-list (probably sometime next week).
>
> --
> Jeremy
>
> On 2011-11-15, at 1:16 PM, Jon Meredith wrote:
>
> Hi Joel,
>
> That's not a message I'd expect to see on a clean restart.  We'll need
> some more information to diagnose it.  Next time it crashes, could you
> provide the contents of your ring file (you can just grab the most recent
> one out of /var/lib/riak/ring - location may vary depending on your
> platform) and it would be very helpful if you could modify your deploy
> script to capture the file list for the leveldb directory on *all* of your
> nodes immediately before you bounce riak to do the update.   When it
> crashes, the console.log from all the nodes would also be useful.  If any
> of those files contain sensitive information, please contact me off list.
>
> BR, Jon
>
> On Tue, Nov 15, 2011 at 6:48 AM, Jeremy Raymond <jeraymond at gmail.com> wrote:
>
>> I'm using Riak 1.0.1 and I have a script that deploys updates to each of
>> my 3 nodes to update the Erlang mapred modules. What I do is stop a node,
>> deploy the new mapred modules, restart the node, wait for the riak_kv
>> service to start, then move onto the next node. Sometimes when I do this
>> one of the nodes that is not the current one being updated will go down.
>> Each time this has happened thus far it's been the same node that will go
>> down (the last one). I see this error in the logs:
>>
>> [error] Failed to start riak_kv_eleveldb_backend Reason: {db_open,"IO
>> error:
>> /var/lib/riak/leveldb/913438523331814323877303020447676887284957839360/MANIFEST-000002:
>> No such file or directory"}
>>
>> If I manually restart the node, things go back to normal. Any ideas on
>> what's going on? I've attached the error log.
>>
>> --
>>  Jeremy
>>
>>
>>
>
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> jmeredith at basho.com
>
>
>