simulating physical node crash

francisco treacy francisco.treacy at gmail.com
Thu Nov 17 18:10:08 EST 2011


This morning one node went down (3-node 0.14 cluster), and I started getting
the dreaded `no_candidate_nodes,exhausted_prefist` error posted earlier.

If 2 nodes remain, and I always use N=3, R=1... why is it failing?
Something to do with my use of Search?

Thanks
Francisco


2011/9/28 Martin Woods <mw2134 at gmail.com>

> Hi Francisco
>
> I've seen the same error in a dev environment running on a single Riak
> node with an n_val of 1, so in my case it was nothing to do with a failing
> node. I wasn't running Riak Search either. I posted a question about it to
> this list a week or so ago but haven't seen a reply yet.
>
> So indeed, does anyone know what's causing this error and how we can avoid
> it?
>
> Regards,
> Martin.
>
>
>
> On 28 Sep 2011, at 20:39, francisco treacy <francisco.treacy at gmail.com>
> wrote:
>
> Regarding (3), I found a Forcing Read Repair contrib function
> (http://contrib.basho.com/bucket_inspector.html) which should help.
>
> Otherwise, for the m/r error: all of my buckets use the default n_val and
> write quorum. Could it be that some data never reached that particular
> node in the cluster? That is, should I have used W=3?  During the failure,
> many assets were returning 404s, which triggered read-repair (they were
> fine on subsequent requests), but no luck with the Map/Reduce function (it
> kept failing).  Could it have something to do with Riak Search?
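>
> To make the W=3 idea concrete, here is a minimal sketch of the kind of
> write I mean, against the stock HTTP interface on port 8098 ("assets",
> the key, and the Python "requests" client are just examples, not what we
> actually run):
>
>     import requests
>
>     # w=3: don't acknowledge the write until all three replicas have
>     # stored it, so no single node can be missing the data afterwards.
>     requests.put(
>         "http://localhost:8098/riak/assets/some_key",
>         params={"w": "3"},
>         headers={"Content-Type": "application/json"},
>         data='{"name": "example"}',
>     )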
>
> Thanks,
>
> Francisco
>
>
> 2011/9/26 francisco treacy <francisco.treacy at gmail.com>
>
>> Hi all,
>>
>> I have a 3-node Riak cluster, and I am simulating the scenario of
>> physical nodes crashing.
>>
>> When 2 nodes go down, and I query the remaining one, it fails with:
>>
>> {error,
>>     {exit,
>>         {{{error,
>>               {no_candidate_nodes,exhausted_prefist,
>>                   [{riak_kv_mapred_planner,claim_keys,3},
>>                    {riak_kv_map_phase,schedule_input,5},
>>                    {riak_kv_map_phase,handle_input,3},
>>                    {luke_phase,executing,3},
>>                    {gen_fsm,handle_msg,7},
>>                    {proc_lib,init_p_do_apply,3}],
>>                   []}},
>>           {gen_fsm,sync_send_event,
>>               [<0.31566.2330>,
>>                {inputs,
>>
>> (...)
>>
>> Here I'm doing an M/R job, with inputs fed by Search (sketched below).
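>>
>> Roughly, the job spec I'm posting to /mapred looks like this Python
>> sketch (the bucket name and query string are placeholders, and I'm using
>> the riak_search/mapred_search input form from the wiki, if I have it
>> right):
>>
>>     import json, requests
>>
>>     job = {
>>         # Feed the map phase from a Riak Search query instead of keys
>>         "inputs": {"module": "riak_search",
>>                    "function": "mapred_search",
>>                    "arg": ["assets", "title:foo"]},
>>         # Built-in JS map function returning each object's JSON value
>>         "query": [{"map": {"language": "javascript",
>>                            "name": "Riak.mapValuesJson"}}],
>>     }
>>     requests.post("http://localhost:8098/mapred",
>>                   headers={"Content-Type": "application/json"},
>>                   data=json.dumps(job))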
>>
>> (1) All of the involved buckets have N=3, and all involved requests use
>> R=1 (I don't really need a quorum for this use case).
>>
>> Why is it failing? I'm sure I'm missing something basic here.
>>
>> (2) Probably worth noting: those 3 nodes are spread across *two* physical
>> servers (1 on a small one, 2 on a beefier one). I've heard this is "not a
>> good idea", though I'm not sure why. These two servers are still
>> definitely enough for our current load; should I consider adding a third
>> one?
>>
>> (3) To overcome the aforementioned error, I added a new node to the
>> cluster (installed on the small server). The setup is now 4 nodes: 2 on
>> the small server, 2 on the beefier one.
>>
>> When 2 nodes go down, this works.  Which brings me to another topic:
>> could you point me to good strategies to pre-emptively invoke
>> read-repair? Is it up to clients to scan the keyspace forcing reads,
>> along the lines of the sketch below?  It's a disaster usability-wise when
>> the first users start getting 404s all over the place.
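>>
>> What I have in mind is roughly this (Python sketch; "assets" is an
>> example bucket, and listing keys this way walks the whole bucket, so it
>> is expensive):
>>
>>     import requests
>>
>>     base = "http://localhost:8098/riak/assets"
>>
>>     # ?keys=true returns the bucket's full key list (expensive!)
>>     keys = requests.get(base, params={"keys": "true"}).json()["keys"]
>>
>>     # Each GET makes Riak compare the replicas for that key and
>>     # read-repair any stale or missing ones in the background.
>>     for k in keys:
>>         requests.get("%s/%s" % (base, k), params={"r": "1"})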
>>
>> Francisco
>>
>