Help with handling Riak disk failure

Leo scicomplete at gmail.com
Tue Sep 19 18:24:04 EDT 2017


Okay. Please let me know which Riak config parameters or other
settings you think could make the recovery faster. For example, the
transfer limit, which can be changed using the riak-admin transfer-limit
command.

Thanks,
Leo
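
For gauging how much any tuning could matter, a back-of-envelope
calculation may help. The sketch below converts an assumed sustained
repair throughput into wall-clock time for about 1 TB. The throughput
values are purely hypothetical placeholders, not measurements from any
Riak cluster; the factors Bryan lists in the quoted reply (I/O, network,
object size, and so on) determine the real figure.

```python
def recovery_hours(data_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to move data_bytes at a constant throughput.

    Assumes a steady rate, which real repair traffic will not have.
    """
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return data_bytes / throughput_bytes_per_s / 3600.0


if __name__ == "__main__":
    one_tb = 1e12  # about 1 TB, as in the thread
    # Hypothetical sustained rates in MB/s, chosen only for illustration.
    for mb_per_s in (5, 10, 50):
        hours = recovery_hours(one_tb, mb_per_s * 1e6)
        print(f"{mb_per_s} MB/s -> {hours:.1f} h")
```

Read the other way round, a guess of 2 days for 1 TB corresponds to a
sustained rate of roughly 5.8 MB/s.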

On Tue, Sep 19, 2017 at 2:23 PM, Bryan Hunt
<bryan.hunt at erlang-solutions.com> wrote:
> Sorry Leo,
>
> That’s completely impossible to guess :-D
>
> Factors include - I/O, Network cards, network switch, selinux, block size, CPU, size of objects, number of objects, CRDT, Riak version, etc…
>
> Best,
>
> Bryan
>
>> On 19 Sep 2017, at 18:53, Leo <scicomplete at gmail.com> wrote:
>>
>> Dear Bryan,
>>
>> Thank you very much for your answers. They are very helpful to me.
>> I will use more nodes (>=5) in future.
>>
>> From your experience with using Riak, what would your guess be for the
>> time taken to finish all the AAE transfers and be done with the
>> recovery for about 1 TB worth of data (assuming my cluster is
>> otherwise completely idle, without any user accessing the cluster
>> during this process, and that I am continuously watching the transfers
>> and gradually re-enabling disabled AAE trees)? I am just asking for a
>> rough estimate from your past experience (please quote from your
>> experience with a differently sized cluster / data size too). My guess
>> is that it will take approx. 2 days or more. Do you concur?
>>
>> Thanks,
>> Leo
>>
>>
>> On Tue, Sep 19, 2017 at 12:41 PM, Bryan Hunt
>> <bryan.hunt at erlang-solutions.com> wrote:
>>> (0) Three nodes are insufficient; you should have 5 nodes
>>> (1) You could iterate and read every object in the cluster - this would also
>>> trigger read repair for every object
>>> (2) (copied from Engel Sanchez's response to a similar question, April 10th
>>> 2014)
>>>
>>> * If AAE is disabled, you don't have to stop the node to delete the data in
>>> the anti_entropy directories
>>> * If AAE is enabled, deleting the AAE data in a rolling manner may trigger
>>> an avalanche of read repairs between nodes with the bad trees and nodes
>>> with good trees as the data seems to diverge.
>>>
>>> If your nodes are already up, with AAE enabled and with old incorrect trees
>>> in the mix, there is a better way.  You can dynamically disable AAE with
>>> some console commands. At that point, without stopping the nodes, you can
>>> delete all AAE data across the cluster.  At a convenient time, re-enable
>>> AAE.  I say convenient because all trees will start to rebuild, and that
>>> can be problematic in an overloaded cluster.  Doing this over the weekend
>>> might be a good idea unless your cluster can take the extra load.
>>>
>>> To dynamically disable AAE from the Riak console, you can run this command:
>>>
>>> > riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, disable, [], 60000).
>>>
>>> and enable with the similar:
>>>
>>> > riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, enable, [], 60000).
>>>
>>> That last number is just a timeout for the RPC operation.  I hope this
>>> saves you some extra load on your clusters.
>>>
>>> (3) That’s going to be:
>>> (3a) List all keys using the client of your choice
>>> (3b) Fetch each object
>>>
>>> https://www.tiot.jp/riak-docs/riak/kv/2.2.3/developing/usage/reading-objects/
>>>
>>> https://www.tiot.jp/riak-docs/riak/kv/2.2.3/developing/usage/secondary-indexes/
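
Steps (3a) and (3b) can be sketched roughly as below, using Riak's HTTP
API to list a bucket's keys and then GET each object so that read repair
is triggered. This is a minimal sketch under assumptions: the host, the
port 8098 and the bucket name are placeholders, the helper names are made
up for illustration, and it ignores bucket types. Listing all keys is an
expensive operation on a large cluster.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple


def http_get(url: str) -> bytes:
    """Plain HTTP GET; raises an OSError subclass on network/HTTP errors."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def list_keys(base_url: str, bucket: str,
              get: Callable[[str], bytes] = http_get) -> List[str]:
    """(3a) List every key in a bucket via GET /buckets/<bucket>/keys?keys=true."""
    quoted = urllib.parse.quote(bucket, safe="")
    body = get(f"{base_url}/buckets/{quoted}/keys?keys=true")
    return json.loads(body).get("keys", [])


def read_all(base_url: str, bucket: str,
             get: Callable[[str], bytes] = http_get) -> Tuple[int, List[str]]:
    """(3b) Fetch each object once; return (successful reads, failed keys)."""
    quoted_bucket = urllib.parse.quote(bucket, safe="")
    ok, failed = 0, []
    for key in list_keys(base_url, bucket, get):
        quoted_key = urllib.parse.quote(key, safe="")
        try:
            get(f"{base_url}/buckets/{quoted_bucket}/keys/{quoted_key}")
            ok += 1
        except OSError:
            # Includes urllib's URLError/HTTPError (e.g. a 404 for the key).
            failed.append(key)
    return ok, failed


if __name__ == "__main__":
    # Placeholder address and bucket name; replace with your own.
    count, missing = read_all("http://127.0.0.1:8098", "my_bucket")
    print(f"read {count} objects, {len(missing)} failed")
```

Keys reported as failed can be retried later; a second pass is a cheap
way to check whether the first pass repaired them.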
>>>
>>> On 19 Sep 2017, at 18:31, Leo <scicomplete at gmail.com> wrote:
>>>
>>> Dear Riak users and experts,
>>>
>>> I really appreciate any help with my questions below.
>>>
>>> I have a 3 node Riak cluster with each having approx. 1 TB disk usage.
>>> All of a sudden, one node's hard disk failed unrecoverably. So, I
>>> added a new node using the following steps:
>>>
>>> 1) riak-admin cluster join
>>> 2) down the failed node
>>> 3) riak-admin cluster force-replace failed-node new-node
>>> 4) riak-admin cluster plan
>>> 5) riak-admin cluster commit
>>>
>>> This almost fixed the problem, except that after lots of data transfers
>>> and handoffs, not all three nodes have 1 TB of disk usage. Only two
>>> of them do. The other one is almost empty (a few tens
>>> of GB). This means there are no longer 3 copies on disk. My
>>> data is completely random (no two keys have the same data associated with
>>> them), so compression of data cannot be the reason for less data on
>>> disk.
>>>
>>> I also tried using the "riak-admin cluster replace failednode newnode"
>>> command so that the leaving node hands off data to the joining node.
>>> This however is not helpful if the leaving node has a failed hard
>>> disk. I want the remaining live vnodes to help the new node recreate
>>> the lost data using their replica copies.
>>>
>>> I have three questions:
>>>
>>> 1) What commands should I run to forcefully make sure there are three
>>> replicas on disk overall without waiting for read-repair or
>>> anti-entropy to make three copies? Bandwidth usage or CPU usage is
>>> not a huge concern for me.
>>>
>>> 2) Also, I will be very grateful if someone lists the commands that I
>>> can run using "riak attach" so that I can clear the AAE trees and
>>> forcefully make sure all data has 3 copies.
>>>
>>> 3) I will be very thankful if someone helps me with the commands that
>>> I should run to ensure that all data has 3 replicas on disk after the
>>> disk failure (instead of just looking at the disk space usage in all
>>> the nodes as hints)?
>>>
>>> Thanks,
>>> Leo
>>>
>>> _______________________________________________
>>> riak-users mailing list
>>> riak-users at lists.basho.com
>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>
>>>
>



