Help with handling Riak disk failure

Leo
Dear Riak users and experts,

I really appreciate any help with my questions below.

I have a 3-node Riak cluster, with each node using approx. 1 TB of disk.
One node's hard disk suddenly failed unrecoverably, so I added a new node
using the following steps:

1) riak-admin cluster join
2) down the failed node
3) riak-admin force-replace failed-node new-node
4) riak-admin cluster plan
5) riak-admin cluster commit
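
Spelled out, the sequence was roughly as follows (riak@existing-node,
riak@failed-node and riak@new-node are placeholders for my real hosts):

  # on the new node: join it to the existing cluster
  riak-admin cluster join riak@existing-node
  # mark the dead node down, then stage the replacement
  riak-admin down riak@failed-node
  riak-admin cluster force-replace riak@failed-node riak@new-node
  # review and commit the staged plan
  riak-admin cluster plan
  riak-admin cluster commit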

This almost fixed the problem, except that after lots of data transfers
and handoffs, not all three nodes have 1 TB of disk usage any more. Only
two of them do; the third is almost empty (a few tens of GBs). This means
there are no longer 3 copies of the data on disk. My data is essentially
random (no two keys have the same value, so compression cannot be the
reason for the smaller footprint on disk).

I also tried the "riak-admin cluster replace failednode newnode" command,
which makes the leaving node hand off its data to the joining node. That
does not help when the leaving node's disk has failed, though. I want the
remaining live vnodes to help the new node recreate the lost data from
their replica copies.

I have three questions:

1) What commands should I run to force three replicas of all data back
onto disk, without waiting for read repair or anti-entropy to recreate
the copies? Bandwidth and CPU usage are not a big concern for me.

2) I would also be grateful if someone could list the commands I can run
via "riak attach" to clear the AAE trees and force all data back to 3
copies.

3) Finally, what commands should I run to verify that all data actually
has 3 replicas on disk after the disk failure, rather than just using
per-node disk usage as a hint?

Thanks,
Leo

Re: Help with handling Riak disk failure

Bryan Hunt
(0) Three nodes are insufficient; you should have at least 5 nodes.
(1) You could iterate over and read every object in the cluster - this would also
trigger read repair for every object.
(2) (Copied from Engel Sanchez's response to a similar question, April 10th 2014):
* If AAE is disabled, you don't have to stop the node to delete the data in
the anti_entropy directories
* If AAE is enabled, deleting the AAE data in a rolling manner may trigger
an avalanche of read repairs between nodes with the bad trees and nodes
with good trees as the data seems to diverge.

If your nodes are already up, with AAE enabled and with old incorrect trees
in the mix, there is a better way.  You can dynamically disable AAE with
some console commands. At that point, without stopping the nodes, you can
delete all AAE data across the cluster.  At a convenient time, re-enable
AAE.  I say convenient because all trees will start to rebuild, and that
can be problematic in an overloaded cluster.  Doing this over the weekend
might be a good idea unless your cluster can take the extra load.

To dynamically disable AAE from the Riak console, you can run this command:

> riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, disable, [], 60000).

and re-enable with the corresponding command:

> riak_core_util:rpc_every_member_ann(riak_kv_entropy_manager, enable, [], 60000).

That last number is just a timeout for the RPC operation.  I hope this
saves you some extra load on your clusters.
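
Once AAE is disabled everywhere, the on-disk trees can then be deleted on
each node. A rough sketch, assuming a default packaged install where the
data directory is /var/lib/riak (check platform_data_dir in your config
for the actual location):

  # on every node, while AAE is disabled
  rm -rf /var/lib/riak/anti_entropy/*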
(3) That’s going to be:
(3a) List all keys using the client of your choice
(3b) Fetch each object

https://www.tiot.jp/riak-docs/riak/kv/2.2.3/developing/usage/reading-objects/

https://www.tiot.jp/riak-docs/riak/kv/2.2.3/developing/usage/secondary-indexes/
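
Since you asked about riak attach: a minimal, untested sketch of doing the
same from the Erlang console on one node (the riak_client API differs a
bit between Riak versions, and listing every key is expensive, so treat
this purely as an illustration):

  %% From `riak attach`: walk every bucket and key, GETing each object so
  %% that any missing replicas get rebuilt via read repair.
  {ok, C} = riak:local_client().
  {ok, Buckets} = C:list_buckets().
  lists:foreach(
    fun(B) ->
        %% note: this loads the full key list for bucket B into memory
        {ok, Keys} = C:list_keys(B),
        lists:foreach(fun(K) -> _ = C:get(B, K) end, Keys)
    end, Buckets).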



 





Re: Help with handling Riak disk failure

Leo
Dear Bryan,

Thank you very much for your answers. They are very helpful to me.
I will use more nodes (>=5) in future.

From your experience with Riak, what would your guess be for the time it
takes to finish all the AAE transfers and complete the recovery for about
1 TB of data (assuming my cluster is otherwise completely idle, with no
users accessing it during this process, and that I am continuously
watching the transfers and re-enabling the disabled AAE trees gradually)?
I am just asking for a rough estimate based on your past experience
(numbers from a differently sized cluster or data set would be welcome
too). My guess is that it will take approx. 2 days or more. Do you concur?
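
For reference, by "watching the transfers" I mean periodically checking
handoff and AAE status on each node, roughly:

  riak-admin transfers
  riak-admin aae-status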

Thanks,
Leo


Re: Help with handling Riak disk failure

Bryan Hunt
Sorry Leo,

That’s completely impossible to guess :-D

Factors include I/O, network cards, network switch, SELinux, block size, CPU, size of objects, number of objects, CRDTs, Riak version, etc.

Best,

Bryan

Re: Help with handling Riak disk failure

Leo
Okay. Please let me know which Riak config parameters or other settings
you think could make the recovery faster. For example, the transfer
limit, which can be changed using the riak-admin transfer-limit command.
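
For instance, something along these lines (the node name is a placeholder
and 8 is an arbitrary example value):

  riak-admin transfer-limit 8              # set the handoff concurrency limit cluster-wide
  riak-admin transfer-limit riak@node1 8   # or set it for a single node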

Thanks,
Leo


_______________________________________________
riak-users mailing list
[hidden email]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com