
Disk Replacement

Replace Disk

Warning

In this lab environment we have not physically removed or replaced a disk, so we will simulate the replacement with enableDisk. Remember that in a real disk replacement scenario, you would choose replaceDisk instead.


Instructions

  1. Log in as Admin
  2. Click Cluster Tab
  3. Click Nodes Sub-Tab
  4. Click Advanced Tab
  5. Select Disk Management
  6. Select disableDisk
  7. Choose a target Node
  8. Type /cloudian1 in the Mount Point field
  9. Click Execute

Ensure Result shows disableDisk completed.
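You can also sanity-check the result from the operating system on the target node. The following is a hedged sketch, not part of the official lab steps: it simply checks whether the lab mount point still appears in /proc/mounts (depending on your HyperStore version, disableDisk may unmount the filesystem).

```shell
# Hedged sketch (not an official step): check from the OS whether the
# lab mount point is still mounted on the target node.
MP="${MP:-/cloudian1}"   # the mount point used in this lab

if grep -qs " $MP " /proc/mounts; then
  echo "$MP is currently mounted"
else
  echo "$MP is not listed in /proc/mounts"
fi
```

Run this on the node where you disabled the disk; the CMC remains the authoritative view of disk status.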


Instructions

  1. You may need to wait a few minutes for the CMC to register the disabled disk.
  2. Click on Cluster Tab
  3. Click on Nodes Sub-Tab
  4. Select the node you disabled the disk on under the Host dropdown
  5. Ensure that the disk is showing as NotAvail

(You may need to refresh the screen.)


To Activate a Replacement Disk and Restore Data to It

Important

After you’ve physically installed the replacement disk, follow these steps:

Instructions

  1. Log into the CMC
  2. Select Cluster tab
  3. Select Nodes sub-tab
  4. Select Advanced.
  5. For the Command Type select "Disk Management"
  6. For the Command select "enableDisk". (In a true disk failure scenario you would use the "replaceDisk" option.)
  7. Choose the Target Node (the node on which the disk resides)
  8. Enter the Mount Point of the replacement disk. This must be the same as the mount point of the disk that you disabled (/cloudian1).
  9. Click Execute.

Allow some time for the operation to complete. The disk replacement operation automatically invokes a repair on the mount point (to recreate on the new disk the same set of S3 object data that was on the disk you replaced). The duration of this repair depends on how much data is involved.
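If you prefer to watch the repair from the command line, hsstool provides an opstatus command that reports the status of recent operations on a node. The sketch below is an assumption-laden illustration: it assumes hsstool is on your PATH on the node you are logged into, and studentXn5 is the lab node name from this exercise.

```shell
# Hedged sketch: check progress of the automatic repair started by
# enableDisk. Assumes hsstool is on PATH; studentXn5 is the lab node name.
NODE="${NODE:-studentXn5}"
CMD="hsstool -h $NODE opstatus"

if command -v hsstool >/dev/null 2>&1; then
  $CMD
else
  echo "hsstool not found; would run: $CMD"
fi
```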

Instructions

  1. Select Cluster tab
  2. Select Nodes sub-tab

Info

In the "Disk Detail Info" section, the replacement disk should now have a green status icon (indicating that its status is OK), and the operation is complete. In the lab it may be necessary to click "Clear Error History" to clear the red error indicator.


Repair System

Info

In line with best practice, we must now perform a manual repair of the system so that any reduction in redundancy is restored as quickly as possible. Since we know that studentXn5 had an issue, we will concentrate on that node (modify the commands to match your own lab, where X is your lab ID).

Instructions

  1. We will start with Cassandra. Log into your system using the sa_admin account
  2. Start a Cassandra repair of all keyspaces
    hsstool -h studentXn5 repair allkeyspaces
    
  3. The repair in the labs will only take a few seconds to complete. In a real-world scenario it will likely take much longer.

Instructions

  1. Next, let us repair our erasure-coded fragments. If you have logged out, log back into your system using the sa_admin account
  2. Start the repair process
    hsstool -h studentXn5 repairec
    
  3. Again, the repair in the labs will only take a few seconds to complete.

Instructions

  1. Finally, let us repair our replicas. If you have logged out, log back into your system using the sa_admin account
  2. Start the repair process
    hsstool -h studentXn5 repair
    
  3. On completion fully log out of the sa_admin session using the exit command
    exit
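The three repair steps above can be collected into a single sketch. This is an illustration rather than an official script: the run wrapper and the DRY_RUN flag are our own additions, and NODE must be changed to match your lab.

```shell
#!/bin/sh
# Hedged sketch: the three repairs from this lab, in the order given.
# NODE must match your lab (studentXn5, where X is your lab ID).
NODE="${NODE:-studentXn5}"
DRY_RUN="${DRY_RUN:-1}"   # set DRY_RUN=0 to actually invoke hsstool

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"   # print the command instead of running it
  else
    "$@"
  fi
}

run hsstool -h "$NODE" repair allkeyspaces   # 1. Cassandra keyspace repair
run hsstool -h "$NODE" repairec              # 2. erasure-coded fragment repair
run hsstool -h "$NODE" repair                # 3. replica repair
```

With the default DRY_RUN=1 the script only prints the commands, which is a safe way to review them before running the real repairs one at a time as described in the instructions above.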