Disk Replacement

Replace Disk

Warning

In this lab environment we have not physically removed or replaced a disk, so we will simulate the replacement process with enableDisk. Remember that in a real disk replacement scenario you would choose the replaceDisk option instead.

Instructions

Log in to the CMC as Admin
Click the Cluster tab
Click the Nodes sub-tab
Click the Advanced tab
Select Command Type: Disk Management
Select Command: disableDisk
Choose node5 as the Target Node
Enter /cloudian1 in the Mount Point field
Click Execute

Ensure the Result shows that disableDisk completed.
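To double-check from node5's own shell, you can look at how /cloudian1 appears in /etc/fstab. The Repair System notes in this section say replaceDisk "uncomments" the disk's fstab entry, which suggests that disabling a disk comments it out; treat that as an assumption and verify it on your own system. The following is a minimal sketch; `fstab_entry_state` is a hypothetical helper, not a HyperStore command.

```bash
# Report whether a mount point's fstab entry is active, commented out, or absent.
# Usage: fstab_entry_state MOUNT_POINT [FSTAB_FILE]   (FSTAB_FILE defaults to /etc/fstab)
fstab_entry_state() {
  local mnt="${1:-}" fstab="${2:-/etc/fstab}"
  if [ -z "$mnt" ]; then
    echo "usage: fstab_entry_state MOUNT_POINT [FSTAB_FILE]" >&2
    return 2
  fi

  # Field 2 of an fstab line is the mount point; a leading '#' marks a comment.
  if awk -v m="$mnt" '$1 !~ /^#/ && $2 == m { found = 1 } END { exit !found }' "$fstab"; then
    echo active
  elif awk -v m="$mnt" '
      $1 ~ /^#/ {
        line = $0
        sub(/^[ \t]*#+[ \t]*/, "", line)   # strip the comment marker
        split(line, f)
        if (f[2] == m) found = 1
      }
      END { exit !found }' "$fstab"; then
    echo commented
  else
    echo absent
  fi
}

# Hypothetical check on node5 after disableDisk:
#   fstab_entry_state /cloudian1   # "commented" if the assumption above holds
#   mountpoint /cloudian1          # util-linux: reports whether it is still mounted
```

If the result is "active" and the directory is still mounted, rely on the CMC status described next rather than on this assumption.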

Instructions

You may need to wait a few minutes for the CMC to register the disabled disk; errors will then start to appear on multiple dashboards.

Instructions

Check the node directly in the CMC to confirm that the disk is disabled:
Click the Cluster tab
Click the Nodes sub-tab
From the Host drop-down, select the node on which you disabled the disk (node5)
Ensure that the disk shows as NotAvail

(You may need to refresh the screen.)

To Activate a Replacement Disk and Restore Data to It

Important

After you’ve physically installed the replacement disk, follow these steps:

Instructions

Log in to the CMC
Select Cluster tab
Select Nodes sub-tab
Select Advanced
For Command Type, select "Disk Management"
For Command, select "enableDisk" (in a true disk failure scenario you would use the "replaceDisk" option)
Choose the Target Node (the node on which the disk resides; node5 in this lab)
Enter the Mount Point of the replacement disk. This must be the same as the mount point of the disk that you disabled (/cloudian1).
Click Execute.

Wait for the operation to complete. The disk replacement operation automatically invokes a repair on the mount point (to recreate on the new disk the same set of S3 object data that was on the disk you replaced). How long this repair takes depends on how much data is involved.
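While the operation runs, you can confirm from node5's shell that the mount point has come back, assuming disableDisk unmounted it. This only confirms the remount, not the repair; check repair completion in the CMC. The sketch below assumes util-linux `mountpoint` is installed; `wait_for` is a hypothetical helper, not part of HyperStore.

```bash
# Retry a command every INTERVAL seconds until it succeeds or LIMIT seconds pass.
# Usage: wait_for LIMIT INTERVAL command [args...]
wait_for() {
  if [ "$#" -lt 3 ]; then
    echo "usage: wait_for LIMIT INTERVAL command [args...]" >&2
    return 2
  fi
  local limit="$1" interval="$2" deadline
  shift 2
  deadline=$(( $(date +%s) + limit ))
  until "$@"; do
    # Give up once the deadline has passed, reporting what we were waiting for.
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${limit}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical use on node5: wait up to 10 minutes, polling every 15 seconds,
# for /cloudian1 to be mounted again, then show its capacity.
#   wait_for 600 15 mountpoint -q /cloudian1 && df -h /cloudian1
```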

Instructions

Select Cluster tab
Select Nodes sub-tab

Info

In the "Disk Detail Info" section, the replacement disk should have a green status icon (indicating that its status is OK), which means the operation is done. You may need to click "Clear Error History" to clear the red error indicator.

Instructions

If the disk does not return to a healthy status in the CMC, you may need to restart the monitoring agent on the node to speed up the process:
```
systemctl restart cloudian-agent
```
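To confirm that the agent actually came back after the restart, you can ask systemd for the unit's state. This is a minimal sketch; `restart_agent` is a hypothetical wrapper name, and it assumes you have the privileges (root or sudo) needed to manage services on the node.

```bash
# Restart the HyperStore monitoring agent and confirm it came back up.
restart_agent() {
  systemctl restart cloudian-agent || return 1
  # "is-active --quiet" prints nothing and exits 0 only if the unit is active.
  if systemctl is-active --quiet cloudian-agent; then
    echo "cloudian-agent is active"
  else
    echo "cloudian-agent is NOT active" >&2
    systemctl status cloudian-agent --no-pager >&2
    return 1
  fi
}
```

A unit that is still starting reports a non-active state, so if this fails immediately after a restart, wait a few seconds and check again before investigating.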

Repair System

Info

When you physically install a new disk and then execute the HyperStore replaceDisk function, the system automatically does the following:
Creates a primary partition and an ext4 file system on the new disk
Establishes appropriate permissions on the mount point
Remounts the new disk (using the same mount point that the prior disk had), uncomments its entry in /etc/fstab, and marks the disk as available for HyperStore reads and writes
Moves back to the new disk the same set of storage tokens that were automatically moved away from the prior disk when it was disabled
Performs a data repair for the new disk (populating the new disk with its correct inventory of object replicas and/or erasure coded object fragments)
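After a replaceDisk finishes, you can spot-check some of these points from the node's shell: that the disk is mounted, that it carries an ext4 file system, and what ownership and mode the mount point has. This is a minimal sketch using standard util-linux and GNU coreutils tools (`mountpoint`, `findmnt`, `stat`); `verify_replaced_disk` is a hypothetical name. It prints ownership without asserting a value, because the correct HyperStore owner is not stated in this lab.

```bash
# Print a short health report for a replaced data disk.
# Usage: verify_replaced_disk MOUNT_POINT
verify_replaced_disk() {
  local mnt="${1:-}" fstype
  if [ -z "$mnt" ] || ! mountpoint -q "$mnt"; then
    echo "${mnt:-<none>}: NOT mounted" >&2
    return 1
  fi
  fstype=$(findmnt -n -o FSTYPE "$mnt")
  echo "$mnt: mounted, filesystem=$fstype, owner/mode=$(stat -c '%U:%G %a' "$mnt")"
  # replaceDisk creates an ext4 file system, so anything else is suspect.
  [ "$fstype" = ext4 ]
}

# Hypothetical use on node5:
#   verify_replaced_disk /cloudian1
```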

Warning

Although the HyperStore system has automatically repaired the disk and its data, we should never rely solely on automated processes; we must understand how to trigger a repair manually. Whenever we know an issue has occurred and been corrected, we should consider running a repair manually, especially if nodes were being added or a rebalance was in progress at the time of the failure.

Instructions

We will start with Cassandra. Log in to your system using the sa_admin account.
Start a repaircassandra; this repairs all the Cassandra keyspaces.

Remember to modify the command to match your own system (for example, replace studentXn5 with the hostname of your node).

```
hsstool -h studentXn5 repaircassandra
```

The repair in the lab will take only a few seconds to complete. In a real-world scenario it will likely take much longer.

Instructions

Next, let us repair our erasure-coded fragments. If you have logged out, log in to your system using the sa_admin account.
Start the repair process:
```
hsstool -h studentXn5 repairec
```
Again, the repair in the lab will take only a few seconds to complete.

Instructions

Finally, let us repair our replicas. If you have logged out, log in to your system using the sa_admin account.
Start the repair process:
```
hsstool -h studentXn5 repair
```
When the repair completes, fully log out of the sa_admin session using the exit command:
```
exit
```
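If your login shell allows scripting, the three repairs in this section can be wrapped in one function that runs them in the same order and stops at the first failure. This is a sketch only: `run_repairs` and the `DRY_RUN` variable are hypothetical, it assumes `hsstool` is on the PATH and exits non-zero when a repair fails, and a restricted login shell may not allow it. In that case, run the commands one at a time as shown above.

```bash
# Run this section's repairs, in order, against one node.
# Usage: run_repairs HOST      (set DRY_RUN=1 to print the commands instead)
run_repairs() {
  local host="${1:-}" op
  if [ -z "$host" ]; then
    echo "usage: run_repairs HOST" >&2
    return 2
  fi
  # Same order as the lab: Cassandra metadata, then EC fragments, then replicas.
  for op in repaircassandra repairec repair; do
    echo "== hsstool -h $host $op"
    if [ -n "${DRY_RUN:-}" ]; then
      continue
    fi
    # Stop at the first failure so a broken repair is not masked by later ones.
    hsstool -h "$host" "$op" || { echo "$op failed on $host" >&2; return 1; }
  done
}

# Hypothetical use (adjust the hostname to your system):
#   DRY_RUN=1 run_repairs studentXn5   # preview
#   run_repairs studentXn5
```

Stopping early means a failed Cassandra repair is noticed before the object repairs start, rather than being buried in later output.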