Friday, January 5, 2018

Red Hat Cluster Part 3: Fail Over Test Cases

I have rebuilt a two-node cluster on CentOS 6.5 for the purpose of this blog, i.e. to test failover in the cluster. The hostname and IP address details are as follows:

 [root@N2 ~]# cat /etc/hosts  
 127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4  
 ::1     localhost localhost.localdomain localhost6 localhost6.localdomain6  
 192.168.200.131     N1 N1.off.com  
 192.168.200.132     N2 N2.off.com  
 10.10.10.1       H1 H1.off.com  
 10.10.10.2       H2 H2.off.com  
 192.168.200.135     F1 FIP.off.com  

Checking the status of the cluster.

 [root@N1 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:16:55 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, Local, rgmanager  
  N2.off.com                             2 Online, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N2.off.com                            started  

The cluster service is running on node 2. We will now perform a series of failover tests; the cluster should handle each scenario without any disruption to its operation.
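
To watch each failover as it happens, clustat can also refresh its output at a fixed interval instead of printing once. For example, a two-second refresh:

 [root@N1 ~]# clustat -i 2  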

1. Reboot Active Node N2


 [root@N2 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:18:33 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, rgmanager  
  N2.off.com                             2 Online, Local, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N2.off.com                            started  
 [root@N2 ~]# reboot  

While node 2 is rebooting, the cluster stops the service on node 2 and relocates it to node 1, as shown below.

 [root@N1 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:19:31 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, Local, rgmanager  
  N2.off.com                             2 Online, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N2.off.com                            stopping  

Now, we can see that the service has started on node 1.

 [root@N1 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:19:48 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, Local, rgmanager  
  N2.off.com                             2 Online  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N1.off.com                            started  

After node 2 comes up, we check the cluster status.

 [root@N2 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:21:21 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, rgmanager  
  N2.off.com                             2 Online, Local, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N1.off.com                            started  

As per the cluster configuration, the service must not fail back to node 2 automatically. We can see above that the service is still active on node 1 and has not moved back to node 2, which was its last owner.
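
This no-failback behavior comes from the failover domain definition in /etc/cluster/cluster.conf. A minimal sketch of the relevant section is shown below; the domain name officefd is a placeholder and the actual names and priorities in this cluster may differ, but nofailback="1" is the attribute that keeps the service from returning to its last owner:

 <rm>  
   <failoverdomains>  
     <!-- nofailback="1": do not move the service back when a higher-priority node rejoins -->  
     <failoverdomain name="officefd" ordered="1" restricted="1" nofailback="1">  
       <failoverdomainnode name="N1.off.com" priority="1"/>  
       <failoverdomainnode name="N2.off.com" priority="2"/>  
     </failoverdomain>  
   </failoverdomains>  
 </rm>  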

2. Power Off Active Node N1


Since the service is now running on node 1, we power off that node.

 [root@N1 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:21:11 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Online, Local, rgmanager  
  N2.off.com                             2 Online, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N1.off.com                            started  
 [root@N1 ~]# poweroff  
 Broadcast message from root@N1  
     (/dev/pts/1) at 15:21 ...  
 The system is going down for power off NOW!  

After powering off the first node, we can see that the service is running on node 2.

 [root@N2 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:22:55 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Offline  
  N2.off.com                             2 Online, Local, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N2.off.com                            started  
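
The takeover can also be followed in the system log on the surviving node, where fenced and rgmanager record each step (fencing the dead node, then starting the service). A quick way to pull out the relevant lines:

 [root@N2 ~]# grep -E 'fenced|rgmanager' /var/log/messages | tail  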

3. Power Off Both Nodes and Power Them On Simultaneously


For this test, I have powered off both nodes. We will now power them on at the same time and check whether the cluster service comes up automatically.

After powering on both nodes, node 2 has come up first and the cluster service is activated on it.

 [root@N2 ~]# clustat  
 Cluster Status for officetest @ Fri Dec 7 15:33:14 2018  
 Member Status: Quorate  
  Member Name                           ID  Status  
  ------ ----                           ---- ------  
  N1.off.com                             1 Offline  
  N2.off.com                             2 Online, Local, rgmanager  
  /dev/block/8:16                        0 Online, Quorum Disk  
  Service Name                           Owner (Last)                           State  
  ------- ----                           ----- ------                           -----  
  service:Test Service                   N2.off.com                            started  

Checking the status of node 1, we find it is still in the boot process. The cluster delays node 1's startup while the cluster service is activated on node 2.
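
This startup delay is typically governed by the fence daemon settings in /etc/cluster/cluster.conf, which control how long fenced waits for late-joining nodes before taking action. A sketch of the relevant line, with illustrative values rather than this cluster's actual settings:

 <fence_daemon post_join_delay="30" post_fail_delay="0"/>  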


4. Move Services Between Nodes


Now we manually move the cluster service between the nodes. The service is currently on node 2, so we run the following command to move it to node 1. The -r option takes the service to relocate and -m specifies the member node the service should move to.

 [root@N2 ~]# clusvcadm -r 'Test Service' -m N1.off.com  
 Trying to relocate service:Test Service to N1.off.com...Success  
 service:Test Service is now running on N1.off.com  

The relocation is successful, and we then move the service back to node 2 as shown below.

 [root@N1 ~]# clusvcadm -r 'Test Service' -m N2.off.com  
 Trying to relocate service:Test Service to N2.off.com...Success  
 service:Test Service is now running on N2.off.com  
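
Relocation is only one of the operations clusvcadm supports. A few others that are useful during testing and maintenance, using the same service name as above:

 [root@N2 ~]# clusvcadm -d 'Test Service'                 # disable (stop) the service  
 [root@N2 ~]# clusvcadm -e 'Test Service' -m N1.off.com   # enable (start) it on a specific node  
 [root@N2 ~]# clusvcadm -R 'Test Service'                 # restart it on its current owner  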

5. Test Fencing of Nodes


We can also verify that fencing is configured correctly by manually fencing the nodes with the fence_node command. The fence action is set to reboot, so on a successful fence the target node is rebooted by the fencing user configured on vCenter.

 [root@N1 ~]# fence_node N2  
 fence N2 success  

 [root@N2 ~]# fence_node N1  
 fence N1 success  
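
Since these nodes are virtual machines fenced through vCenter, the fence device definition in /etc/cluster/cluster.conf will look roughly like the sketch below. The device name, address, and credentials here are placeholders, not this cluster's actual values (the agent's default fence action is reboot):

 <fencedevices>  
   <!-- fence_vmware_soap asks vCenter to power-cycle the target VM -->  
   <fencedevice agent="fence_vmware_soap" name="vcfence" ipaddr="vcenter.off.com"  
                login="fenceuser" passwd="********" ssl="on"/>  
 </fencedevices>  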

