We are going to set up a High Availability Active/Passive cluster running an NFS server. Although the environment runs in VMware vCenter, this is not a VMware cluster but an OS-level cluster. The virtual machines Node-N1 and Node-N2 can be seen in VMware in the image below. An RDM (Raw Device Mapping) disk, provided from FreeNAS, has been presented to both nodes as shared storage.
We will be using CentOS 7.2 for the cluster test.
[root@node02 ~]# cat /etc/redhat-release
CentOS Linux release 7.2.1511 (Core)
The IP and Hostname details of the nodes:
[root@node01 ~]# cat /etc/hosts
127.0.0.1 cluster-node1.domain.com localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.124 node01 node01.test.com
192.168.200.125 node02 node02.test.com
[root@node02 ~]# cat /etc/hosts
127.0.0.1 cluster-node2.domain.com localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.200.124 node01 node01.test.com
192.168.200.125 node02 node02.test.com
Install Pacemaker Configuration Tools
We install the required packages on both nodes. Since the nodes are virtual machines in a VMware environment, only the vmware-soap fencing agent is needed.
[root@node01 ~]# yum -y install pcs pacemaker fence-agents-vmware-soap
[root@node02 ~]# yum -y install pcs pacemaker fence-agents-vmware-soap
Add an Exception Rule to the Firewall
[root@node01 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@node01 ~]# firewall-cmd --add-service=high-availability
success
[root@node02 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@node02 ~]# firewall-cmd --add-service=high-availability
success
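As an optional check, the enabled services can be listed on each node to confirm the rule is in place (the list should include high-availability):
[root@node01 ~]# firewall-cmd --list-services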
Set a password for the hacluster user
[root@node01 ~]# passwd hacluster
Changing password for user hacluster.
New password:
[root@node02 ~]# passwd hacluster
Changing password for user hacluster.
New password:
Start the pcsd service
[root@node01 ~]# systemctl start pcsd.service
[root@node01 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
[root@node02 ~]# systemctl start pcsd.service
[root@node02 ~]# systemctl enable pcsd.service
Created symlink from /etc/systemd/system/multi-user.target.wants/pcsd.service to /usr/lib/systemd/system/pcsd.service.
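Optionally, we can confirm the daemon is running on each node (the command should print "active"):
[root@node01 ~]# systemctl is-active pcsd
[root@node02 ~]# systemctl is-active pcsd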
Authorize the nodes in the cluster
[root@node01 ~]# pcs cluster auth node01.test.com node02.test.com
Username: hacluster
Password:
node02.test.com: Authorized
node01.test.com: Authorized
Create Cluster
[root@node01 ~]# pcs cluster setup --start --name cluster_pr node01.test.com node02.test.com
Destroying cluster on nodes: node01.test.com, node02.test.com...
node01.test.com: Stopping Cluster (pacemaker)...
node02.test.com: Stopping Cluster (pacemaker)...
node01.test.com: Successfully destroyed cluster
node02.test.com: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'node01.test.com', 'node02.test.com'
node01.test.com: successful distribution of the file 'pacemaker_remote authkey'
node02.test.com: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node01.test.com: Succeeded
node02.test.com: Succeeded
Starting cluster on nodes: node01.test.com, node02.test.com...
node01.test.com: Starting Cluster...
node02.test.com: Starting Cluster...
Synchronizing pcsd certificates on nodes node01.test.com, node02.test.com...
node02.test.com: Success
node01.test.com: Success
Restarting pcsd on the nodes in order to reload the certificates...
node02.test.com: Success
node01.test.com: Success
Enable Cluster
[root@node01 ~]# pcs cluster enable --all
node01.test.com: Cluster Enabled
node02.test.com: Cluster Enabled
We can see below that the cluster is enabled and running.
[root@node01 ~]# pcs cluster status
Cluster Status:
Stack: corosync
Current DC: node01.test.com (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May 10 02:58:50 2018
Last change: Thu May 10 02:55:42 2018 by hacluster via crmd on node01.test.com
2 nodes configured
0 resources configured
PCSD Status:
node01.test.com: Online
node02.test.com: Online
Fencing Agent Configuration
For fencing, we are going to use the VMware fencing agent (fence_vmware_soap), so we need to create a vCenter user with privileges to power the VMs on and off.
Test the Fencing Agent
[root@node01 ~]# fence_vmware_soap -z --ssl-insecure -a 192.168.200.212 -l 'fenceuser@vsphere.local' -p '1CLUP@ss3rd' -o list | grep Node
Node-N1,421fd583-88b7-24ce-401d-5f59222d07ab
Node-N2,421fd09e-754b-bc16-aa46-5398b59261a9
Configure Fencing Agent
[root@node01 ~]# pcs stonith create vm_fence fence_vmware_soap ipaddr=192.168.200.212 ipport=443 ssl=1 ssl_insecure=1 login='fenceuser@vsphere.local' passwd='1CLUP@ss3rd' pcmk_off_action=reboot pcmk_host_map="node01.test.com:Node-N1;node02.test.com:Node-N2" delay=10
[root@node01 ~]# echo $?
0
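Optionally, the configured options of the fence device can be reviewed before relying on it:
[root@node01 ~]# pcs stonith show vm_fence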
We can see that the fencing agent has been created successfully and is running.
[root@node01 ~]# pcs status
Cluster name: cluster_pr
Stack: corosync
Current DC: node01.test.com (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May 10 04:18:14 2018
Last change: Thu May 10 04:16:44 2018 by root via cibadmin on node01.test.com
2 nodes configured
1 resource configured
Online: [ node01.test.com node02.test.com ]
Full list of resources:
vm_fence (stonith:fence_vmware_soap): Started node01.test.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Ignore Quorum
A two-node cluster cannot maintain quorum once a node fails, so we set the no-quorum-policy property to ignore so that resources keep running on the surviving node.
[root@node01 ~]# pcs property set no-quorum-policy=ignore
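To double-check, the property can be queried back:
[root@node01 ~]# pcs property show no-quorum-policy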
Configure LVM on shared storage
As shown below, the shared storage device (sdb) is present on both nodes.
[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 19G 0 part
├─centos-root 253:0 0 17G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 90G 0 disk
sr0 11:0 1 1024M 0 rom
[root@node02 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 19G 0 part
├─centos-root 253:0 0 17G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 90G 0 disk
sr0 11:0 1 1024M 0 rom
The following steps are to be performed on one node only. First, create the Physical Volume.
[root@node01 ~]# pvcreate /dev/sdb
Physical volume "/dev/sdb" successfully created.
Then create a Volume Group named "vg_cluster".
[root@node01 ~]# vgcreate vg_cluster /dev/sdb
Volume group "vg_cluster" successfully created
Finally, create a Logical Volume named "lv_cluster".
[root@node01 ~]# lvcreate -l 23039 -n lv_cluster vg_cluster
Logical volume "lv_cluster" created.
[root@node01 ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
root centos -wi-ao---- <17.00g
swap centos -wi-ao---- 2.00g
lv_cluster vg_cluster -wi-a----- <90.00g
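As a side note, instead of counting physical extents manually, lvcreate can also allocate all remaining space by percentage; an equivalent alternative would be:
[root@node01 ~]# lvcreate -l 100%FREE -n lv_cluster vg_cluster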
Now format the Logical Volume with an XFS filesystem.
[root@node01 ~]# mkfs.xfs /dev/vg_cluster/lv_cluster
[root@node01 ~]# blkid |grep cluster
/dev/mapper/vg_cluster-lv_cluster: UUID="886d99c3-a1d1-46f0-bfa7-00c81677e5c1" TYPE="xfs"
[root@node01 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
fd0 2:0 1 4K 0 disk
sda 8:0 0 20G 0 disk
├─sda1 8:1 0 1G 0 part /boot
└─sda2 8:2 0 19G 0 part
├─centos-root 253:0 0 17G 0 lvm /
└─centos-swap 253:1 0 2G 0 lvm [SWAP]
sdb 8:16 0 90G 0 disk
└─vg_cluster-lv_cluster 253:2 0 90G 0 lvm
sr0 11:0 1 1024M 0 rom
We can see above that the logical volume has been created on sdb.
Create the NFS Share
We will now create the highly available NFS (Network File System) share, which is the main objective of this cluster.
[root@node01 ~]# mkdir /Nfs
[root@node01 ~]# mount /dev/vg_cluster/lv_cluster /Nfs/
[root@node01 ~]# mkdir /Nfs/BACKUPDIR{1..5}
[root@node01 ~]# ls /Nfs/
BACKUPDIR1 BACKUPDIR2 BACKUPDIR3 BACKUPDIR4 BACKUPDIR5
[root@node01 ~]# touch /Nfs/BACKUPDIR1/hello_world{1..5}
[root@node01 ~]# ls /Nfs/BACKUPDIR1/
hello_world1 hello_world2 hello_world3 hello_world4 hello_world5
We then unmount the filesystem and deactivate the Volume Group, since from now on they will be brought up by the cluster.
[root@node01 ~]# umount /Nfs
[root@node01 ~]# vgchange -an vg_cluster
0 logical volume(s) in volume group "vg_cluster" now active
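A quick way to confirm the Volume Group is now inactive is to list its Logical Volumes; the Attr column should no longer contain the 'a' (active) flag:
[root@node01 ~]# lvs vg_cluster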
Activate Volume Group in a Cluster
The following steps need to be performed on both nodes. We change the LVM configuration so that the OS does not activate the Volume Group we created above, because it will be managed by the cluster.
[root@node01 ~]# lvmconf --enable-halvm --services --startstopservices
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
lvm2-lvmetad.socket
Removed symlink /etc/systemd/system/sysinit.target.wants/lvm2-lvmetad.socket.
[root@node02 ~]# lvmconf --enable-halvm --services --startstopservices
Warning: Stopping lvm2-lvmetad.service, but it can still be activated by:
lvm2-lvmetad.socket
Removed symlink /etc/systemd/system/sysinit.target.wants/lvm2-lvmetad.socket.
We list all the Volume Groups present in the OS.
[root@node01 ~]# vgs --noheadings -o vg_name
centos
vg_cluster
[root@node02 ~]# vgs --noheadings -o vg_name
centos
vg_cluster
Then we edit the LVM configuration file as shown below, adding only the "centos" Volume Group to volume_list and leaving out "vg_cluster". Only the "centos" Volume Group will be activated and managed by the OS; "vg_cluster" is left to the cluster.
[root@node01 ~]# vim /etc/lvm/lvm.conf
[root@node01 ~]# cat /etc/lvm/lvm.conf |grep volume_list
# it is auto-activated. The auto_activation_volume_list setting
# Configuration option activation/volume_list.
# or VG. See tags/hosttags. If any host tags exist but volume_list
# volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
volume_list = [ "centos" ]
[root@node02 ~]# vim /etc/lvm/lvm.conf
[root@node02 ~]# cat /etc/lvm/lvm.conf |grep volume_list
# it is auto-activated. The auto_activation_volume_list setting
# Configuration option activation/volume_list.
# or VG. See tags/hosttags. If any host tags exist but volume_list
# volume_list = [ "vg1", "vg2/lvol1", "@tag1", "@*" ]
volume_list = [ "centos" ]
Take a backup of the "initramfs-$(uname -r).img" file.
[root@node01 ~]# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@node02 ~]# cp -p /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
Now we rebuild the initramfs boot image so the change persists across reboots.
[root@node01 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
[root@node02 ~]# dracut -H -f /boot/initramfs-$(uname -r).img $(uname -r)
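Optionally, we can verify that the rebuilt image contains the lvm.conf file by listing the initramfs contents (lsinitrd is shipped with the dracut package):
[root@node01 ~]# lsinitrd /boot/initramfs-$(uname -r).img | grep lvm.conf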
Finally, reboot both nodes.
[root@node01 ~]# reboot
[root@node02 ~]# reboot
After rebooting, we need to make sure that the cluster services are running and the status is healthy.
[root@node01 ~]# pcs status
Cluster name: cluster_pr
Stack: corosync
Current DC: node01.test.com (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May 10 05:29:30 2018
Last change: Thu May 10 05:27:12 2018 by hacluster via crmd on node01.test.com
2 nodes configured
1 resource configured
Online: [ node01.test.com node02.test.com ]
Full list of resources:
vm_fence (stonith:fence_vmware_soap): Started node01.test.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
Create Resource
We now create the resources for the services that the cluster will run and manage. First, we create the LVM resource, provided by OCF (Open Cluster Framework), and place it in a resource group named group_Nfs so that all NFS-related resources stay together and start in order.
[root@node01 ~]# pcs resource create lvm_res LVM volgrpname=vg_cluster exclusive=true --group group_Nfs
Assumed agent name 'ocf:heartbeat:LVM' (deduced from 'LVM')
[root@node01 ~]# echo $?
0
Second, we create the Filesystem resource, which will mount the shared logical volume at /Nfs.
[root@node01 ~]# pcs resource create nfsshare Filesystem device=/dev/vg_cluster/lv_cluster directory=/Nfs fstype=xfs --group group_Nfs
Assumed agent name 'ocf:heartbeat:Filesystem' (deduced from 'Filesystem')
[root@node01 ~]# echo $?
0
Third, we create the NFS server and NFS export resources.
[root@node01 ~]# pcs resource create nfs-daemon nfsserver nfs_shared_infodir=/Nfs/nfsinfo nfs_no_notify=true --group group_Nfs
Assumed agent name 'ocf:heartbeat:nfsserver' (deduced from 'nfsserver')
[root@node01 ~]# echo $?
0
[root@node01 ~]# pcs resource create nfs-root exportfs clientspec=192.168.200.0/255.255.255.0 options=rw,sync,no_root_squash directory=/Nfs/BACKUPDIR1 fsid=0 --group group_Nfs
Assumed agent name 'ocf:heartbeat:exportfs' (deduced from 'exportfs')
We also need a VIP (Virtual IP) through which clients will reach the shared resources.
[root@node01 ~]# pcs resource create nfs_ip IPaddr2 ip=192.168.200.123 cidr_netmask=24 --group group_Nfs
Assumed agent name 'ocf:heartbeat:IPaddr2' (deduced from 'IPaddr2')
Finally, the nfs-notify resource is created.
[root@node01 ~]# pcs resource create nfs-notify nfsnotify source_host=192.168.200.123 --group group_Nfs
Assumed agent name 'ocf:heartbeat:nfsnotify' (deduced from 'nfsnotify')
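Optionally, the members and options of the resource group can be reviewed at this point:
[root@node01 ~]# pcs resource show group_Nfs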
The resources have been created and are running.
[root@node02 ~]# pcs status
Cluster name: cluster_pr
Stack: corosync
Current DC: node01.test.com (version 1.1.16-12.el7_4.8-94ff4df) - partition with quorum
Last updated: Thu May 10 12:54:42 2018
Last change: Thu May 10 12:50:14 2018 by root via cibadmin on node01.test.com
2 nodes configured
7 resources configured
Online: [ node01.test.com node02.test.com ]
Full list of resources:
vm_fence (stonith:fence_vmware_soap): Started node01.test.com
Resource Group: group_Nfs
lvm_res (ocf::heartbeat:LVM): Started node02.test.com
nfsshare (ocf::heartbeat:Filesystem): Started node02.test.com
nfs-daemon (ocf::heartbeat:nfsserver): Started node02.test.com
nfs-root (ocf::heartbeat:exportfs): Started node02.test.com
nfs_ip (ocf::heartbeat:IPaddr2): Started node02.test.com
nfs-notify (ocf::heartbeat:nfsnotify): Started node02.test.com
Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
The NFS share is running on the second node.
[root@node02 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 17G 1.3G 16G 8% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 39M 1.9G 3% /dev/shm
tmpfs 1.9G 8.6M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda1 1014M 164M 851M 17% /boot
tmpfs 378M 0 378M 0% /run/user/0
/dev/mapper/vg_cluster-lv_cluster 90G 33M 90G 1% /Nfs
The first node is in passive mode, so the NFS share is not present.
[root@node01 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 17G 1.3G 16G 8% /
devtmpfs 1.9G 0 1.9G 0% /dev
tmpfs 1.9G 54M 1.8G 3% /dev/shm
tmpfs 1.9G 8.5M 1.9G 1% /run
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda1 1014M 164M 851M 17% /boot
tmpfs 378M 0 378M 0% /run/user/0
Check the exports from the client machine.
[root@srv ~]# showmount -e 192.168.200.123
clnt_create: RPC: Port mapper failure - Unable to receive: errno 113 (No route to host)
The command fails because the NFS-related services and ports are blocked, so we add firewall exceptions on the active node.
[root@node02 ~]# firewall-cmd --permanent --add-service=rpc-bind
success
[root@node02 ~]# firewall-cmd --permanent --add-service=mountd
success
[root@node02 ~]# firewall-cmd --permanent --add-port=2049/tcp
success
[root@node02 ~]# firewall-cmd --permanent --add-port=2049/udp
success
[root@node02 ~]# firewall-cmd --reload
success
Now the command runs successfully.
[root@srv ~]# showmount -e 192.168.200.123
Export list for 192.168.200.123:
/Nfs/BACKUPDIR1 192.168.200.0/255.255.255.0
The same firewall exceptions need to be added on the first node as well, so the share remains reachable after a failover.
[root@node01 ~]# firewall-cmd --permanent --add-service=rpc-bind
success
[root@node01 ~]# firewall-cmd --permanent --add-service=mountd
success
[root@node01 ~]# firewall-cmd --permanent --add-port=2049/tcp
success
[root@node01 ~]# firewall-cmd --permanent --add-port=2049/udp
success
[root@node01 ~]# firewall-cmd --reload
success
Testing the NFS share
[root@srv ~]# mount 192.168.200.123:/Nfs/BACKUPDIR1 nfsshare
[root@srv ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/rhel-root 50G 8.0G 42G 16% /
devtmpfs 2.9G 0 2.9G 0% /dev
tmpfs 2.9G 80K 2.9G 1% /dev/shm
tmpfs 2.9G 17M 2.9G 1% /run
tmpfs 2.9G 0 2.9G 0% /sys/fs/cgroup
/dev/mapper/rhel-home 1.3T 253G 1019G 20% /home
/dev/sda1 497M 119M 379M 24% /boot
192.168.200.123:/Nfs/BACKUPDIR1 90G 33M 90G 1% /root/nfsshare
[root@srv ~]# ls /root/nfsshare/
BACKUPDIR2 BACKUPDIR4 hello_world1 hello_world3 hello_world5
BACKUPDIR3 BACKUPDIR5 hello_world2 hello_world4
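If the client should mount the share automatically at boot, an /etc/fstab entry along the following lines could be used (the mount point /root/nfsshare matches the one used above; adjust as needed):
192.168.200.123:/Nfs/BACKUPDIR1  /root/nfsshare  nfs  defaults,_netdev  0 0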
Now we put node02 into standby mode to force a failover to node01.
[root@node01 ~]# pcs cluster standby node02.test.com
The resource group fails over to node01, and we can still access the NFS share and list its contents from the client.
[root@srv ~]# ls /root/nfsshare/
BACKUPDIR2 BACKUPDIR4 hello_world1 hello_world3 hello_world5
BACKUPDIR3 BACKUPDIR5 hello_world2 hello_world4
So, we have successfully tested the cluster failover.
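Once the failover test is complete, node02 can be taken out of standby so it is eligible to run resources again:
[root@node01 ~]# pcs cluster unstandby node02.test.com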