Environment
Situation
Concepts:
By default, if a node, or the NSS pool on that node, goes down, a cluster resource (a GroupWise resource included) fails over to another node, as configured.
However, if a GroupWise agent such as the POA goes down while the node and its NSS pool are still OK, the agent is not restarted automatically. This is normal behavior.
In a cluster environment, you can use the GroupWise "High Availability Service" (gwha) to set up the GroupWise resource so that a POA, DVA, MTA, or GWIA is restarted automatically if it goes down.
Some configuration changes are required to make the gwha service work with GroupWise in a clustered environment.
The GroupWise High Availability service relies on the GroupWise Monitor Agent to detect when a GroupWise agent is no longer running. The Monitor Agent notifies the GroupWise High Availability service of any problem, then the GroupWise High Availability service immediately issues the command to start the problem agent. The GroupWise High Availability service runs as root, as configured in the /etc/xinetd.d/gwha file.
A single Monitor Agent can service multiple instances of the GroupWise High Availability service on multiple servers, as long as all instances use the same user name and password (discussed later) to communicate with the Monitor Agent.
Although you need a GroupWise High Availability service running on each Linux server where there are GroupWise agents, you need only one Monitor Agent to monitor all agents in your GroupWise system.
The Monitor Agent uses the --hauser and --hapassword switches to communicate with the GroupWise High Availability service on port 8400.
Resolution
Action Items:
Note: A local Linux user called "hauser" is used in these instructions. You can choose whatever name you want.
1. Open a terminal as “root” and create the gwha user as a local Linux user on the first of the cluster nodes:
a.) useradd -d /home/hauser -s /bin/bash -c "FHauser LHauser" hauser
b.) passwd hauser
c.) After creating the user, make certain you can log in as that user: issue the command "su hauser" (no quotes) at a terminal and verify that you can log in with the password you specified. If you run su as root you are not asked for a password, so run this test from a non-root session.
d.) When the “hauser” login succeeds, log out of the account with “exit” (no quotes).
e.) Create this same “hauser” user with this same procedure on every node in the cluster that the GroupWise resource could potentially fail over or migrate to.
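Before moving on, it can help to confirm that the account really exists as a local user on each node. The following is a minimal sketch and is not part of the original procedure. The function name has_local_user is hypothetical, and it only reads a passwd-format file (by default /etc/passwd), so it will not see users that come from LDAP or another directory source.

```shell
#!/bin/bash
# has_local_user NAME [PASSWD_FILE]
# Hypothetical helper: succeeds (exit 0) if NAME has an entry in the
# passwd-format file, which defaults to the local /etc/passwd.
has_local_user() {
    local name="$1"
    local file="${2:-/etc/passwd}"
    # Field 1 of each colon-separated line is the login name;
    # compare it exactly so "hause" does not match "hauser".
    awk -F: -v u="$name" '$1 == u { found = 1 } END { exit !found }' "$file"
}
```

On each node you could then run, for example: has_local_user hauser && echo "hauser exists on $(hostname)"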
2. Install the GroupWise Monitor Agent:
Concept: The Monitor Agent software is installed on all nodes in the cluster, but it runs on only 1 node at a time.
a.) Check whether you already have the Monitor Agent installed by issuing the following command at a terminal as “root”, on all nodes in the cluster:
a. /etc/init.d/grpwise-ma status
b. If you get the error “No such file or directory”, you do not have it installed; go to Step # 3.
c. If you get a status of “running”, you have it installed; shut it down with “/etc/init.d/grpwise-ma stop” (no quotes), then go to Step # 3.
d. If you get a status of “unused”, go to Step # 3.
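The three outcomes in Step 2 can be sketched as a small classifier. This is only an illustration: the function name classify_ma_status is hypothetical, and it assumes the status text contains the English words quoted above ("No such file or directory", "running", "unused"). Anything else is reported as "unknown" so that you check that node by hand.

```shell
#!/bin/bash
# classify_ma_status STATUS_TEXT
# Hypothetical helper that maps the text printed by
# "/etc/init.d/grpwise-ma status" to one of the cases in Step 2.
classify_ma_status() {
    case "$1" in
        *"No such file or directory"*) echo "not-installed" ;;  # case b
        *unused*)                      echo "unused" ;;         # case d
        *running*)                     echo "running" ;;        # case c: stop it first
        *)                             echo "unknown" ;;        # check by hand
    esac
}
```

A possible call, capturing both the normal output and the shell's error message: classify_ma_status "$(/etc/init.d/grpwise-ma status 2>&1)"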
3. Go to: https://www.novell.com/documentation/groupwise2012/gw2012_guide_interop/data/bxfkhaj.html
a. Read the short paragraph in the section “Installing and Configuring the Linux Monitor Agent on Each Node in Your Cluster”.
b. As “root”, change to the /opt/novell/groupwise/software/ installation directory (or wherever you have the GroupWise installation files).
c. Run ./install
d. Do Steps 1, 2, and 3 in the section “Running the Linux Monitor Installation Program on the Preferred Node”. For Step # 4 in this same section, follow the link listed there (“Installing and Configuring the Linux Monitor Agent”) and, at that location, START with Step # 5 and do Steps 5 through 9 ONLY. Also, on this same Linux server, edit and save the file "/etc/init.d/grpwise-ma" with the following switches and values. Remove the # symbol in front of this line and edit it accordingly:
i. MA_OPTIONS="--hauser hauser --hapassword <passwordYouSpecifiedInStep1b> --hapoll 30"
ii. Note: The quotes are part of the above syntax.
e. Go back to the previous documentation page (the one you left in Step 3d) and continue where you left off on Step # 4, starting with the text “Pay special attention to the cluster resource information on the System Options page”. Complete the steps in this section. Disregard the last bullet list item just before Step # 5.
f. Now do the steps in the section “Running the Linux Monitor Agent Installation Program on Subsequent Nodes”. Remember that this step is done on ALL nodes in the cluster, one at a time.
i. DO NOT do Step # 5 in this section. Use of SSL is not covered in this document. After you are done with this section, exit the GroupWise installation program.
g. At this point you need to copy 1 file from the Linux server where you initially installed the Monitor Agent (in Step # 3d) to ALL other nodes. Copy the file “rcgrpwise-ma” in the /usr/sbin/ directory to the same directory on each node in the cluster. You can use this command at a terminal as “root”:
i. scp /usr/sbin/rcgrpwise-ma root@<YourNode2DomainName>:/usr/sbin
ii. Do the same as above for node3, node4, etc.
h. Next, copy 1 more file from that same Linux server (the one from Step # 3d) to ALL other nodes. Copy the file “grpwise-ma” in the /etc/init.d/ directory to the same directory on each node in the cluster. You can use this command at a terminal as “root”:
i. scp /etc/init.d/grpwise-ma root@<YourNode2DomainName>:/etc/init.d
ii. Do the same as above for node3, node4, etc.
i. Do the steps in the section “Testing the Linux Monitor Agent Installation on Each Node”. However, do not do the steps in the section “Configuring the Monitor Agent Cluster Resource To Load and Unload the Linux Monitor Agent”. This document covers what is needed there later, in the example Load and Unload scripts.
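The MA_OPTIONS edit from Step 3d can also be scripted, which helps keep the line identical on every node. The following is a minimal sketch under stated assumptions: the function name set_ma_options is hypothetical, the line in your copy of /etc/init.d/grpwise-ma is assumed to begin with "#MA_OPTIONS=", "# MA_OPTIONS=" or "MA_OPTIONS=", and the password is assumed to contain no spaces, quotes, or other characters that are special to the shell.

```shell
#!/bin/bash
# set_ma_options FILE USER PASSWORD [POLL_SECONDS]
# Hypothetical helper: rewrites the (possibly commented-out) MA_OPTIONS
# line in FILE so that it carries the gwha user, password and poll interval.
set_ma_options() {
    local file="$1" user="$2" pass="$3" poll="${4:-30}"
    local line tmp
    tmp=$(mktemp) || return 1
    # Copy the file line by line, replacing only the MA_OPTIONS line.
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            '#MA_OPTIONS='*|'# MA_OPTIONS='*|'MA_OPTIONS='*)
                line="MA_OPTIONS=\"--hauser $user --hapassword $pass --hapoll $poll\""
                ;;
        esac
        printf '%s\n' "$line"
    done < "$file" > "$tmp"
    # Write back into the existing file so it keeps its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

It is safer to try this on a copy of the init script first and compare the result with the original before replacing anything.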
4. Test whether the GWHA daemon is listening, using the command "netstat -tnlp | grep 8400"
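A plain grep for 8400 also matches other lines that merely contain those digits (a port such as 18400, or a process ID). As a slightly stricter variant, the sketch below checks the local-address and state columns instead. The function name port_listening is hypothetical, and it assumes the usual Linux "netstat -tnl" column layout (local address in column 4, state in column 6).

```shell
#!/bin/bash
# port_listening PORT
# Hypothetical helper: reads "netstat -tnl" style output on stdin and
# succeeds if a TCP socket in state LISTEN is bound to PORT.
port_listening() {
    local port="$1"
    awk -v p="$port" '
        # Column 4 is the local address (0.0.0.0:8400 or :::8400),
        # column 6 is the socket state.
        $6 == "LISTEN" && $4 ~ (":" p "$") { found = 1 }
        END { exit !found }
    '
}
```

Example: netstat -tnl | port_listening 8400 && echo "gwha is listening on 8400"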
5. Make sure an “HTTP User Name” and “HTTP Password” are defined in ConsoleOne for the GroupWise agents to be monitored (Properties of the MTA, POA, and GWIA objects).
a. For the MTA and POA: under the GroupWise tab, Agent Settings:
i. HTTP User Name and HTTP Password are set under the section “HTTP Monitor Settings”
b. For the GWIA: under the GroupWise tab, Optional Gateway Settings:
i. HTTP User Name and HTTP Password are set under the section “HTTP Monitor Settings”
6. The following are my example GroupWise Resource Unload and Load scripts. Because loading and unloading the GroupWise agents takes a lot of commands, and because the Load and Unload scripts are limited in size, I have placed the GroupWise load and unload commands in separate batch files (gwstart, gwstop) that are called by the GroupWise Cluster Resource Load and Unload scripts. These are just examples; they work fine in my test environment.
7. In the Unload script, note the comments regarding “gwha” and “xinetd”. They explain why these commands are placed where they are in the script and what they do.
8. Copy the comments and the 2 commands under them (the “chkconfig” and “xinetd” lines, in RED) into the Unload script of your GroupWise Resource NOW. Place them under the “ncsfuncs” line, at the top of your Unload script:
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
# Unload the xinetd daemon, GWHA used in this NCS system
# This is needed so the GroupWise agents can unload, otherwise
# xinetd and gwha would just restart them
ignore_error /sbin/chkconfig -s gwha off
ignore_error /etc/init.d/xinetd stop
/root/gwstop
ignore_error ncpcon unbind --ncpservername=CLUSTER-DATA-SERVER --ipaddress=10.10.10.10
ignore_error del_secondary_ipaddress 10.10.10.10
ignore_error nss /pooldeact=DATA
exit 0
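When testing migrations, it is useful to confirm that gwha really is switched off on a node after the Unload script has run (and on again after the Load script). The sketch below is an illustration only. The function name gwha_enabled is hypothetical, and it assumes that "chkconfig -s gwha on|off" works on your nodes by changing the "disable" line in /etc/xinetd.d/gwha; verify that on your own system before relying on it.

```shell
#!/bin/bash
# gwha_enabled [XINETD_FILE]
# Hypothetical helper. Exit codes:
#   0  no active "disable = yes" line (xinetd would offer the service)
#   1  an active "disable = yes" line was found
#   2  the file could not be read
gwha_enabled() {
    local file="${1:-/etc/xinetd.d/gwha}"
    [ -r "$file" ] || return 2
    # Commented-out lines start with "#" and are not matched here.
    if grep -Eiq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$file"; then
        return 1
    fi
    return 0
}
```

Example: gwha_enabled && echo "gwha is enabled in xinetd on $(hostname)"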
9. The changes that you need to make to your GroupWise Resource Load script (the “chkconfig” and “xinetd” lines, in RED) are placed just above the “gwstart” and “exit 0” commands. Here is mine as an example of what works. Do it NOW:
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
exit_on_error nss /poolact=DATA
exit_on_error ncpcon mount DATA=254
exit_on_error add_secondary_ipaddress 10.10.10.10
exit_on_error ncpcon bind --ncpservername=CLUSTER-DATA-SERVER --ipaddress=10.10.10.10
ignore_error /sbin/chkconfig -s gwha on
ignore_error /etc/init.d/xinetd start
/root/gwstart
exit 0
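The Load and Unload scripts differ in which wrapper they put in front of each command: the Load script mostly uses exit_on_error, while the Unload script uses ignore_error. The stand-ins below are NOT the code from /opt/novell/ncs/lib/ncsfuncs. They are simplified illustrations, with hypothetical names ending in _sketch, of the behavior these scripts rely on, as I understand it: one wrapper aborts the script when a command fails, the other carries on regardless.

```shell
#!/bin/bash
# Simplified stand-ins for illustration only; the real functions live in
# /opt/novell/ncs/lib/ncsfuncs and may do more (for example, logging).

# Run the command; if it fails, stop the whole script with a non-zero
# exit code so that the load attempt is reported as failed.
exit_on_error_sketch() {
    "$@" || exit 1
}

# Run the command and report success whatever happens, so that the
# remaining cleanup steps of an unload script still get their turn.
ignore_error_sketch() {
    "$@" || true
    return 0
}
```

This is why a failed "nss /poolact" stops the Load script before any agent is started, while a failed "xinetd stop" in the Unload script does not prevent the IP address and pool from being released.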
10. Here are my example GroupWise Start and Stop batch files. You will need to ADD the commands to START and STOP grpwise-ma to your GroupWise Cluster Resource LOAD and UNLOAD scripts (or, if you use batch files like me, to the “gwstart” and “gwstop” script files), as noted (the grpwise-ma lines, in RED). Do it NOW.
- GWSTART - :
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
exit_on_error /etc/init.d/grpwise start Domain1
sleep 10
exit_on_error /etc/init.d/grpwise start gwdva
sleep 10
exit_on_error /etc/init.d/grpwise start Post1.Domain1
sleep 10
exit_on_error /etc/init.d/grpwise start GWIA.Domain1
sleep 10
exit_on_error /etc/init.d/grpwise-ma start
- GWSTOP - :
#!/bin/bash
. /opt/novell/ncs/lib/ncsfuncs
ignore_error /etc/init.d/grpwise stop Domain1
sleep 10
ignore_error /etc/init.d/grpwise stop gwdva
sleep 10
ignore_error /etc/init.d/grpwise stop Post1.Domain1
sleep 10
ignore_error /etc/init.d/grpwise stop GWIA.Domain1
sleep 10
ignore_error /etc/init.d/grpwise-ma stop
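If you have more agents than in this example, the repeated start/sleep pairs in gwstart can be written as a loop over a list of agent names. This is only a sketch of one way to do it, not what my test environment uses: the function name start_agents is hypothetical, and the agent names in the usage line are the example names from this document.

```shell
#!/bin/bash
# start_agents INIT_SCRIPT DELAY AGENT...
# Hypothetical helper: starts each agent in the order given, pausing
# DELAY seconds after each one. Stops at the first failure and
# returns non-zero without starting the remaining agents.
start_agents() {
    local script="$1" delay="$2" agent
    shift 2
    for agent in "$@"; do
        "$script" start "$agent" || return 1
        sleep "$delay"
    done
}
```

Usage with the example names: start_agents /etc/init.d/grpwise 10 Domain1 gwdva Post1.Domain1 GWIA.Domain1, followed by the separate /etc/init.d/grpwise-ma start command.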
11. Open a browser and go to the URL "http://<ipAddressOfMonitorAgentServer>:8200". You should see the agents up and running and in the listening status.
12. Test by bringing down one of the agents. After about 45 seconds, the agent should start again automatically.
a. If you are not sure how to stop your GroupWise agent for a test, run “rcgrpwise status” at a Linux terminal as “root” to find the names of your GroupWise agent objects, so that you can then unload one of them for the test.
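Rather than watching the clock during the test in Step 12, you can poll until the agent is back. The following is a minimal sketch. The function name wait_until is hypothetical, and the usage line assumes that "/etc/init.d/grpwise status <agent>" returns success only while that agent is running, which you should confirm on your own system.

```shell
#!/bin/bash
# wait_until TIMEOUT INTERVAL COMMAND [ARG...]
# Hypothetical helper: re-runs COMMAND every INTERVAL seconds until it
# succeeds or TIMEOUT seconds have been spent waiting.
# Returns 0 on success and 1 on timeout.
wait_until() {
    local timeout="$1" interval="$2" waited=0
    shift 2
    while ! "$@"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}
```

Example, allowing twice the expected 45 seconds: wait_until 90 5 /etc/init.d/grpwise status Post1.Domain1 && echo "agent is running again"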
Additional Information
Assumptions:
1. This document is not intended to be a complete step-by-step guide to setting up GroupWise in a cluster in the Linux environment. It is intended to show the minimal requirements and the configuration necessary to allow the GroupWise High Availability Service (GWHA) to function properly with an existing GroupWise system that is already installed in the Linux cluster. For more complete information, review the Novell documentation on clustering GroupWise on Linux:
a. https://www.novell.com/documentation/groupwise2012/gw2012_guide_interop/data/bwc325u.html
2. If you want complete information about implementing GroupWise Monitor in a Linux Cluster you can go to:
a. https://www.novell.com/documentation/groupwise2012/gw2012_guide_interop/data/bwe3c4q.html
Note: It is not “Best Practices” to have all of the GroupWise agents listed in this document running on 1 node; this setup is used as an instructional example only.
3. It is assumed for the purposes of this document that the GroupWise resource in the cluster has 1 GroupWise domain, 1 Post Office, a DVA, a GWIA, and the GroupWise Monitor Agent.
4. It is assumed that your cluster is running on SLES11 / OES11, and that your existing GroupWise cluster resource (GroupWise MTA, DVA, POA, GWIA) is able to load, unload, migrate and fail over correctly in the cluster.
5. It is assumed that the cluster has a shared NSS volume called DATA, that its mount point is “/media/nss/DATA/”, and that under it there is a sub-directory called /mail/ which contains the directories for the GroupWise domain (domain2) and Post Office (post1).
6. It is assumed in this document that you do not have the GroupWise Monitor Agent installed yet. If you do have it installed, continue with whichever of the listed steps you have not yet completed.