Browser fails to display NRM page

  • 7007098
  • 25-Oct-2010
  • 27-Apr-2012

Environment

Novell Open Enterprise Server 1 (OES 1) Linux
Novell Open Enterprise Server 2 (OES 2) Linux

Situation

When you try to access NRM (Novell Remote Manager) by browsing to https://IP_Address_or_DNSname:8009, the page is never displayed.
In addition, checking the status of NRM with /etc/init.d/novell-httpstkd status shows the service as dead.

Resolution

The least disruptive way to resolve this is:

   1. Identify the leftover httpstkd process by running "ps aux | grep httpstkd | grep -v grep".
      The first part, "ps aux | grep httpstkd", searches the running processes for httpstkd.
      The second part, after the second pipe (|), excludes the grep command itself from the output.
      You should therefore see only the running httpstkd processes, and the output will look like this:

      # ps aux | grep httpstkd | grep -v grep
      root      2447  0.0  0.3 197416  7852 ?        Sl   10:56   0:00 /opt/novell/httpstkd/sbin/httpstkd


      The number after "root" is the actual process ID, or PID.

   2. Kill the remaining process by running "kill <pid>", replacing <pid> with the number from the ps output.
      In this example, run kill 2447. NOTE: it may take a few seconds for the process to exit. To verify that it is gone, rerun step 1 until the PID is no longer returned.

   3. Restart the NRM service by running:
      rcnovell-httpstkd start
      or
      /etc/init.d/novell-httpstkd start

This restarts only the novell-httpstkd service (NRM) and does not affect any other services.
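Steps 1 to 3 can be combined into one script. This is a minimal sketch and not a supported Novell tool. It assumes the binary path shown in the sample ps output, the /etc/init.d/novell-httpstkd init script, and root privileges. The function names and the --restart argument are hypothetical. The script selects httpstkd by its COMMAND column with awk rather than with grep | grep -v grep, so the grep process never shows up in the first place.

```shell
#!/bin/sh
# Sketch of steps 1-3. Assumptions: httpstkd runs as
# /opt/novell/httpstkd/sbin/httpstkd (as in the sample ps output) and is
# controlled by /etc/init.d/novell-httpstkd. Run as root.

# Step 1: print the PID (column 2) of every `ps aux` line whose COMMAND
# (column 11) is the httpstkd binary. Reads ps output on stdin.
httpstkd_pids() {
    awk '$11 == "httpstkd" || $11 ~ /\/httpstkd$/ { print $2 }'
}

restart_nrm() {
    pids=$(ps aux | httpstkd_pids)
    if [ -n "$pids" ]; then
        echo "Stopping stale httpstkd PID(s): $pids"
        kill $pids    # unquoted on purpose: one argument per PID
        # Step 2: the process may take a few seconds to exit; poll for up to 10 s.
        n=0
        while [ -n "$(ps aux | httpstkd_pids)" ] && [ "$n" -lt 10 ]; do
            sleep 1
            n=$((n + 1))
        done
        if [ -n "$(ps aux | httpstkd_pids)" ]; then
            echo "httpstkd is still running; not starting a new instance" >&2
            return 1
        fi
    fi
    # Step 3: start NRM again through its init script.
    /etc/init.d/novell-httpstkd start
}

# Act only when explicitly asked, so loading this file has no side effects.
if [ "${1:-}" = "--restart" ]; then
    restart_nrm
fi
```

Run it as `sh restart-nrm.sh --restart` (the file name is also hypothetical). Without that argument, the script only defines the functions, so you can check the PID parsing against saved ps output first.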

Additional Information

Further investigation shows that the following messages may appear in /var/log/messages. They are additional indicators of this issue:

After you try to launch NRM while the process is dead, you may see:
   Jan  1 01:00:16 MyOESServer httpstkd[3617]: PAM_NAM: User admin.org unknown to the authentication module
   Jan  1 01:00:17 MyOESServer httpstkd[3617]: SSL_accept() ERROR: 0: error:00000000:lib(0):func(0):reason(0)


If you only restart novell-httpstkd, or stop and start that service, you may see:
   Jan  1 01:01:01 MyOESServer httpstkd[5494]:  Error initializing sockets, ccode = 0x62
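To check whether a server has already logged these symptoms, a grep over the syslog is enough. This sketch assumes the default /var/log/messages location and uses patterns copied from the sample messages above. The function name and the --check argument are hypothetical.

```shell
#!/bin/sh
# Sketch: print the httpstkd symptom lines quoted in this article.
# Patterns are taken from the sample messages; /var/log/messages is assumed.

# Reads syslog lines on stdin and prints only the matching httpstkd errors.
nrm_symptoms() {
    grep -E 'httpstkd\[[0-9]+]:.*(unknown to the authentication module|SSL_accept\(\) ERROR|Error initializing sockets)'
}

if [ "${1:-}" = "--check" ]; then
    nrm_symptoms < /var/log/messages
fi
```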

The root cause is that the previous instance stopped responding, but its process is still running and still holds the listening TCP ports 8008 and 8009. Restarting novell-httpstkd therefore leaves you in the same situation: the new instance cannot take over those ports and logs the error initializing sockets.
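You can detect the state described above before restarting anything. In this state, the init script reports that the service is not running, but an httpstkd process still exists. This is a minimal sketch, to be run as root. It assumes that the init script's status action exits with 0 only when the service is running (the LSB convention) and that pgrep is installed. The function name and the --check argument are hypothetical.

```shell
#!/bin/sh
# Sketch: distinguish a stale NRM (status not OK, process still present)
# from a clean stop. Assumes `status` exits 0 only when the service runs.

# classify_nrm STATUS_EXIT_CODE "PIDS"
classify_nrm() {
    if [ "$1" -eq 0 ]; then
        echo running
    elif [ -n "$2" ]; then
        echo stale      # the case this article describes: kill the PID(s) first
    else
        echo stopped    # nothing left over; a plain start should work
    fi
}

if [ "${1:-}" = "--check" ]; then
    /etc/init.d/novell-httpstkd status >/dev/null 2>&1
    rc=$?
    pids=$(pgrep -x httpstkd)
    classify_nrm "$rc" "$pids"
fi
```

A result of "stale" matches the Resolution above. A result of "stopped" followed by the socket error on start points to another application holding the ports.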

If the above resolution does not work, another application may be using the same ports (8008 and 8009). In that case, killing the httpstkd PIDs will not resolve the issue, because doing so does not free the port(s). An alternative resolution is to:
  1. Verify the processes using ports 8008 and 8009 with:
    lsof -i | grep 8008
    lsof -i | grep 8009

    which returns output similar to:

    httpstkd   3525    root    4u  IPv4     10435       TCP *:8009 (LISTEN)
    httpstkd   4984    root    4u  IPv4     10435       TCP *:8009 (LISTEN)

  2. If processes other than httpstkd are listed, either:
    - investigate reconfiguring them to use a different port, or
    - shut down the other process and restart httpstkd.
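You can narrow the check in step 1 so that it reports only processes other than httpstkd. This sketch uses lsof -i :PORT, which limits the listing to that port. That avoids false matches from lsof -i | grep 8008 when a PID or device number happens to contain those digits. The -n and -P options only stop lsof from resolving host and port names, so ports appear as numbers. The sketch assumes lsof prints its usual header line. The function name and the --check argument are hypothetical.

```shell
#!/bin/sh
# Sketch: list non-httpstkd processes that have TCP port 8008 or 8009 open.
# Multiple -i selections are ORed by lsof, so one call covers both ports.

# Reads lsof output on stdin (header first); prints "COMMAND PID" once per
# process whose COMMAND is not httpstkd.
foreign_listeners() {
    awk 'NR > 1 && $1 != "httpstkd" { print $1, $2 }' | sort -u
}

if [ "${1:-}" = "--check" ]; then
    lsof -n -P -i :8008 -i :8009 | foreign_listeners
fi
```

Empty output means only httpstkd holds the ports, and the Resolution above applies. Any other line names a process to reconfigure or shut down.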

Finally, the solutions above are quicker because the system stays up the entire time; only services are restarted. Alternatively, you could restart the server (shutdown -r now), but the whole server would be unavailable for a short time. The services (httpstkd and any others using ports 8008 and/or 8009) would also still need to be reconfigured so they do not conflict.