Cluster resources go comatose because ncpcon volume command fails.

  • 7008711
  • 06-Jun-2011
  • 27-Apr-2012

Environment


Novell Cluster Services 1.8.7
Novell Cluster Services 1.8.8
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3

Situation

The default cluster monitoring scripts include the 'ncpcon volume' command, which fails with error code 22 (strace shows clienterr=22) while eDirectory is locked during an ndsrepair. The monitoring script reports the failure only as a "Volume not found" error. As a result, the cluster resource goes comatose if the ndsrepair runs longer than 60 seconds, which is the default interval at which the cluster monitoring script runs.

Resolution

Fixed in novell-ncpenc-5.1.5-0.41 for OES 2 SP3 (May maintenance patch release).

Additional Information

For a resource named DATA2, Resource.monitor.out shows the following output:
+ exit_on_error ncpcon volume DATA2
+ ncpcon volume DATA2
... Executing " volume DATA2"
Volume DATA2 not found.

... FAILED completion [elapsed time = 950 usecs]
+ rc=1
+ '[' '!' 1 -eq 0 ']'
+ exit 1
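
The trace above suggests a simple check-and-abort pattern: run the command, capture its return code, and exit the monitor script with 1 on any non-zero status. The following is a minimal sketch of that pattern, not the shipped script. The body of exit_on_error is inferred from the trace lines only, and fake_ncpcon is a hypothetical stand-in that imitates the message and non-zero status seen while eDirectory is locked.

```shell
#!/bin/bash
# Sketch only: the function body is inferred from the trace output,
# not copied from the Novell Cluster Services scripts.
exit_on_error() {
    "$@"                        # run the command passed as arguments
    rc=$?
    if [ ! $rc -eq 0 ]; then    # any non-zero status aborts the script
        exit 1
    fi
}

# Hypothetical stand-in for 'ncpcon volume DATA2' during an ndsrepair:
# prints the same "not found" message and returns a non-zero status.
fake_ncpcon() {
    echo "Volume $2 not found."
    return 1
}

# Run the check in a subshell (command substitution) so that 'exit 1'
# ends only the subshell and the status can be inspected afterwards.
if output=$( exit_on_error fake_ncpcon volume DATA2 ); then
    status=0
else
    status=$?
fi

echo "check output: $output"
echo "monitor check exit status: $status"
```

In this sketch a single failed command ends the check with status 1. There is no retry, which is consistent with the behavior described in the Situation section, where a temporary eDirectory lock is enough to make the monitor report a failure.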