Novell Open Enterprise Server 2, Novell Cluster services, Shadow volumes and McAfee Anti-Virus 1.6

  • 7006838
  • 14-Sep-2010
  • 27-Apr-2012

Environment

Novell Open Enterprise Server 2 Support Pack 2
Novell Cluster Services
Dynamic Storage Technology (Shadow Volumes or DST)
McAfee VSE for Linux 1.6


Situation

As part of a NetWare 6.5 migration project to Novell Open Enterprise Server 2, a customer is installing a new Novell Open Enterprise Server 2 environment consisting of multiple NCS cluster nodes from scratch, and plans on using new functionality such as Dynamic Storage Technology.

In addition, the customer requires that the McAfee virus scanner scans the file system in this environment for viral threats.

During various test phases, a number of problems were encountered in the combination of Novell Cluster Services, Shadow volumes (DST), auditing and McAfee anti-virus software.

  • Problem 1:
Startup conflict between the Novell NCS and the McAfee kernel modules.

Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
  • Problem 2:
The '/media/shadowfs/' mount point does not show up when McAfee Anti-Virus software is loaded.

Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
  • Problem 3:
When McAfee's nails service (/etc/init.d/nails) is running, and a Cluster resource with DST volumes is brought online, the mount process may get stuck.

Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
  • Problem 4:
With both the audit daemon (/etc/init.d/auditd) and McAfee's nails service (/etc/init.d/nails) running, a memory leak was observed. The slab cache grew continuously until it consumed all memory and the server required a reboot.

  • Problem 5:
McAfee's lshook.ko prevents the unloading of the Novell adminfs.ko kernel module.

  • Problem 6:
NCP Volume does not get mounted.

  • Problem 7:
McAfee technical collateral states no support for SAN Attached servers.

  • Problem 8:
McAfee technical collateral states no support for Novell Cluster Services

Resolution

  • Solution 1 :
The startup conflict can be addressed by modifying the '/etc/init.d/nails' script and adding the following lines to the "INIT INFO" section :
#Should-Start: novell-ncs nss
#Should-Stop: novell-ncs nss

For example, the section would then look as follows :
### BEGIN INIT INFO
# Provides:       anti-virus
# Should-Start: novell-ncs nss
# Should-Stop: novell-ncs nss
# Required-Start: $local_fs $network ndsd cma
# Required-Stop:
# Default-Start:  2 3 4 5
# Default-Stop:
# Description:    LinuxShield anti-virus services
### END INIT INFO
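
After editing the header it can be useful to confirm that the ordering hints really ended up inside the "INIT INFO" block. The following is a minimal sketch, not part of the McAfee or Novell tooling: the function name `has_ncs_ordering` is hypothetical, and the final `insserv` step is an assumption based on how SLES-based systems normally recalculate the boot order after an LSB header changes.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the INIT INFO block of the given
# init script contains both Should-Start and Should-Stop hints that
# mention novell-ncs.
has_ncs_ordering() {
    # Extract only the LSB header block, so matches elsewhere in the
    # script body are ignored.
    header=$(sed -n '/^### BEGIN INIT INFO/,/^### END INIT INFO/p' "$1")
    printf '%s\n' "$header" | grep -Eq '^#[[:space:]]*Should-Start:.*novell-ncs' &&
    printf '%s\n' "$header" | grep -Eq '^#[[:space:]]*Should-Stop:.*novell-ncs'
}

# Example use (commented out so the sketch has no side effects):
#   if has_ncs_ordering /etc/init.d/nails; then
#       insserv nails    # assumption: recalculates the start/stop order on SLES
#   else
#       echo "nails: novell-ncs ordering hints are missing" >&2
#   fi
```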

  • Solution 2 and 3 :
A mount() call for a FUSE file system cannot complete until the userspace daemon has responded to the INIT message. A statfs() call expects a file system to be there to respond. For normal, in-kernel file systems this is not a problem, because the code to respond is already in place, but for FUSE that code cannot work until the userspace daemon has responded. Therefore, when the McAfee module called statfs() as part of the mount() call, everything hung.

To work around the problem, we added FUSE as an exception in the module, so that statfs() is not called when the file system is FUSE based.

We have suggested the following changes to McAfee:
- add a config option for "do not do statfs for = smb,fuse,adminfs,..." etc.
- add a config option for "exclude filesystems = fuse,adminfs,..." etc.

  • Solution 4 :
When auditing (/etc/init.d/auditd) is active, the "names" data structures only get deleted when a syscall exits. What we suspect is happening within the AntiVirus code is that the code is scanning an entire directory structure inside one single syscall, and since that syscall doesn't exit, the 'names' data structure does not get deleted.
(Note : We observed the memory leak using the 'slabtop' command)

The AntiVirus software calls putname() in its code; to resolve this problem, __putname() should be used instead.

A test was conducted by replacing the above calls and recompiling the software, and in the recompiled version the AntiVirus engine and the audit daemon perfectly co-exist.

Last update  : McAfee has provided a solution to fix the audit and anti-virus co-existence problem in a hotfix.
Please contact McAfee support when required.
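
To watch for this kind of growth without an interactive 'slabtop' session, the active object count of a single slab cache can be sampled from '/proc/slabinfo'. The sketch below is illustrative only: the function name `slab_active_objs` is hypothetical, and the choice of the `names_cache` cache is an assumption (it is the kernel cache that backs path name buffers), since the TID does not state which cache was growing.

```shell
#!/bin/sh
# Hypothetical helper: print the <active_objs> column for one slab cache.
#   $1      = slab cache name
#   $2...   = slabinfo-format files (reads stdin when none are given)
slab_active_objs() {
    cache=$1
    shift
    # In /proc/slabinfo the first field is the cache name and the second
    # field is the number of active objects.
    awk -v cache="$cache" '$1 == cache { print $2 }' "$@"
}

# Example use (commented out; reading /proc/slabinfo normally requires root):
#   while sleep 60; do
#       echo "$(date '+%H:%M:%S') names_cache: $(slab_active_objs names_cache /proc/slabinfo)"
#   done
```

A count that keeps rising while a scan is running and auditd is active would be consistent with the behaviour described above.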

  • Solution 5 :
In McAfee LinuxShield, some function pointers of various file-system modules are patched to intercept the file I/O calls. Restoring the pointers and unloading the kernel module has been disabled for specific reasons, hence the lshook.ko module is never unloaded.

  • Solution 6 :
Give the nails user sufficient privileges on all NSS volumes as described in the McAfee documentation.
E.g.: "rights -f /media/nss/<VOL-name> -r s trustee nails.<context>.<tree>"

Exclude the '/.*/\._NETWARE' directory from being scanned.

  • Solution 7 :
McAfee has released a hotfix which, once applied, provides official support for SAN environments.
Please contact McAfee support when required.

  • Solution 8 :
As of October 28th, 2011, McAfee has released McAfee VirusScan Enterprise for Linux v1.7 with official support for Novell Cluster Services.

Additional Information

Novell does not encourage customers to manually modify any McAfee code, as this document may appear to suggest. Contact McAfee support for a proper solution instead.

The tests, the analysis and any recompiling of code mentioned in this TID were performed to analyze the reported problems and, where possible, to confirm a solution.
The results have been fed back to our colleagues at McAfee, who ultimately need to incorporate this feedback in a new update.

Note to problem 1 :
This has been communicated to McAfee, who have informed us this is planned to be fixed in a newer release of the McAfee software.

Note to problem 2 :
In the '/etc/init.d/nails' script, search for the following line :
fstypes=`cat /proc/filesystems|grep -v tmpfs | sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' -e '$x;s/^\n//;s/\n/|/gp'`

and replace it with :
fstypes=`cat /proc/filesystems|grep -v tmpfs |grep -v fuse|grep -v adminfs | sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' -e '$x;s/^\n//;s/\n/|/gp'`

This additionally excludes the fuse and adminfs file systems from being scanned. Please be aware that excluding fuse in the nails script prevents all fuse file systems from being scanned for viruses, which may not be an acceptable solution.
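
The effect of the modified pipeline can be tried safely on sample data before touching the nails script. The sketch below wraps the same grep and sed stages in a function that reads from standard input instead of '/proc/filesystems'; the function name `build_fstypes` and the sample file system list are made up for illustration, and GNU sed is assumed (the `\?` operator is a GNU extension).

```shell
#!/bin/sh
# Hypothetical wrapper around the modified pipeline from the nails script.
# Reads /proc/filesystems-style lines on stdin and prints a '|'-separated
# list of file system types, leaving out tmpfs, fuse and adminfs entries.
build_fstypes() {
    grep -v tmpfs | grep -v fuse | grep -v adminfs |
        sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' -e '$x;s/^\n//;s/\n/|/gp'
}

# Sample input in /proc/filesystems format (illustrative, not from a real server).
printf 'nodev\tsysfs\nnodev\ttmpfs\n\text3\n\treiserfs\nnodev\tfuse\n\tfuseblk\nnodev\tadminfs\n' |
    build_fstypes
# prints: sysfs|ext3|reiserfs
```

Note that `grep -v fuse` matches by substring, so related types such as fuseblk are dropped as well.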


Note to problem 3 :
When creating an strace of the action where the resource should come online, it was noticed that during the mount process, the PID got stuck at :
4920  09:11:52.793684 mount("fuse", "/media/shadowfs/TEST01", "fuse", MS_NOSUID|MS_NODEV,"allow_other,default_permissions,fd=6,rootmode=40000,user_id=0,group_id=0"<unfinished ...>

It was concluded that the shadowfs daemon calls into the mount() syscall, which generates the FUSE INIT message, which shadowfs itself is meant to respond to. In normal operation the syscall returns and shadowfs can respond. In this scenario, because of the statfs() call added by the McAfee module, shadowfs stays in the kernel waiting for its own response, which then fails.
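
When a trace is large, mount() calls that were started but never completed can be picked out mechanically. The following is a rough sketch under a few assumptions: the trace was written with per-line PIDs and timestamps (for example with `strace -f -tt -o <file>`), and the function name `find_stuck_mounts` is hypothetical. It reports a mount() call only when no matching "resumed" line follows for the same PID.

```shell
#!/bin/sh
# Hypothetical helper: list mount() calls in an strace log that were left
# unfinished and never resumed. Arguments are strace output files
# (reads stdin when none are given).
find_stuck_mounts() {
    awk '
        # Remember the latest unfinished mount() call per PID (field 1).
        / mount\(/ && /<unfinished \.\.\.>/ { pending[$1] = $0 }
        # Forget it again once strace reports that the call resumed.
        /<\.\.\. mount resumed>/            { delete pending[$1] }
        END { for (pid in pending) print pending[pid] }
    ' "$@"
}

# Example use (commented out; the log file name is illustrative):
#   find_stuck_mounts /tmp/resource-online.strace
```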


Note to problem 4 :
Installing, for example, the 'novell-afp' package activates the audit daemon by default. When auditing is active and the virus scanner is active, the 'names' data structures only get deleted when a syscall exits. When an entire directory structure is scanned within one single syscall (which doesn't exit), the names don't get deleted. This was consuming the memory.

**  Installing the novell-afp service automatically enables the '/etc/init.d/auditd' auditing daemon.
**  The discrepancy in co-existence between '/etc/init.d/auditd' and anti virus software has been observed with the software from various anti virus vendors.