Environment
Novell Open Enterprise Server 2 Support Pack 2
Novell Cluster Services
Dynamic Storage Technology (Shadow Volumes or DST)
McAfee VSE for Linux 1.6
Situation
As part of a NetWare 6.5 migration project to Novell Open Enterprise Server 2, a customer is installing a new Novell Open Enterprise Server 2 environment from scratch, consisting of multiple NCS cluster nodes, and plans to use new functionality such as Dynamic Storage Technology.
In addition, the customer requires that the McAfee virus scanner scan the file system for viral threats.
During various test phases, a number of problems were encountered with the combination of Novell Cluster Services, Shadow Volumes (DST), auditing and McAfee anti-virus software.
- Problem 1:
Startup conflict between the Novell NCS and the McAfee kernel modules.
Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
- Problem 2:
The '/media/shadowfs/' mount point does not show up when McAfee Anti-Virus software is loaded.
Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
- Problem 3:
When McAfee's nails service (/etc/init.d/nails) is running and a cluster resource with DST volumes is brought online, the mount process may hang.
Additional symptoms :
- Cluster node does not reboot when 'init 6' or 'reboot' command is issued
- Server gets stuck during shutdown (while stopping / unloading cluster services modules)
- Problem 4:
With both the audit daemon (/etc/init.d/auditd) and McAfee's nails service (/etc/init.d/nails) running, a memory leak was observed: the slab cache grew continuously until it consumed all memory and the server had to be rebooted.
- Problem 5:
McAfee's lshook.ko prevents the Novell adminfs.ko kernel module from being unloaded.
- Problem 6:
NCP Volume does not get mounted.
- Problem 7:
McAfee technical documentation states that SAN-attached servers are not supported.
- Problem 8:
McAfee technical documentation states that Novell Cluster Services is not supported.
Resolution
- Solution 1 :
The startup conflict can be addressed by modifying the '/etc/init.d/nails' script and adding the following lines to the "INIT INFO" section :
# Should-Start: novell-ncs nss
# Should-Stop: novell-ncs nss
For example :
### BEGIN INIT INFO
# Provides: anti-virus
# Should-Start: novell-ncs nss
# Should-Stop: novell-ncs nss
# Required-Start: $local_fs $network ndsd cma
# Required-Stop:
# Default-Start: 2 3 4 5
# Default-Stop:
# Description: LinuxShield anti-virus services
### END INIT INFO
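The header change can also be scripted. The following is a minimal sketch, assuming GNU sed (for -i) and a nails script whose INIT INFO section contains a '# Provides:' line as in the example above; the function name add_ncs_deps is hypothetical. It keeps a backup copy of the script and does nothing if a novell-ncs Should-Start hint is already present.

```shell
# Hypothetical helper: add the NCS/NSS ordering hints to the LSB
# "INIT INFO" header of the McAfee nails init script.
# Assumes GNU sed (-i) and no existing Should-Start/Should-Stop lines.
add_ncs_deps() {
    script="${1:-/etc/init.d/nails}"

    # Idempotent: skip if the novell-ncs hint is already there.
    if grep -q '^# *Should-Start:.*novell-ncs' "$script"; then
        return 0
    fi

    # Keep a copy of the vendor script before editing it.
    cp -p "$script" "$script.orig" || return 1

    # Append both hints directly after the "# Provides:" line.
    sed -i '/^# *Provides:/a\
# Should-Start: novell-ncs nss\
# Should-Stop: novell-ncs nss' "$script"
}
```

For example, run 'add_ncs_deps /etc/init.d/nails' as root. On SLES-based OES2 servers the boot and shutdown order is derived from these headers by insserv, so re-running 'insserv nails' afterwards is typically needed; verify this on your own system. Keep in mind that a later McAfee update may overwrite the modified script.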
- Solutions 2 and 3 :
mount() for FUSE cannot complete until the userspace daemon has responded to the INIT message. A statfs() call expects a file system to be present to answer it. For normal, in-kernel file systems this is not a problem, because kernel code is there to answer; for FUSE, however, nothing can answer until the userspace daemon has responded to INIT. Therefore, when the McAfee module called statfs() as part of the mount() call, everything hung.
As a workaround, FUSE was added as an exception in the McAfee module, so that statfs() is not called when the file system is FUSE-based.
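The actual workaround lives inside McAfee's kernel module and is not reproduced here. The shell sketch below only illustrates the decision it makes: look up the file-system type of a mount point from /proc/mounts, a text table maintained by the kernel that can be read without calling statfs() on the target file system, and treat fuse, fuseblk and fuse.* types as FUSE. The helper names fs_type_of and is_fuse_mount are hypothetical.

```shell
# Hypothetical helper: print the file-system type of a mount point, read
# from a mounts table (default /proc/mounts) instead of calling statfs().
# Mount points containing spaces (escaped as \040) are not handled.
fs_type_of() {
    mnt="$1"
    table="${2:-/proc/mounts}"
    # Field 2 is the mount point, field 3 the type; the last match wins
    # because a later mount on the same path hides the earlier one.
    awk -v m="$mnt" '$2 == m { t = $3 } END { if (t != "") print t }' "$table"
}

# Hypothetical helper: succeed if the mount point is FUSE-based, i.e. a
# path for which a scanner would skip statfs() under this workaround.
is_fuse_mount() {
    case "$(fs_type_of "$1" "$2")" in
        fuse|fuseblk|fuse.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, 'is_fuse_mount /media/shadowfs/TEST01' succeeds once the shadowfs mount for the TEST01 volume is listed in /proc/mounts with type fuse.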
The following suggestions have been passed back to McAfee:
- add a configuration option such as "do not do statfs for = smb,fuse,adminfs,..."
- add a configuration option such as "exclude filesystems = fuse,adminfs,..."
- Solution 4 :
When auditing (/etc/init.d/auditd) is active, the "names" data structures are only freed when a syscall exits. We suspect that the anti-virus code scans an entire directory structure inside one single syscall; since that syscall does not exit, the 'names' data structures are never freed.
(Note : the memory leak was observed using the 'slabtop' command.)
The anti-virus software uses putname() calls in its code; to resolve this problem, __putname() should be used instead.
A test was conducted by replacing these calls and recompiling the software; with the recompiled version, the anti-virus engine and the audit daemon co-exist without problems.
Last update : McAfee has provided a hotfix that solves the audit and anti-virus co-existence problem.
Please contact McAfee support when required.
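To watch for this leak without an interactive slabtop session, the active object count of a slab cache can be sampled from /proc/slabinfo. The shell sketch below assumes the growth shows up in the kernel's names_cache slab (the cache that getname()/putname() allocate from); if slabtop points at a different cache on your system, pass that name instead. Reading /proc/slabinfo usually requires root. The helper names slab_active and watch_slab are hypothetical.

```shell
# Hypothetical helper: print the active object count of one slab cache.
slab_active() {
    cache="$1"
    info="${2:-/proc/slabinfo}"
    # Data lines are: name active_objs num_objs objsize ...
    awk -v c="$cache" '$1 == c { print $2; exit }' "$info"
}

# Hypothetical helper: sample a cache COUNT times, INTERVAL seconds apart,
# printing a timestamp and the active object count on each line.
watch_slab() {
    cache="${1:-names_cache}"
    interval="${2:-10}"
    count="${3:-6}"
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '%s %s\n' "$(date +%T)" "$(slab_active "$cache")"
        i=$((i + 1))
        # No sleep after the last sample.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, 'watch_slab names_cache 10 6' prints one timestamped count every 10 seconds, six times. A count that keeps climbing while both nails and auditd are running matches the symptom described in problem 4.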
- Solution 5 :
In McAfee LinuxShield, some function pointers of various file-system modules are patched to intercept file I/O calls. Restoring these pointers and unloading the kernel module have been disabled for specific reasons; hence the lshook.ko module is never unloaded, which in turn keeps the adminfs.ko module from being unloaded.
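To confirm this situation on a node, the reference count of adminfs.ko can be checked. The following shell sketch (the helper name module_refs is hypothetical) reads /proc/modules, the same table lsmod displays, and prints the reference count and the dependent-module column for one module. Whether lshook appears in that column depends on how it holds its reference, so a non-zero count with an empty users column ("-") is also possible.

```shell
# Hypothetical helper: show the reference count and dependent modules
# of one kernel module, as listed in /proc/modules.
# Exits non-zero if the module is not loaded.
module_refs() {
    mod="$1"
    table="${2:-/proc/modules}"
    # Fields: name size refcount users state address
    awk -v m="$mod" '
        $1 == m { print "refcount=" $3, "users=" $4; found = 1 }
        END     { exit !found }
    ' "$table"
}
```

For example, 'module_refs adminfs' prints a line such as 'refcount=1 users=...'; as long as the count stays above zero, the module cannot be removed.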
- Solution 6 :
Give the nails user sufficient privileges on all NSS volumes, as described in the McAfee documentation.
E.g.: "rights -f /media/nss/<VOL-name> -r s trustee nails.<context>.<tree>"
Exclude the '/.*/\._NETWARE' directory from being scanned.
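On a server with many NSS volumes, the rights command above can be generated per volume. The shell sketch below loops over the directories below /media/nss, where OES mounts NSS volumes, and by default only prints the command for each one so it can be reviewed first. The helper name grant_nails_rights is hypothetical; the trustee must be your own nails.<context>.<tree> object. Cluster volumes are only mounted on the node where their resource is online, so only those volumes are seen.

```shell
# Hypothetical helper: print (default) or run the rights command from
# this TID for every NSS volume mounted below a base directory.
grant_nails_rights() {
    trustee="$1"              # your nails.<context>.<tree> object
    base="${2:-/media/nss}"
    # "${3-echo}" (no colon): an explicit empty third argument means
    # "execute", while omitting it means "just print" via echo.
    run="${3-echo}"
    for vol in "$base"/*; do
        [ -d "$vol" ] || continue
        # $run is deliberately unquoted so that an empty value vanishes.
        $run rights -f "$vol" -r s trustee "$trustee"
    done
}
```

For example, 'grant_nails_rights nails.<context>.<tree>' prints the commands, and 'grant_nails_rights nails.<context>.<tree> /media/nss ""' runs them.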
- Solution 7 :
McAfee has released a hotfix which, once applied, adds official support for SAN environments.
Please contact McAfee support when required.
- Solution 8 :
As of October 28, 2011, McAfee has released McAfee VirusScan Enterprise for Linux v1.7, with official support for Novell Cluster Services.
Additional Information
Novell does not encourage customers to manually modify any McAfee code, as this document might seem to suggest; instead, please contact McAfee support for a proper solution.
The tests performed, the analysis shared and any recompiling of code mentioned in this TID were done to analyze the reported problems and, where possible, to confirm a solution.
The results have been fed back to our colleagues at McAfee who ultimately need to incorporate this feedback in a new update.
Note to problem 1 :
This has been communicated to McAfee, who have informed us that a fix is planned for a newer release of the McAfee software.
Note to problem 2 :
In the '/etc/init.d/nails' script, search for the following line :
fstypes=`cat /proc/filesystems|grep -v tmpfs | sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' -e '$x;s/^\n//;s/\n/|/gp'`
and replace it with :
fstypes=`cat /proc/filesystems|grep -v tmpfs |grep -v fuse|grep -v adminfs | sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' -e '$x;s/^\n//;s/\n/|/gp'`
In addition, this excludes the fuse and adminfs file systems from being scanned. Please be aware that excluding fuse in the nails script will prevent all FUSE file systems from being scanned for viruses, which may not be an acceptable solution.
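To check what the modified line produces before editing the init script, the same pipeline can be run against /proc/filesystems or a copy of it. The shell sketch below (the helper name scan_fstypes is hypothetical) reuses the sed expression from the nails script and folds tmpfs plus any extra type names into one grep -Ev pattern. Like the chained grep -v calls, this is a substring match, so excluding fuse also drops fuseblk and fusectl. The \? in the sed expression requires GNU sed.

```shell
# Hypothetical helper: build the "|"-separated fstypes list the way the
# nails script does, dropping tmpfs plus any extra type names given.
scan_fstypes() {
    src="$1"; shift
    pattern='tmpfs'
    for t in "$@"; do
        pattern="$pattern|$t"
    done
    # Strip leading whitespace and "nodev", collect all lines in the hold
    # space, then join them with "|" at the last line. As in the original
    # line, the final s///p prints only if a newline was replaced, so a
    # single remaining type would produce no output.
    grep -Ev "$pattern" "$src" |
        sed -n -e 's/[[:space:]]*\(nodev\)\?[[:space:]]*//;H' \
               -e '$x;s/^\n//;s/\n/|/gp'
}
```

For example, 'scan_fstypes /proc/filesystems fuse adminfs' prints the list that the modified line would assign to fstypes.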
Note to problem 3 :
An strace of the action that should bring the resource online showed that, during the mount process, the PID got stuck at :
4920 09:11:52.793684 mount("fuse", "/media/shadowfs/TEST01", "fuse", MS_NOSUID|MS_NODEV,"allow_other,default_permissions,fd=6,rootmode=40000,user_id=0,group_id=0"<unfinished ...>
It was concluded that the shadowfs daemon calls the mount() syscall, which generates the FUSE INIT message that shadowfs itself is meant to respond to. In normal operation mount() returns and shadowfs can then respond. In this scenario, because of the statfs() call added by the McAfee module, shadowfs stays in the kernel waiting for its own response, which it can never send, so the mount never completes.
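When analysing a similar hang, a long strace log can be searched for mount() calls that were started but never completed. The shell sketch below assumes a log captured with per-PID prefixes (for example with strace -f -tt -o <file>), in which a call that is interrupted by other output or still blocked when tracing stops is logged as "<unfinished ...>" and its completion as "<... mount resumed>". The helper name stuck_mounts is hypothetical.

```shell
# Hypothetical helper: list mount() calls in an strace log that were
# logged as "<unfinished ...>" and never followed by a matching
# "<... mount resumed>" line for the same PID (field 1).
stuck_mounts() {
    log="$1"
    awk '
        / mount[(].*<unfinished [.][.][.]>$/ { pending[$1] = $0 }
        /<[.][.][.] mount resumed>/         { delete pending[$1] }
        END { for (p in pending) print pending[p] }
    ' "$log"
}
```

For example, on the trace above this would report the line for PID 4920, because its mount() call was never resumed.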
Note to problem 4 :
Installing, for example, the 'novell-afp' package activates the audit daemon by default. When both auditing and the virus scanner are active, the 'names' data structures only get deleted when a syscall exits. When an entire directory structure is scanned in one single syscall (which does not exit), the names are not deleted; this is what was consuming the memory.
** Installing the novell-afp service automatically enables the '/etc/init.d/auditd' auditing daemon.
** The co-existence problem between '/etc/init.d/auditd' and anti-virus software has been observed with software from various anti-virus vendors.