Checking the OS and DS Health for Inconsistent ZENworks behavior.

  • 3197766
  • 27-Sep-2006
  • 30-Apr-2012

Environment

Novell ZENworks for Desktops 2.0.
Novell ZENworks for Desktops 3.0.
Novell ZENworks for Desktops 3.2.
Novell ZENworks for Desktops 4.0.1 - ZfD4.0.1.
Novell ZENworks 6.5 Desktop Management - ZfD6.5.
Novell ZENworks 7 Desktop Management - ZfD7
Novell ZENworks for Servers 2.0.
Novell ZENworks for Servers 3.0.
Novell ZENworks 6.5 Server Management - ZfS65.
Novell ZENworks 7 Server Management - ZfS7
Novell ManageWise 2.7.
Novell Management Agent for NetWare 2.7.

Situation

Application objects not always showing up in NAL.
NAL comes up blank.
Some applications deliver when NAL is run.
Different applications deliver each time NAL is run.
Rebooting PC will sometimes get the objects to show up.
Saving applications in NWADMN32.EXE or CONSOLEONE.EXE does not save the changes.
Saving associations in NWADMN32.EXE or CONSOLEONE.EXE does not save the changes.
Changes in applications do not save.
Changes in applications do not show up for many minutes after the application is saved.
Policies do not apply consistently.
Policies do not work for everyone.
Printer policies do not work for all applications.
Everything was working at one time.
ZFS 2 installed with the OS and DS health in a bad state.
Inconsistencies in reported disk space in the ManageWise Console utility.

Resolution

Once volume space meets the minimum recommendations (see the Additional Information steps in this TID) for all volumes on all servers in the replica ring, the following process will identify which replicas have the symptoms and which do not.

Note: KB 10099070 'Application objects are missing stream files on one or more replicas' (https://www.novell.com/support), provided by eDir support, has been released and may make the following steps unnecessary.

1. Log a workstation into the server holding the Master replica of the ring and map drives to the needed volumes, especially on servers that hold installed ZENworks files, ZENworks application files, NWADMN32.EXE, and/or CONSOLEONE.EXE. These resources can be made available at a later time by logging into and/or mapping drives for the one server with the unlocked DS database. NWADMN32.EXE, CONSOLEONE.EXE, and NAL should be run manually after all the DS databases are locked. This will force the use of the one unlocked DS database for all DS information in subsequent steps.

2. On the server holding the Master (or next) replica for the ring, load DSREPAIR.NLM, select 'Advanced options menu' and then 'Create NDS archive'. This will provide a DIB (backup file) of the present DS database for Novell DS support to restore, if needed.

Not all versions of DSREPAIR.NLM should be used. Contact Novell DS Support if uncertain which ones to use.

3. From the 'Advanced options menu', select 'Repair local DS database', leave all settings at their defaults, and enable 'Rebuild operational schema'. This will also enable 'Lock NDS database during the entire repair'. Run the repair (F10) once, leaving the log file open so the DS database remains locked, and note any errors that are reported.

4. Do steps 2 & 3 on all servers holding replicas and subordinate references, working from the Master out, until the repair has been run on each and all DS databases are in a locked state.

5. Return to the server with the Master/next replica and run the same repair until there are zero (0) errors, or until the same errors do not resolve after several repairs. For errors that persist, check for existing TIDs that explain them, or contact Novell DS support to help resolve them.

6. Exit the log file so that the 'Rebuild operational schema' screen returns.

7. At this point the DS replica on the Master/next server can be tested for whatever symptoms are being experienced. Manually run NWADMN32.EXE, CONSOLEONE.EXE, or NAL, and note the behavior. For ZENworks for Servers, this step may not be as clear-cut. If it is not helpful or relevant, finish items 8-12.

8. Return to the DSREPAIR 'Rebuild operational schema' repair and run it again on the Master/next replica server and leave the repair log file open. Go to the next server.

9. Do steps 5, 6, 7, & 8 on all servers holding replicas.

10. Do step 5 on Sub Ref servers before unlocking all databases.

11. Once replicas have been identified as good or bad, the work needing to be done to correct the problem replica(s) has been determined. If all replicas display the symptoms, then it is possible that all replicas are bad. If recreating an object (application or policy) resolves the issue(s) with that one object, that new object will be sync'd to the other replicas. When the new object is sync'd to the other replicas, if it has stream files, they will copy to the other replica servers as well. Application objects use stream files. The bigger the application, the more stream files it will have (ZFS Distributions of NAL objects could be affected by OS volume space and DS file corruption). If few objects display the symptoms, re-creating them may be an option. It may be easier to remove a replica from a server and restore it from a server that has a good replica of everything. There are DS TIDs that cover this process and DS support is available for help implementing them.

Note: Lack of volume space (free blocks) is not the only reason a DS/eDir object can become corrupt or 'broken' and need to be recreated. If recreating the object (for example, a NAL application object) resolves the symptom, that is the solution, especially if nothing else does. Recreating the object can also be a troubleshooting step to help identify the root cause. This troubleshooting step can be done early or late in the investigation, and it may be the final solution.

12. This process may need to be done on all servers, volumes, and replicas in the tree and is not limited to just ZENworks issues. If volumes run below the minimum recommended free space (the smaller the free space and the longer it runs in that state, the greater the risk of problems), the OS and DS can be affected in some manner and should be checked for good health. If the install of the ZENworks product was done when the OS and DS Health were in a bad state, a re-install of the product after getting the OS and DS in good health is suggested. If a re-install is done, an un-install of the ZENworks product may be necessary, especially if a re-install over-the-top does not resolve the install, or run-after-install, issues.
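The ordering in step 4 and the good/bad determination in step 11 can be sketched as a small bookkeeping aid. This is only an illustrative sketch, not a Novell tool: the server names, the role labels ("master", "read/write", "subref"), and the function names are hypothetical, and the symptom results are assumed to be entered by hand after the manual test in step 7.

```python
from dataclasses import dataclass

@dataclass
class ReplicaResult:
    server: str           # hypothetical server name
    role: str             # assumed labels: "master", "read/write", "subref"
    shows_symptoms: bool  # recorded by hand after the manual test in step 7

# Work from the Master out; subordinate references come last (step 4).
ROLE_ORDER = {"master": 0, "read/write": 1, "subref": 2}

def repair_order(results):
    """Return server names in the order the repair is run."""
    return [r.server for r in sorted(results, key=lambda r: ROLE_ORDER[r.role])]

def triage(results):
    """Summarize which replicas look bad; Sub Ref servers are not tested (step 10)."""
    tested = [r for r in results if r.role != "subref"]
    bad = [r.server for r in tested if r.shows_symptoms]
    if not bad:
        return "no replica shows the symptoms"
    if len(bad) == len(tested):
        return "all replicas show the symptoms; all may be bad"
    return "suspect replicas: " + ", ".join(bad)

results = [
    ReplicaResult("FS2", "read/write", True),
    ReplicaResult("FS3", "subref", False),
    ReplicaResult("FS1", "master", False),
]
print(repair_order(results))
print(triage(results))
```

The summary only records what was observed; deciding whether to recreate objects or to remove and restore a replica remains a judgment call as described in step 11.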

Additional Information

Some of the replicas in the ring are corrupt, or have corrupt stream files.
To investigate possible causes of the corruption:

1. The first item to check is volume space on all volumes on all servers in the tree: 20% Free blocks minimum if a replica is on the volume; 10-20% Free blocks minimum without a replica, including non-SYS volumes. NetWare Documentation (NetWare Server Disks and Storage Devices: Optimizing, Recommendations NW 5.1, Optimizing the File System Performance, Optimizing Storage Disk Capacity NW 6, NW 6.5) supports maintaining minimum free blocks and space.

A new utility is available that can make cleanup easier: PURGE_NW.NLM.


A. Use the utility JCMD.NLM to investigate hidden purgeable files in the _NETWARE directory that cannot be seen or purged by DOS or Windows based NetWare utilities (for related issues, see TIDs 10063876 'The SYS volume ran out of space' and 10064510 'Slow Server Performance due to Disk Space'). It can be downloaded from www.netwarefiles.com, NetWare Server Utilities, JCMD.ZIP (JCMD146.ZIP).

B. Copy JCMD.NLM to SYS:SYSTEM and load it from the server console prompt.

C. Load MONITOR.NLM from the server console prompt and select VOLUMES (the percentages on this screen do not include the 'Freeable blocks in the salvage system:'; use them as an initial reference only). Select SYS, then press the TAB key and check the 'Total blocks:', 'Free blocks:', and 'Freeable blocks in the salvage system:'. The three values should equal 100% of the volume. The recommended free space is given in item 1 and in the TIDs referenced in items 2 & 3 below. Use the 'block' values for all volumes to bring each one to the appropriate free space.

D. Load JCMD.NLM. This should open a screen with 'SYS:\>' as the prompt. Type 'help' for command line syntax.

E. Type 'ATTR _NETWARE'. The NetWare attributes reported back should typically be 'P' & 'D'. Occasionally, the 'P' will be missing. The cause of this is still being researched.

F. If it is missing, type 'ATTR _NETWARE P+', and it will be restored. Files deleted in the _NETWARE directory after the 'P' attribute is reset will be purged automatically. Deleted files that existed in the directory before the attribute was reset must be purged manually.

NOTE: The 'P' attribute is used for Free Blocks and Freeable Blocks management. The missing purge attribute is not itself the cause of the corruption; the lack of Free Blocks can be. The objectives of this TID are to troubleshoot and recover from current corruption, and to avoid it in the future by finding the easiest way to set up a server so that it maintains the most Free Blocks with minimal work and does not go below the true Free Blocks recommended for the OS and DS. Setting that attribute and purging the existing deleted files is just one step in optimizing the state and performance of a NetWare server and minimizing server maintenance.

Each company would be best served to make a business decision on how to maintain the maximum Free Blocks and minimize Freeable Blocks--and take appropriate action when necessary to address a lack of recommended Free Blocks.

A common source of the 'P' (Purge) attribute being turned off for the SYS:_NETWARE directory is the migration of server OSes. If all the servers do not have the 'P' attribute on the _NETWARE directory, were they migrated? Migrations are not the only source of the 'P' attribute being turned off as this has been seen on new installs as well.

If you choose to leave that attribute off, consider the ramifications. The DOS/Windows Purge utilities do not see the _NETWARE directory, so they will not purge it. Once it is determined that the 'P' attribute is off and that files in the _NETWARE directory need to be purged, the only option other than JCMD (or CPQFM.NLM) is to dismount the volume(s) and run VREPAIR with 'Purge all deleted files' enabled. This would require a complete manual check of the 'P' attribute on _NETWARE, a dismount of the volume(s), and a run of VREPAIR. Also, how big are the files in that directory, and how fast can those deleted files use up volume space? How often will this have to be checked manually if the purge attribute is left off? How many servers and volumes are there?

Also note that the _NETWARE directory may have subdirectories, and they might not have the purge attribute. Use Monitor to verify that 'Freeable blocks in the salvage system:' goes to zero when all files have been purged. If it is not zero even though the DOS/Windows Purge utilities indicate all their files have been purged, check the _NETWARE subdirectories.

G. Type 'CD _NETWARE', then type 'SALV'. If any files are listed, they are files that have been deleted (by the OS and/or DS) and should have been purged automatically. The DIR command will list the non-deleted files that exist in the directory and the free space (in KB) on the volume.

H. If 'SALV' lists files, purge them with the command 'SALV /PA'. The 'A' purges a screenful of files at a time rather than one file at a time. Verify that the free space is at least at the recommended minimum. VREPAIR can be used to purge these same files (see the TIDs referenced in item A above), but JCMD.NLM does not require dismounting the volume as VREPAIR does.

I. EXIT JCMD.NLM to close that screen.

The SYS:_NETWARE directory contains DS stream files. The stream files contain information about DS objects that is not in the DS databases. Each replica has its own copy of the stream files. Running out of volume space can corrupt these files on that volume, as well as on other replica servers.
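The free-block minimums from item 1 can be checked with simple arithmetic on the values read from MONITOR in step C. The following is a minimal sketch, assuming the three block counts are typed in by hand; the function name and parameters are hypothetical, and the 10% figure used for volumes without a replica is the lower bound of the 10-20% range given in item 1.

```python
def free_block_report(total_blocks, free_blocks, freeable_blocks, holds_replica):
    """Compare a volume's Free blocks against the minimums in item 1."""
    # 20% minimum with a replica on the volume; 10% assumed otherwise
    # (lower bound of the 10-20% range quoted in this TID).
    minimum_pct = 20.0 if holds_replica else 10.0
    free_pct = 100.0 * free_blocks / total_blocks
    freeable_pct = 100.0 * freeable_blocks / total_blocks
    return {
        "free_pct": free_pct,
        "freeable_pct": freeable_pct,
        "minimum_pct": minimum_pct,
        "meets_minimum": free_pct >= minimum_pct,
        # Would purging the salvage system alone reach the minimum?
        "meets_minimum_after_purge": free_pct + freeable_pct >= minimum_pct,
    }

# Example values only: a SYS volume that holds a replica.
report = free_block_report(total_blocks=1000, free_blocks=150,
                           freeable_blocks=100, holds_replica=True)
print(report)
```

In this example the volume is below the 20% minimum as it stands, but purging the freeable blocks would bring it above the minimum, which is the situation steps E through H address.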

2. Get TID 10012765 'Performance, Tuning, and Optimization Prior to Netware 5.x', and implement its recommendations for the OS. TIDs 10096649 'NetWare 6.x Memory Fragmentation & Tuning' and 3920657 'Memory Fragmentation Issue with NetWare 6.0 and 6.5' (Formerly known as TID# 10091980) are referenced.

3. Get TID 3564075 'NDS Health Check Procedures - Cross Platform', validate the health of the DS replicas and communications, and install the latest WSOCK (WINSOCK) & TCPIP updates appropriate for the server.

4. Has a server had a recent abend, power outage, power-off, or hardware failure that might have corrupted files on the volumes? If so, recovery will depend on the effects of what happened and the symptoms that persist.

Formerly known as TID# 10062741