Manually Removing All Replicas From a NetWare Server

  • 7001592
  • 10-Oct-2008
  • 26-Apr-2012

Environment

Novell NetWare 4.2
Novell NetWare 4.11
Novell NetWare 5.1
Novell NetWare 5.0
Novell NetWare 6.0
Novell Directory Services
-XK2
DSRepair -xk2

Situation

Steps for manually removing all replicas from a NetWare server.

Resolution

*****WARNING*****

Only follow this procedure under the advisement of Novell Technical Support.  If performed incorrectly, this procedure can cause major damage to a directory tree.  Novell does not support this procedure unless under the specific recommendations of a Technical Support Engineer.  Contact Novell Technical Support for more information.

*****CAUTION*****

Before running DSREPAIR -XK2, make sure that the server is not hosting a DirXML driver. DirXML uses some server-specific information that is not synchronized with the other servers holding copies of that partition, so that information will be lost. As a result, the driver set will show up with an unknown status. If the -XK2 process must be run on the server, first shut down the drivers and export all drivers and their information. The driver set can then be deleted, recreated, and re-imported when the server is ready.

To export a driver, follow the steps in the solution TID #10075323 - How to export and import a DirXML Driver.

The following steps will force all replicas off of a server and clean up the replica rings.  This process has many implications:
A) If the only real replica of a partition exists on this server, all data for objects in this partition will be permanently lost.
B) Once started, this process cannot be halted.
C) If not followed correctly, this procedure can cause more damage to the NDS tree.
D) If running NDS 8.x, make sure the DSREPAIR.NLM is version 7.28 or higher.

Complete all steps.

1. Verify that the defective server does not hold the master replica of any partition. If the server holds master replicas, other servers need to be designated as the new master before the replicas are removed from the server.  This is best performed from DSREPAIR on the server that is going to be designated as the new master for the partition.  On the server that will become the new master, LOAD DSREPAIR -A | Advanced Options | Replica and Partition Operations | <replica name> | Designate this Server as the New Master Replica.  Make sure the change synchronizes out to all other servers in the replica ring.

2. Create a database backup file.  From the server console of the server you are removing the replicas from, run DSREPAIR -RC.  This copies a database dump file to SYS:SYSTEM\DSREPAIR.DIB if running NDS 7.x or lower.  If running NDS 8 or higher, it creates a 00000000.$DU file in SYS:SYSTEM\DSR_DIB.  After everything is cleaned up and working fine, you can delete this file to free up space.

You can also create the DS backup manually using DSREPAIR | Advanced Options | Create Database Dump File or NDS Archive Options (for eDir versions). If a backup was created in the past, DSREPAIR offers the option to overwrite it, and the new file becomes the latest backup.

3. Write down a list of ALL REPLICAS that the defective server holds, including Subordinate Reference replicas.  This list can be found in DSREPAIR | Advanced Options | Replica and Partition Operations. (The window should say "Replicas stored on this server".)

4.  From the Server Console of the defective server, LOAD DSREPAIR -XK2 | Advanced Options | Repair Local DS Database.  LEAVE THE DEFAULT OPTIONS IN PLACE (the two options that must be used at a minimum are Check Local References & Perform Database Structure Check).  The -XK2 switch removes all replicas from the server when a local database repair is performed.  This operation DOES NOT remove Directory Services from the server.  No server information or references should be lost in this process.  However, if bindery services was enabled on the server, it will not work properly until the proper replicas are added back.  This procedure also resets all of the objects that were local on the server to externally referenced objects in a "reference" state.  This allows the server to redirect calls to objects to other servers that hold actual copies of those objects.  On certain older versions of DSREPAIR, -XK3 is needed to manually set externally referenced objects to a reference state.  It does not hurt to add -XK3 after -XK2 (DSREPAIR -XK2 -XK3) just in case you are using an older DSREPAIR.NLM. The repair may take anywhere from 30 seconds to over an hour, depending on how many replicas were stored on the server and the speed of the hardware.  The repair will take about as long as a typical database repair would have previously taken on the server.  While the repair is running, do step 5.  If the repair completes while you are doing step 5, do not save the database.  Finish step 5 first.

5. For each replica in the list that you wrote down, the following procedure needs to be completed.  The other servers in the replica rings need to know that the defective server is no longer going to hold replicas.  DSREPAIR -XK2 removes all replicas from a server, but it DOES NOT notify the other servers in the replica ring that the server no longer holds those replicas.  Therefore, the defective server needs to be manually removed from the replica ring for each partition that was present on the server.

a.  From another server in each replica ring, LOAD DSREPAIR -A (which shows advanced options).  This should be a server that holds a Master or Read/Write replica, but Subordinate References will work as well if there are no other options.

b.  Select ADVANCED OPTIONS | REPLICA AND PARTITION OPERATIONS.

c.  Highlight and select a replica on the list you made in step 3 above.

d.  Select VIEW REPLICA RING.

e.  You should normally see the defective server's name in the replica ring list.  Highlight and select the problem server in the replica list.  If you do not see the defective server in the list, it has already been removed (by the system).  Skip the remaining steps and move on to the next partition's replica ring.

f.  On the menu that comes up select REMOVE THIS SERVER FROM THE REPLICA RING.

g.  Enter the ADMIN username and password.

h.  Type the words I AGREE on the next screen.

i.  Repeat the above steps for each replica that is on the defective server.  Often, a single server may hold copies of multiple partitions.  This same server can then be used when removing the defective server from the replica rings.  The advantages to this situation are that you will only need to authenticate to the server once, and a heartbeat will only need to be forced one time once all of the replica rings have been cleaned up (see below).

j.  For each server that you used to remove the defective server from the replica ring, force a synchronization heartbeat.  This will help to guarantee that all servers in the replica ring will be notified of the change and will update their individual databases.  Normally this information will automatically be forwarded to all the necessary servers.  To force it manually, enter the following commands:

SET DSTRACE=+S
SET DSTRACE=*H


6.  When step 5 is completed, save the database and exit all the way out of DSREPAIR.  It does not matter how many errors were found in the repair.

7.  The defective server now needs to have accurate references to the real objects that were deleted from the server's database.  These references are verified by the backlink process, which runs by default every 13 hours on a server.  Until the process is completed, users may have trouble authenticating to the server.  Instead of waiting up to 13 hours, you can force the process by entering the following commands:

SET DSTRACE=NODEBUG
SET DSTRACE=+BLINK  (Turns DSTRACE screen ON and allows you to see the backlink process)
SET DSTRACE=*B  (Manually starts the backlink process)

8.  Toggle to the directory services screen and watch for the message "Finished checking backlinks successfully" or "Finished checking backlinks succeeded." This message indicates that the defective server now holds the proper information.  Users should be able to authenticate to the server without any problems.

9.  At this point, the replicas are off the problem server and the tree should be functioning normally.  Replicas may now be added back to the server as necessary.  Occasionally, multiple servers need this procedure performed in order to clean up a tree.  Contact Novell Technical Support for help in determining which servers may need this procedure performed.  It is also advisable to run through TID #3564075 - NDS Health Check Procedures - Cross Platform, to verify that the tree is synchronizing properly.
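
The replica ring cleanup in step 5 can be planned on paper before the repair is started. The following is only a minimal sketch of that planning, written in Python and run on a workstation, not on the server. It assumes the replica rings have been copied by hand into a dictionary; the function name plan_helpers, the server names, and the partition names are all hypothetical, and the script does not talk to NDS or DSREPAIR in any way. It picks helper servers so that as few servers as possible are used (step 5i), preferring Master or Read/Write holders (step 5a).

```python
# Hypothetical worksheet helper; nothing here talks to NDS or DSREPAIR.
PREFERRED = {"Master", "Read/Write"}  # step 5a: preferred helper replica types


def plan_helpers(defective, rings):
    """Greedily choose helper servers for removing `defective` from each ring.

    rings maps a partition name to {server name: replica type}, copied by
    hand from the replica ring screens.
    Returns {helper server: [partitions to clean up from that server]}.
    """
    # Candidate helpers for each partition whose ring lists the defective server.
    candidates = {}
    for partition, ring in rings.items():
        if defective not in ring:
            continue
        others = {s: t for s, t in ring.items() if s != defective}
        preferred = {s for s, t in others.items() if t in PREFERRED}
        # Fall back to any other ring member (for example a Subordinate
        # Reference holder) only when no Master or Read/Write holder exists.
        candidates[partition] = preferred or set(others)

    plan = {}
    remaining = set(candidates)
    while remaining:
        # For each server, collect the still-unassigned rings it could handle.
        coverage = {}
        for partition in remaining:
            for server in candidates[partition]:
                coverage.setdefault(server, set()).add(partition)
        if not coverage:
            break  # the remaining rings have no other member at all
        # Pick the server that handles the most rings; break ties by name.
        best = min(coverage, key=lambda s: (-len(coverage[s]), s))
        plan[best] = sorted(coverage[best])
        remaining -= coverage[best]
    return plan


if __name__ == "__main__":
    example = {
        "O=ACME": {"FS1": "Read/Write", "FS2": "Master", "FS3": "Read/Write"},
        "OU=SALES.O=ACME": {"FS1": "Read/Write", "FS2": "Master"},
    }
    for helper, partitions in plan_helpers("FS1", example).items():
        print(helper, partitions)
```

Each key in the result is a server on which to load DSREPAIR -A once, and on which to force the heartbeat once after its rings are cleaned. The greedy choice is a convenience, not a requirement of the procedure; any server that satisfies step 5a works.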

NOTE: In Novell Directory Services 8 (eDirectory), the -RC and -XK2 switches need to be run separately and in the given order.  Older versions of NDS require that you use the -XK3 switch. When following this procedure, verify that good real copies of every replica held on the problem server exist on other servers, because this procedure removes all replicas from the server without prompting the user.
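
The check described in this note, together with step 1 and implication A, can be worked through on paper before DSREPAIR -XK2 is loaded. The following is only a minimal sketch of that check, written in Python and run on a workstation. It assumes the replica rings have been copied by hand into a dictionary; the function name preflight, the server names, and the partition names are hypothetical, and the script does not query NDS or DSREPAIR. It treats Master, Read/Write, and Read-Only replicas as real copies and Subordinate References as not real copies.

```python
# Hypothetical worksheet helper; nothing here talks to NDS or DSREPAIR.
# Replica types that hold a full copy of the partition's objects.
REAL_TYPES = {"Master", "Read/Write", "Read-Only"}


def preflight(defective, rings):
    """Return (masters_to_move, partitions_at_risk) for the defective server.

    rings maps a partition name to {server name: replica type}, copied by
    hand from the replica ring screens.
    """
    masters_to_move = []
    partitions_at_risk = []
    for partition, ring in sorted(rings.items()):
        held = ring.get(defective)
        if held is None:
            continue  # the defective server is not in this ring
        if held == "Master":
            # Step 1: another server must become the master first.
            masters_to_move.append(partition)
        others_real = [
            server for server, rtype in ring.items()
            if server != defective and rtype in REAL_TYPES
        ]
        if held in REAL_TYPES and not others_real:
            # Implication A: the only real replica is on the defective server.
            partitions_at_risk.append(partition)
    return masters_to_move, partitions_at_risk


if __name__ == "__main__":
    example = {
        "O=ACME": {"FS1": "Master", "FS2": "Read/Write"},
        "OU=LAB.O=ACME": {"FS1": "Master", "FS3": "Subordinate Reference"},
    }
    masters, at_risk = preflight("FS1", example)
    print("Designate a new master first for:", masters)
    print("No other real replica exists for:", at_risk)
```

Any partition in the second list would lose its object data permanently if the procedure were run, so a real replica should be placed on another server and allowed to synchronize before continuing.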


Additional Information

Formerly known as TID# 10026822