Error -785 trying to open dib after ndsd crash or hard kill

  • 3037150
  • 11-Jan-2008
  • 26-Apr-2012

Environment


Novell eDirectory 8.8 for All Platforms
Novell eDirectory 8.7.3 for All Platforms

Situation

The ndsd.log shows error -785 (unable to open database).

Running ndstrace with the +init and +recm flags shows SMI error 49302 (FERR_BAD_DATA_LENGTH) when the agent attempts to open the dib.
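
When the log or a captured trace file is large, a small script can pull out the lines of interest. The following is a minimal sketch, not a Novell tool: the function name is hypothetical, and it assumes only that the log or trace is plain text containing the error numbers quoted above. It makes no assumption about the rest of the line layout.

```python
import re

# Markers taken from the symptoms described above. "-785" must not be
# followed by another digit, so an unrelated code such as -7851 is skipped.
MARKERS = re.compile(r"-785(?!\d)|\b49302\b|FERR_BAD_DATA_LENGTH")


def find_open_failures(lines):
    """Return (line_number, text) pairs for lines that mention a marker.

    `lines` is any iterable of strings, for example an open text file.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if MARKERS.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open the saved ndsd.log or trace file in text mode and pass the file object to find_open_failures; the location of those files depends on the eDirectory version and platform.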

To run ndstrace:
#ndstrace
>set ndstrace=nodebug
>set ndstrace=+init
>set ndstrace=+recm
>set ttf=on

To attempt to open dib:

Either have a Novell Support Engineer run ndsdump and choose to open the agent, or run ndsrepair -R, which will also try to open the agent.

Resolution

In this particular case, the problem is caused by a transaction in the roll forward log that has an invalid data length.
The invalid data length results from the entry size of the object that is being modified.

In this case, a very large number of network addresses caused the entry of an object to be very large. Network address is one of the few attributes that are stored on the entry record in the database.

A fix has been added to eDirectory 8.7.3.9 ftf2 and also to eDirectory 8.8.2 so that the data length of an entry will be correctly calculated even if the entry is very large.

The patch will not resolve the problem if the transaction in the roll forward log is already damaged. In that case, the server will have to be removed from the tree and re-added.

The patch will prevent the transaction from having an invalid data offset.


Additional Information

If this problem is suspected, you can do the following to identify the object:

Get a copy of the damaged dib
Stage the dib on a lab server with the same version of eDirectory as on the customer's server
Remove the NDSserv package and install the debug version of the NDSserv package
Start ndsd
Check the stack of the core file that will be produced.
Example:
gdb /usr/sbin/ndsd /var/nds/dib/core | tee /var/nds/dib/core.log
#where
#quit
Open the core.log file and find the stack frame for the FlmRecordModify function. The third argument in that frame is the EID.
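
If the backtrace is long, the third argument can be extracted with a short script. This is a sketch under stated assumptions: the function name is hypothetical, and it assumes gdb printed the frame in its usual "name (arg=value, arg=value, ...)" form, which requires the debug package installed in the steps above. It returns whatever value appears in the third position and does not depend on the argument names.

```python
import re

# Matches the argument list of the first FlmRecordModify frame. The
# character class also matches newlines, so a wrapped frame is handled.
FRAME = re.compile(r"\bFlmRecordModify\s*\(([^)]*)\)")


def third_argument(backtrace_text):
    """Return the value of the third FlmRecordModify argument, or None.

    None is returned when the frame is missing or when gdb printed fewer
    than three arguments (for example, without debug symbols).
    """
    match = FRAME.search(backtrace_text)
    if match is None:
        return None
    args = [arg.strip() for arg in match.group(1).split(",")]
    if len(args) < 3 or not args[2]:
        return None
    name, separator, value = args[2].partition("=")
    # gdb normally prints "name=value"; fall back to the bare token.
    return value.strip() if separator else name.strip()
```

Pass the contents of core.log as a single string. The value comes back as text exactly as gdb printed it; if it is decimal and the other tools display the EID in hexadecimal, hex(int(value, 0)) converts it.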

Use the flaim utility view to modify the header of the nds.db file so that the dib can be opened in ndsdump.

Find the EID and check whether the object has a large number of ACL, inherited ACL, or Network Address attribute values. A large number would be more than 15,000.

It may be easier to find the object and then use iMonitor against another server to get the number of values for each attribute.
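
Once the value counts have been read off, the comparison against the 15,000 guideline can be written down as a simple check. This is only an illustrative sketch: the function name is hypothetical, and the counts are assumed to be entered by hand from what iMonitor or ndsdump displays, since no programmatic interface to either tool is assumed here.

```python
# Guideline from the text above: more than 15,000 values counts as large.
LARGE_VALUE_COUNT = 15_000


def oversized_attributes(value_counts, threshold=LARGE_VALUE_COUNT):
    """Return the names of attributes whose value count exceeds the threshold.

    `value_counts` maps an attribute name to the number of values that the
    object holds for it, as noted down by hand.
    """
    return sorted(
        name for name, count in value_counts.items() if count > threshold
    )
```

An object that returns a non-empty list here is a candidate for the entry that produced the damaged transaction.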