Distributors and Subscribers stop communicating for no identifiable reason.

  • 7002528
  • 03-Feb-2009
  • 30-Apr-2012

Environment

Novell ZENworks for Servers 3.0.2
Novell ZENworks for Servers 3.0.2 IR1
Novell ZENworks 6.5 Server Management - ZfS65
Novell ZENworks 7 Server Management Support Pack 1 - ZSM7 SP1
Novell Tiered Electronic Distribution (TED)

Situation

Distributions stop working for no identifiable reason.
 
High utilization on extraction process on Subscribers.
 
Sockets are stuck between Distributor and Subscriber(s).
 
Restarting the server(s) resolves the 'stuck' state till the next time it happens.

Error: "Error sending distribution to <file_server_name>. Exception: Connection refused:".

Error: " *** Exception: com.sybase.jdbc.SybSQLException: ASA Error -194: No primary key value for foreign key 'FK_TAB_DIST_REF_37_TAB_DIST' in table 'TAB_DIST_VERSION' ".

Error: 'Status = "Error" Message = "ERROR - applying the distribution."'

Error: "Gather <Distribution_Name> stopped with error: com.novell.application.zenworks.services.cmf.FileExEntry; Local class not compatible: stream classdesc serialVersionUID=".

Error: "Stopped distribution due to signature error from distributor <Distributor_Name> or <Distributor_IP_Address>. Make sure certificates have been properly resolved."

Error: "[ZWS:Zen Process Request] java.lang.OutOfMemoryError"
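
The error strings listed above can be used to triage a TED.LOG excerpt. The following is a minimal Python sketch, not a Novell tool: the function name classify_ted_errors, the short labels, and the sample log lines (including the server name FS1) are hypothetical and exist only for illustration. The substrings themselves are taken from the messages quoted in this TID.

```python
from collections import Counter

# Substrings taken from the error messages quoted in this TID.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "connection_refused": "Exception: Connection refused",
    "asa_foreign_key": "ASA Error -194",
    "apply_failed": "ERROR - applying the distribution",
    "serial_version_mismatch": "Local class not compatible",
    "signature_error": "Stopped distribution due to signature error",
    "out_of_memory": "java.lang.OutOfMemoryError",
}


def classify_ted_errors(lines):
    """Count how many log lines contain each known error substring."""
    counts = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


# Made-up sample lines; real TED.LOG content will look different.
sample = [
    "Error sending distribution to FS1. Exception: Connection refused:",
    "[ZWS:Zen Process Request] java.lang.OutOfMemoryError",
    "An unrelated informational line",
]
result = classify_ted_errors(sample)
print(dict(result))
```

A count per signature gives a quick view of which of the listed symptoms a server is actually showing before any of the potential solutions are tried.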

Resolution

Potential solutions:
 
On servers that are not receiving distributions, unload ZFS TED/POL, delete \ZENWORKS\PDS\TED\TED.CFG, and then reload ZFS TED/POL.  This may have to be repeated on each server up the chain to the Distributor, including Parent Subscribers.

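Set the following Java system properties in the script that launches the ZENLoader (NetWare: ZFS.NCF; on Linux, stop the service first with /etc/init.d/novell-zfs stop).  Detailed steps are given under Additional Information: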

sun.net.client.defaultConnectTimeout=5000 (the timeout value in milliseconds)
sun.net.client.defaultReadTimeout=5000 (the timeout value in milliseconds)
 
The timeout value may need to be higher; 300000 (5 minutes) was suggested for one environment.
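
Because both values are given in milliseconds, it is easy to be off by a factor of 1000. The following Python sketch shows the conversion and builds the two options in the -D form used later in this TID. The helper name timeout_flags is hypothetical and not part of ZENworks.

```python
def timeout_flags(connect_seconds, read_seconds):
    """Build the two -D options, converting seconds to milliseconds."""
    connect_ms = int(connect_seconds * 1000)
    read_ms = int(read_seconds * 1000)
    return (
        f"-Dsun.net.client.defaultConnectTimeout={connect_ms} "
        f"-Dsun.net.client.defaultReadTimeout={read_ms}"
    )


# 5 seconds, the starting value given in this TID.
print(timeout_flags(5, 5))
# 5 minutes, the higher value suggested for one environment.
print(timeout_flags(5 * 60, 5 * 60))
```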

Status

Reported to Engineering

Additional Information

Formerly known as TID# 10093442
 
Note:  It is normal for the distribution name in the TED.LOG to sometimes be prefixed with the Tree name (with or without a backslash).
 
The reasons why TED.CFG files need to be deleted repeatedly and resent by the Distributor are being investigated.
 
For TIDs that describe identified reasons for deleting TED.CFG files, search www.novell.com/support on the key phrase 'delete ted.cfg'.
 
If the potential solution has to be used repeatedly in a short period of time (days, weeks) and is not already covered in a TID, please contact Novell Technical Support.  Provide Technical Support with copies of the TED.CFG files that had to be deleted and of the new ones that work for a while.
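
Since Technical Support may ask for the old TED.CFG files, one option is to move each file aside instead of deleting it outright. The following Python sketch illustrates that idea under stated assumptions: it would be run from a machine that can reach the server's volume, only after ZFS TED/POL has been unloaded, and the function name archive_ted_cfg and the archive folder are hypothetical. The demonstration uses a temporary directory rather than a real server path.

```python
import shutil
import tempfile
import time
from pathlib import Path


def archive_ted_cfg(cfg_path, archive_dir):
    """Move TED.CFG into archive_dir under a timestamped name.

    Returns the new path, or None if there was no file to move.
    """
    cfg = Path(cfg_path)
    if not cfg.exists():
        return None
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{cfg.name}.{stamp}"
    shutil.move(str(cfg), str(dest))
    return dest


# Demonstration on a throwaway directory, not on a real server volume.
with tempfile.TemporaryDirectory() as tmp:
    fake_cfg = Path(tmp) / "TED.CFG"
    fake_cfg.write_text("placeholder content")
    moved_to = archive_ted_cfg(fake_cfg, Path(tmp) / "archive")
    demo_moved = moved_to is not None and moved_to.exists()
    demo_original_gone = not fake_cfg.exists()
    print(demo_moved, demo_original_gone)
```

Moving the file has the same effect on the server as deleting it, because TED.CFG is no longer in its original location when TED/POL is reloaded, while a dated copy is kept for later comparison with the replacement file.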
 
Also investigate the state of TCP communications between Distributor(s) and Subscribers, and consider whether the default TCP KEEPALIVE settings need to be changed so that the environment gets uninterrupted communication on the TCP sockets.  Check whether the WAN infrastructure is prematurely closing connections due to perceived inactivity.  Feedback on this TID with the results of any investigation into TCP connectivity and KEEPALIVE settings is welcome.
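
A basic connectivity check can help separate network problems from TED problems. The following Python sketch tries to open a TCP connection with a connect timeout and enables keepalive on the socket. It is only an illustration: the function name probe_tcp is hypothetical, the host and port to test must be supplied for the environment (this TID does not state them), and the keepalive interval itself remains an operating-system setting. The demonstration connects to a throwaway local listener so that it runs without any ZENworks server.

```python
import socket


def probe_tcp(host, port, connect_timeout=5.0):
    """Try to open a TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout) as sock:
            # Ask the OS to send keepalive probes on this socket.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
            enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
            state = "on" if enabled else "off"
            return True, f"connected, SO_KEEPALIVE={state}"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"


# Demonstration against a local listener on an OS-assigned port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
ok, detail = probe_tcp("127.0.0.1", demo_port)
print(ok, detail)
listener.close()
```

A refused or timed-out probe points toward the network path or the listening service, whereas a successful probe followed by stalled distributions suggests looking at idle-connection handling on the WAN.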
 
The property "sun.net.client.defaultConnectTimeout" specifies the timeout (in milliseconds) for establishing the connection to the remote host.  The second property, "sun.net.client.defaultReadTimeout", specifies the timeout (in milliseconds) for reading from an input stream once a connection is established.
Set these properties as follows:
 
1. Shut down/exit the ZFS service.
 
2. Open the SYS:\ZENWORKS\ZFS.NCF (NetWare) or /opt/novell/zenworks/bin/zfs-pds (SLES) file.
 
3. The last line should look similar to this:

java -Xmx384M -envDISPLAY=127.0.0.1:0 -noclassgc -nsac -jszfsexit -snZENworks -classpath $tedpath com.novell.application.zenworks.loader.ZENLoader SYS:\zenworks\zfs-startup.xml
 
4. Add the two properties immediately after "java", as follows:

java -Dsun.net.client.defaultConnectTimeout=5000 -Dsun.net.client.defaultReadTimeout=5000 -Xmx384M -envDISPLAY=127.0.0.1:0 -noclassgc -nsac -jszfsexit -snZENworks -classpath $tedpath com.novell.application.zenworks.loader.ZENLoader SYS:\zenworks\zfs-startup.xml
 
(Note the space before each -D and the absence of a space after it.  The syntax is the same on NetWare and SLES servers:
-Dsun.net.client.defaultConnectTimeout=5000 -Dsun.net.client.defaultReadTimeout=5000 )
 
5. Start the ZFS service (SYS:\ZENWORKS\ZFS.NCF or /opt/novell/zenworks/bin/zfs-pds).
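
The edit in step 4 can be sketched as a small text transformation. The following Python example is an illustration only, not a supported tool: the function name add_timeout_properties is hypothetical, and it assumes the launch line starts with "java" and contains no arguments with embedded spaces. It works on a copy of the line as a string and does not touch any file.

```python
CONNECT_PREFIX = "-Dsun.net.client.defaultConnectTimeout="
READ_PREFIX = "-Dsun.net.client.defaultReadTimeout="


def add_timeout_properties(launch_line, connect_ms=5000, read_ms=5000):
    """Insert the two -D options directly after the leading 'java' token."""
    tokens = launch_line.split()
    if not tokens or tokens[0] != "java":
        raise ValueError("expected a launch line starting with 'java'")
    # Drop earlier copies of the two options so a re-run does not duplicate them.
    rest = [t for t in tokens[1:] if not t.startswith((CONNECT_PREFIX, READ_PREFIX))]
    flags = [f"{CONNECT_PREFIX}{connect_ms}", f"{READ_PREFIX}{read_ms}"]
    return " ".join(["java"] + flags + rest)


# The example launch line from step 3 (raw string keeps the backslashes intact).
original = (
    r"java -Xmx384M -envDISPLAY=127.0.0.1:0 -noclassgc -nsac -jszfsexit "
    r"-snZENworks -classpath $tedpath "
    r"com.novell.application.zenworks.loader.ZENLoader SYS:\zenworks\zfs-startup.xml"
)
updated = add_timeout_properties(original)
print(updated)
```

Placing the options right after "java" keeps them ahead of the class name, so they are read as JVM options rather than as arguments to the ZENLoader.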
 
Note: For an additional TID covering some distributions failing with 'Connect Time Out' errors, see KB 7007210 'ZSM 7 TED McAfee Update Distribution fails on some Subscribers'.  Setting "sun.net.client.defaultConnectTimeout" and "sun.net.client.defaultReadTimeout" to 300000 did not affect the issue covered in that TID.