Environment
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 2
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
Situation
In a number of environments customers have encountered a number of odd issues, varying from spontaneous occurring overall 'network' slowness, irregularly encountering large delays for user authentication, sudden delays on user login and/or logout and writing user profiles to the server, servers very slow in responding, etc. When the machines were rebooted, the problems were gone.
In most of these cases, it was determined that these symptoms mainly showed on servers with large numbers of users, large numbers of eDir replica's, etc. Basically servers with high volume eDirectory traffic were affected by these problems.
When analyzing eDirectory as initial suspect, it was found the NDS traces showed the NCP engine was responding with BUSY.
In most of these cases, it was determined that these symptoms mainly showed on servers with large numbers of users, large numbers of eDir replica's, etc. Basically servers with high volume eDirectory traffic were affected by these problems.
When analyzing eDirectory as initial suspect, it was found the NDS traces showed the NCP engine was responding with BUSY.
Resolution
A modified version of the NCP engine was made that allows for a larger (and configurable) number of concurrent NCP threads, and the NCP console (ncpcon) was enhanced with a 'threads' command to be able to verify the current NCP utilization.
This new functionality has been released to the public with the following updates :
Example given:
HP-D-7657:~ # ncpcon threads
... Executing " threads"
2010-12-06 07:56:56 - NCP THREAD STATISTICS
Async (eDir) Threads and Requests Statistics:-
---------------------------------------------
Number of Running Threads : 0
Max Thread Size : 50
Thread Peak Size : 50
Number of Queued Requests : 0
Queued Requests Peak Size : 111
NCP SSG Thread Statistics:-
--------------------------
Total Number of Active SSG Threads : 25
Total Number of Additional SSG Threads : 25
(over & above fixed 25 NCP Threads)
Total Number of Active NCP Streams : 2484
Current Average Load per Thread : 99.36
Peak Number of Active SSG Threads : 40
Peak Number of NCP Streams : 2641
Peak Average Load per Thread : 105.64
... completed OK [elapsed time = 886 usecs]
For an explanation of what is found under the section: "Async (eDir) Threads and Requests Statistics"
Async threads would consist of
- Number of Running Threads : 0
The currently running number of Async threads that can handle eDir Requests.
- Max Thread Size : 15
The max number of the Async threads that can be created.
(15 = default, can be changed from anywhere between 16 and 256)
This value can be modified using the "ncpcon set CONCURRENT_ASYNC_REQUESTS=xx" command.
- Thread Peak Size : 1
The highest number (peak) of Async threads the server required so far.
(This number is reset when the service is restarted.)
- Number of Queued Requests : 0
The number of queued eDir requests (after the Async threads got exhausted).
- Queued Requests Peak Size : 0
The highest number (peak) of eDir requests that have been queued so far (after the Async threads got exhausted).
For an explanation of what is found under the section: "NCP SSG Thread Statistics"
SSG threads are about the 'pure' file service NCP requests which were previously limited to a 25 max (and additional 7 threads hard coded).
- Total Number of Active SSG Threads : 18
The currently active SSG threads.
- Max Number of Additional SSG Threads : 7
(over and above fixed 25 NCP Threads)
When all 25 fixed SSG threads are exhausted, then this number defines the additional number of threads that can be created to serve other incoming NCP file service requests. These will only be used when the fixed 25 threads are busy and taking more than expected time to finish.
(7 = default additional threads, the maximum number of SSG threads is also limited to 128, so with 25 fixed threads, the value can be set to anything between 1 and 103)
This value can be modified using the "ncpcon set ADDITIONAL_SSG_THREADS=xx" command.
- Total Number of Active NCP Streams : 24
The current number of NCP connections that have been handled by the Active SSG threads.
- Current Average Load per Thread : 1.00
The current average load of NCP connections on the Active SSG threads.
- Peak Number of Active SSG Threads : 18
The highest number (peak) of the Active SSG threads.
- Peak Number of NCP Streams : 24
The highest number (peak) of NCP streams.
- Peak Average Load per Thread : 2.00
The highest number (peak) of streams per thread.
This new functionality has been released to the public with the following updates :
- for OES2 FCS & OES2 SP1, this functionality has not been made available
- for OES2 SP2, this was released January 2011 Scheduled Maintenance release :
January 2011 Scheduled Maintenance for OES2 SP2 20110131 x86
January 2011 Scheduled Maintenance for OES2 SP2 20110131 x86-64
January 2011 Scheduled Maintenance for OES2 SP2 20110131 x86-64
- for OES2 SP3, this was released in the April 2011 Scheduled Maintenance release :
April 2011 Scheduled Maintenance for OES SP3 20110421 x86
April 2011 Scheduled Maintenance for OES SP3 20110421 x86-64
April 2011 Scheduled Maintenance for OES SP3 20110421 x86-64
- for OES11 and later this functionality has been available since it's release.
Example given:
HP-D-7657:~ # ncpcon threads
... Executing " threads"
2010-12-06 07:56:56 - NCP THREAD STATISTICS
Async (eDir) Threads and Requests Statistics:-
---------------------------------------------
Number of Running Threads : 0
Max Thread Size : 50
Thread Peak Size : 50
Number of Queued Requests : 0
Queued Requests Peak Size : 111
NCP SSG Thread Statistics:-
--------------------------
Total Number of Active SSG Threads : 25
Total Number of Additional SSG Threads : 25
(over & above fixed 25 NCP Threads)
Total Number of Active NCP Streams : 2484
Current Average Load per Thread : 99.36
Peak Number of Active SSG Threads : 40
Peak Number of NCP Streams : 2641
Peak Average Load per Thread : 105.64
... completed OK [elapsed time = 886 usecs]
For an explanation of what is found under the section: "Async (eDir) Threads and Requests Statistics"
Async threads would consist of
- Number of Running Threads : 0
The currently running number of Async threads that can handle eDir Requests.
- Max Thread Size : 15
The max number of the Async threads that can be created.
(15 = default, can be changed from anywhere between 16 and 256)
This value can be modified using the "ncpcon set CONCURRENT_ASYNC_REQUESTS=xx" command.
- Thread Peak Size : 1
The highest number (peak) of Async threads the server required so far.
(This number is reset when the service is restarted.)
- Number of Queued Requests : 0
The number of queued eDir requests (after the Async threads got exhausted).
- Queued Requests Peak Size : 0
The highest number (peak) of eDir requests that have been queued so far (after the Async threads got exhausted).
For an explanation of what is found under the section: "NCP SSG Thread Statistics"
SSG threads are about the 'pure' file service NCP requests which were previously limited to a 25 max (and additional 7 threads hard coded).
- Total Number of Active SSG Threads : 18
The currently active SSG threads.
- Max Number of Additional SSG Threads : 7
(over and above fixed 25 NCP Threads)
When all 25 fixed SSG threads are exhausted, then this number defines the additional number of threads that can be created to serve other incoming NCP file service requests. These will only be used when the fixed 25 threads are busy and taking more than expected time to finish.
(7 = default additional threads, the maximum number of SSG threads is also limited to 128, so with 25 fixed threads, the value can be set to anything between 1 and 103)
This value can be modified using the "ncpcon set ADDITIONAL_SSG_THREADS=xx" command.
- Total Number of Active NCP Streams : 24
The current number of NCP connections that have been handled by the Active SSG threads.
- Current Average Load per Thread : 1.00
The current average load of NCP connections on the Active SSG threads.
- Peak Number of Active SSG Threads : 18
The highest number (peak) of the Active SSG threads.
- Peak Number of NCP Streams : 24
The highest number (peak) of NCP streams.
- Peak Average Load per Thread : 2.00
The highest number (peak) of streams per thread.
Additional Information
To prevent all available NCP threads from being consumed by eDirectory requests (this way preventing the server from performing any other necessary tasks), the NCP engine was previously limited to run a maximum of 16 concurrent eDir NCP requests, and subsequent new threads would be queued up until the previous threads were completed.
Properly analyzing the currently in-use NCP threads on OES Linux is only possible when using ndstrace (using ndstrace ncpe and ndstrace file on)
Output as below.:
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4283 at tcp:10.201.105.115:1043, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4328 at tcp:10.201.104.164:1049, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4334 at tcp:10.201.101.159:1034, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4336 at tcp:10.201.101.159:1036, succeeded.
1304963392 NCPE: [2010/12/01 16:08:54.6] NCP Reply (BUSY) to tcp:10.58.0.21 - conn 514, seq 6, size 16, ccode 0
1327069504 NCPE: [2010/12/01 16:08:54.316] NCP Reply (BUSY) to tcp:10.18.96.21 - conn 866, seq 2, size 16, ccode 0
1096870208 NCPE: [2010/12/01 16:08:54.546] NCP Reply (BUSY) to tcp:10.57.76.20 - conn 947, seq 2, size 16, ccode 0
1098975552 NCPE: [2010/12/01 16:08:54.559] NCP Reply (BUSY) to tcp:10.17.0.25 - conn 518, seq 3, size 16, ccode 0
1097922880 NCPE: [2010/12/01 16:08:54.809] NCP Reply (BUSY) to tcp:10.213.64.20 - conn 724, seq 3, size 16, ccode 0
1302858048 NCPE: [2010/12/01 16:08:55.113] NCP Reply (BUSY) to tcp:10.180.0.70 - conn 311, seq 2, size 16, ccode 0
1304963392 NCPE: [2010/12/01 16:08:55.137] NCP Reply (BUSY) to tcp:10.10.16.20 - conn 1192, seq 2, size 16, ccode 0
1302858048 NCPE: [2010/12/01 16:08:55.228] NCP Reply (BUSY) to tcp:10.56.12.20 - conn 149, seq 5, size 16, ccode 0
1314437440 NCPE: [2010/12/01 16:08:55.236] NCP Reply (BUSY) to tcp:10.57.120.20 - conn 257, seq 2, size 16, ccode 0
1306016064 NCPE: [2010/12/01 16:08:55.344] NCP Reply (BUSY) to tcp:10.59.1.21 - conn 13, seq 3, size 16, ccode 0
During troubleshooting customer issues, analyzing the ndstrace log files and gstack output, it was concluded there were exactly 16 threads in AsyncNCPThreads doing DSAResolveName. The reason for only very few threads hanging at DSAResolveName was the previous 16 thread restriction that existed in the NCP engine.
After these 16 threads are spawned by the NCP engine and all threads are busy, any additional DS requests coming in would be put into a queue, and the second time the request comes in, the NCP engine will respond with BUSY.
As in forthcoming scenario's, the previous maximum number of NCP threads may no be longer sufficient, hence the NCP code was modified to allow for larger and configurable number of threads to be able and properly process the higher workload.
Properly analyzing the currently in-use NCP threads on OES Linux is only possible when using ndstrace (using ndstrace ncpe and ndstrace file on)
Output as below.:
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4283 at tcp:10.201.105.115:1043, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4328 at tcp:10.201.104.164:1049, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4334 at tcp:10.201.101.159:1034, succeeded.
1202874688 NCPE: [2010/12/01 16:08:53.913] Broadcast <!> send to conn 4336 at tcp:10.201.101.159:1036, succeeded.
1304963392 NCPE: [2010/12/01 16:08:54.6] NCP Reply (BUSY) to tcp:10.58.0.21 - conn 514, seq 6, size 16, ccode 0
1327069504 NCPE: [2010/12/01 16:08:54.316] NCP Reply (BUSY) to tcp:10.18.96.21 - conn 866, seq 2, size 16, ccode 0
1096870208 NCPE: [2010/12/01 16:08:54.546] NCP Reply (BUSY) to tcp:10.57.76.20 - conn 947, seq 2, size 16, ccode 0
1098975552 NCPE: [2010/12/01 16:08:54.559] NCP Reply (BUSY) to tcp:10.17.0.25 - conn 518, seq 3, size 16, ccode 0
1097922880 NCPE: [2010/12/01 16:08:54.809] NCP Reply (BUSY) to tcp:10.213.64.20 - conn 724, seq 3, size 16, ccode 0
1302858048 NCPE: [2010/12/01 16:08:55.113] NCP Reply (BUSY) to tcp:10.180.0.70 - conn 311, seq 2, size 16, ccode 0
1304963392 NCPE: [2010/12/01 16:08:55.137] NCP Reply (BUSY) to tcp:10.10.16.20 - conn 1192, seq 2, size 16, ccode 0
1302858048 NCPE: [2010/12/01 16:08:55.228] NCP Reply (BUSY) to tcp:10.56.12.20 - conn 149, seq 5, size 16, ccode 0
1314437440 NCPE: [2010/12/01 16:08:55.236] NCP Reply (BUSY) to tcp:10.57.120.20 - conn 257, seq 2, size 16, ccode 0
1306016064 NCPE: [2010/12/01 16:08:55.344] NCP Reply (BUSY) to tcp:10.59.1.21 - conn 13, seq 3, size 16, ccode 0
During troubleshooting customer issues, analyzing the ndstrace log files and gstack output, it was concluded there were exactly 16 threads in AsyncNCPThreads doing DSAResolveName. The reason for only very few threads hanging at DSAResolveName was the previous 16 thread restriction that existed in the NCP engine.
After these 16 threads are spawned by the NCP engine and all threads are busy, any additional DS requests coming in would be put into a queue, and the second time the request comes in, the NCP engine will respond with BUSY.
As in forthcoming scenario's, the previous maximum number of NCP threads may no be longer sufficient, hence the NCP code was modified to allow for larger and configurable number of threads to be able and properly process the higher workload.