eDirectory performance drops on AWS or Azure cloud-managed storage; -625 and -626 errors also seen

  • 7024713
  • 07-Jul-2020
  • 29-Jul-2020

Environment

eDirectory 9.2

Situation

An eDirectory instance has been configured on the Amazon AWS or Microsoft Azure cloud platform. Once the move is complete, LDAP and other services are noticeably slower than on the previous VM or bare-metal platform.

Errors -625 and -626 are seen in ndstrace and ndsrepair during synchronization due to NCP timing out.


Resolution

Cloud providers offer a number of virtual storage options, and choosing among them is very important. eDirectory is extremely dependent on I/O, more so than on memory or processors. Its performance on a cloud-based disk depends not only on the type of disk configured but also on its size, because cloud providers limit the number of IOPS based on the disk type and, more importantly, its size.
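Because providers cap IOPS while dd reports MB/s, it can help to convert between the two: throughput is IOPS multiplied by the I/O size. The sketch below is a minimal illustration that assumes every I/O has the same size (64 KiB matches the bs=64k used in the dd tests below, but the request sizes that actually reach the disk may differ, and providers also apply separate MB/s caps). The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers relating an IOPS cap to throughput:
#   throughput (bytes/s) = IOPS x I/O size (bytes)
# Results are in decimal MB/s (1 MB = 1,000,000 bytes), the unit dd reports.
# Assumes every I/O has the same size, which real workloads rarely do.

# iops_to_mbps IOPS IO_SIZE_KIB -> MB/s, one decimal place
iops_to_mbps() {
  awk -v iops="$1" -v kib="$2" \
    'BEGIN { printf "%.1f\n", iops * kib * 1024 / 1000000 }'
}

# mbps_to_iops MBPS IO_SIZE_KIB -> IOPS needed, rounded to the nearest integer
mbps_to_iops() {
  awk -v mbps="$1" -v kib="$2" \
    'BEGIN { printf "%.0f\n", mbps * 1000000 / (kib * 1024) }'
}
```

For example, sustaining the 100 MB/s minimum mentioned in the Cause section needs about 1526 IOPS at 64 KiB per I/O but about 24414 IOPS at 4 KiB per I/O, so the same IOPS cap yields very different MB/s depending on I/O size.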

Tests were performed with various disk configurations on both Azure and AWS; the results are listed below. Overall, the limits on AWS instances were less strict than those on Azure.

Cause

An overall drop of 7-8% may be seen when using a cloud provider, but the disk selection is crucial. The dd command was used to write to disk with conv=fdatasync, which makes dd flush the data to disk before it finishes, so the reported MB/s includes the time to write the data to disk rather than only to the page cache. eDirectory requires a minimum of 100 MB/s. Some lower-end disk selections were found to drop below 10 MB/s at times.
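The measurement described above can be scripted. This is a minimal sketch assuming GNU coreutils dd (whose summary line format it parses) and POSIX sh with awk; the function names and the default target path are hypothetical, and the dd invocation is the one used in the tests below. The target file should be on the volume that holds the eDirectory DIB.

```shell
#!/bin/sh
# Hypothetical helpers for the dd write test; assumes GNU coreutils dd.

# dd_mbps "SUMMARY_LINE"
# Parse the final line GNU dd prints on stderr, e.g.
#   536870912 bytes (537 MB, 512 MiB) copied, 7.43317 s, 72.2 MB/s
# and recompute the rate from bytes and seconds, so the result is always
# in decimal MB/s (1 MB = 1,000,000 bytes) whatever unit dd chose to print.
dd_mbps() {
  printf '%s\n' "$1" | awk '{
    secs = ""
    # the elapsed time is the field just before "s,"
    for (i = 2; i <= NF; i++) if ($i == "s,") secs = $(i - 1)
    if (secs == "" || secs + 0 == 0) exit 1
    printf "%.1f\n", $1 / secs / 1000000
  }'
}

# meets_minimum RATE [MIN]
# Exit 0 if RATE (MB/s) is at least MIN; MIN defaults to the 100 MB/s
# eDirectory minimum.
meets_minimum() {
  awk -v r="$1" -v m="${2:-100}" 'BEGIN { exit !((r + 0) >= (m + 0)) }'
}

# run_dd_test [TARGET]
# Run the write test against TARGET (a file on the disk under test; the
# default path is only a placeholder), remove the file, and print the rate.
run_dd_test() {
  target=${1:-./iotest.log}
  line=$(dd if=/dev/zero of="$target" bs=64k count=8k conv=fdatasync 2>&1 | tail -n 1)
  rm -f "$target"
  dd_mbps "$line"
}

# Example (placeholder path):
#   meets_minimum "$(run_dd_test /path/to/dib-volume/iotest.log)" || echo "below minimum"
```

Recomputing the rate from bytes and seconds avoids having to handle the kB/s, MB/s and GB/s units dd may print. Applied to the t2.micro output below, dd_mbps returns 72.2 and meets_minimum fails.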


AWS

AWS recommends EBS-optimized EC2 instances to get the most out of the configured disks.
The links below explain how to obtain optimal disk performance through instance and volume selection:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html?icmpid=docs_ec2_console
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html

Instance type - General Purpose, t2.micro, 1 vCPU, 1 GB memory (not EBS-optimized, Free Tier instance)
Disk - Provisioned IOPS SSD, 1000 GB, IOPS - 32000
# dd if=/dev/zero of=/home/ec2-user/iotest.log bs=64k count=8k conv=fdatasync
8192+0 records in
8192+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 7.43317 s, 72.2 MB/s

Instance type - General Purpose, t3.medium, 2 vCPU, 4 GB memory
Disk - Provisioned IOPS SSD, 1000 GB, IOPS - 32000
# dd if=/dev/zero of=/home/ec2-user/iotest.log bs=64k count=8k conv=fdatasync
8192+0 records in
8192+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 1.22511 s, 438 MB/s

Instance type - Memory Optimized and EBS-optimized, r5ad.2xlarge, 8 vCPU, 64 GB memory
Disk - Provisioned IOPS SSD, 1000 GB, IOPS - 32000
# dd if=/dev/zero of=/home/ec2-user/iotest.log bs=64k count=8k conv=fdatasync
8192+0 records in
8192+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 0.684601 s, 784 MB/s

Even the mid-range t3.medium instance gave decent throughput. However, the t2.micro result shows that a low-end, non-EBS-optimized instance can fall below the 100 MB/s minimum even with the same high-IOPS disk.


AZURE

Azure offers around 14 Premium SSD managed disk sizes. We highly recommend a Premium SSD managed disk over a Standard SSD, although each customer's environment dictates the minimum disk type required. Throughput and IOPS can be increased simply by increasing the disk size, but cost increases accordingly. The bottom line is that disk performance was not uniform across disk sizes, and the measured numbers sometimes did not match the published ones.

As a baseline, the same dd command on a 'normal' VM (on a locally available host) returned 500 MB/s. There are too many permutations to report on every configuration; the disk types are documented on Azure's site:
https://azure.microsoft.com/en-au/pricing/details/managed-disks/

It is hoped these numbers can serve as an indicator when customers choose a disk: both the type of disk and its size matter.

- Azure P30 disk (1TB)
Listed at 200 MB/s but returned less than 50 MB/s.

- Azure P60 disk (1TB)
Listed at 500 MB/s but returned 35 MB/s.

- Final Maximum Configuration:

VM details:
Memory Optimized - Standard_DS13_v2 (8 vCPUs, 56 GiB memory)
Max cached and temp storage throughput: IOPS/MBps (32000/256)
Max uncached disk throughput: IOPS/MBps (25600/384)

Disk details:
P60 data disk (8 TB, Premium SSD)
Listed as 500 MB/s

Our final dd results showed this configuration returning 297-304 MB/s. Simply increasing the size makes an enormous difference. Because results can be very load-dependent, make sure eDirectory is up and busy before running the dd command. In the final configuration, running a continuous LDAP thread caused only a small drop, from 304 to 297 MB/s.
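As a rough check on these numbers: Azure applies throughput limits at both the VM and the disk level, so a data disk cannot exceed the lower of the two. The sketch below assumes the VM's uncached limit (384 MBps) is the one that applies to this data disk and treats Azure's MBps and dd's MB/s as the same unit; both are assumptions, and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking a measurement against VM and disk limits.

# ceiling_mbps VM_LIMIT DISK_LIMIT
# Print the lower of the two limits; I/O is throttled by whichever is hit first.
ceiling_mbps() {
  awk -v vm="$1" -v disk="$2" 'BEGIN { print ((vm + 0 < disk + 0) ? vm : disk) }'
}

# pct_of MEASURED CEILING
# Percentage of the ceiling actually achieved, one decimal place.
pct_of() {
  awk -v m="$1" -v c="$2" 'BEGIN { printf "%.1f\n", 100 * m / c }'
}

# pct_drop IDLE LOADED
# Percentage drop from an idle measurement to a measurement under load.
pct_drop() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", 100 * (a - b) / a }'
}
```

With the values above, the ceiling is 384 MB/s, the measured 304 and 297 MB/s are about 79% and 77% of it, and the drop under the continuous LDAP load is about 2.3%.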