Low Disk Performance: High I/O Stalls the System

  • 7023297
  • 23-Aug-2018
  • 23-Aug-2018

Environment

SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 11

Situation

Some applications, for example a database, perform a lot of I/O, and from an administrator's point of view it might seem that this I/O stalls the system. There might also be noticeably more kworker processes than usual. This can be aggravated by using software RAID (Softraid), for example as a host-based mirror, on top of the devices. The stall can get to the point where it triggers failures in a cluster, in the form of monitor operations that fail due to system stress.

This could indicate that barriers ("barrier") are enabled on the filesystem even though the underlying device does not have a write cache.

Resolution

With barriers enabled, the filesystem issues flush requests that pass through all intermediate layers (for example a software RAID), only to be discarded by the SCSI disks, which have no write cache to flush. This overhead slows down the system and leads to the observed performance issue.

To alleviate this issue, it is recommended to mount the relevant filesystems with the "nobarrier" mount option.

Take extreme care to verify whether the device backing this filesystem really has a volatile write cache or not.

If it does, setting "nobarrier" can result in data loss. Never set nobarrier on a filesystem whose device has its write cache enabled!
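Once it has been confirmed that the device has no volatile write cache, the option can be made persistent by adding it to the options field of the filesystem's entry in /etc/fstab. On a mounted filesystem a remount may also work, depending on whether the filesystem accepts the option on remount. As a minimal sketch, the hypothetical helper barrier_state below checks afterwards which state a mount point reports in /proc/mounts. It assumes the filesystem lists "nobarrier" among its options when barriers are disabled; the mount point /srv/db is only an example.

```shell
#!/bin/sh
# Sketch: report whether a mount point currently has barriers disabled.
# Reads /proc/mounts-style lines on stdin, e.g.:
#   barrier_state /srv/db < /proc/mounts   (/srv/db is a hypothetical mount point)
# Assumes the filesystem lists "nobarrier" in its option field (field 4)
# when barriers are off; any other option set is reported as "barrier".
# Exits non-zero if the mount point is not found.
barrier_state() {
    awk -v mp="$1" '
        $2 == mp {
            # Later entries for the same mount point override earlier ones.
            state = "barrier"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "nobarrier") state = "nobarrier"
        }
        END {
            if (state == "") exit 1   # mount point not found
            print state
        }
    '
}
```

For example, after running "mount -o remount,nobarrier /srv/db" as root, "barrier_state /srv/db < /proc/mounts" should print "nobarrier" if the option took effect.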

To identify whether a device has a write cache, check the kernel messages in dmesg for "cache", for example:

    dmesg | grep cache

The result might look like this:

    [    3.685928] sd 0:2:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
    [    5.140281] sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

As can be seen, the device identified as sda reports its write cache as disabled, so a filesystem on this device can use "nobarrier". The opposite is true in this example for the device identified as sdb: it reports its write cache as enabled, so, to prevent data loss, no filesystem on this device should have its barriers removed.
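On systems with many disks the dmesg check can be scripted. The following is a minimal sketch rather than a supported tool: write_cache_report is a hypothetical helper that assumes the sd driver's "[sdX] Write cache: <state>," message format shown in the example, and it treats any state other than "disabled" as a volatile cache. Note that on a long-running system the boot messages may no longer be in the kernel ring buffer.

```shell
#!/bin/sh
# Sketch: summarize the write cache state reported by the sd driver in dmesg.
# Reads kernel log text on stdin, e.g.:  dmesg | write_cache_report
# Only lines of the form "[sdX] Write cache: <state>, ..." are considered.
write_cache_report() {
    # Extract "<device> <state>" pairs, e.g. "sda disabled".
    sed -n 's/.*\[\(sd[a-z]*\)] Write cache: \([a-z]*\),.*/\1 \2/p' |
    while read -r dev state; do
        if [ "$state" = "disabled" ]; then
            echo "$dev: write cache disabled, nobarrier possible"
        else
            # Anything other than "disabled" is treated as a volatile cache.
            echo "$dev: write cache $state, keep barriers"
        fi
    done
}
```

Fed the two example lines above, it prints "sda: write cache disabled, nobarrier possible" and "sdb: write cache enabled, keep barriers".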

This can also be checked on the running system with the sdparm tool. On the same system as in the example above, the output of sdparm reads:

    belphegore:~ # sdparm --get=WCE=1 /dev/sda
        /dev/sda: DELL      PERC H730 Mini    4.27
    WCE         0

This means the write cache is disabled for sda, so nobarrier is possible.

    belphegore:~ # sdparm --get=WCE=1 /dev/sdb
        /dev/sdb: IFT       DS 1000 Series    555Q
    WCE         1

This means the write cache is enabled for sdb, so barriers are necessary.
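The sdparm check can also be scripted. The sketch below uses a hypothetical helper, wce_verdict, which assumes that the WCE value is the second field of the line starting with "WCE", as in the outputs above, and that 0 means disabled and 1 means enabled. It exits non-zero if no such line is found, so a device that does not report WCE is not silently treated as safe.

```shell
#!/bin/sh
# Sketch: interpret sdparm output for the WCE (Write Cache Enable) bit.
# Usage (as root):  sdparm --get=WCE=1 /dev/sda | wce_verdict
# Assumes the value is the second field of the line starting with "WCE".
wce_verdict() {
    awk '
        $1 == "WCE" { wce = $2 }
        END {
            if (wce == "0")      print "write cache disabled, nobarrier possible"
            else if (wce == "1") print "write cache enabled, barrier necessary"
            else { print "WCE not found"; exit 1 }
        }
    '
}
```

To check several devices, a loop such as "for d in /dev/sda /dev/sdb; do printf '%s: ' "$d"; sdparm --get=WCE=1 "$d" | wce_verdict; done" could be run as root.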

Cause

At the moment, the kernel cannot determine the best barrier setting for the device by itself.
