"UNDERRUN status detected 0x15-0x800" warnings in messages file after applying current patches.

  • 7009720
  • 10-Nov-2011
  • 10-Aug-2012

Environment

Novell Open Enterprise Server 2 (OES 2) Linux Support Pack 3
SUSE Linux Enterprise Server 10 Service Pack 3
SUSE Linux Enterprise Server 10 Service Pack 4

Situation

/var/log/messages contains entries such as this:
kernel: scsi(0:0:1) UNDERRUN status detected 0x15-0x800. resid=0x5c fw_resid=0x0 cdb=0xa3 os_underflow=0x0
These messages repeat and are causing concern. They seem to have started after applying the current patches.


Resolution

You will typically see two types of UNDERRUNs. There are those where the sender and receiver agree on the residual amount (resid equals fw_resid), which are perfectly valid:

    Sep 29 23:32:40 hkda2ls0005 kernel: scsi(2:0:5) UNDERRUN status detected 0x15-0x800. resid=0x46 fw_resid=0x46 cdb=0x12 os_underflow=0x0
    Sep 29 23:32:40 hkda2ls0005 kernel: scsi(2:0:5) UNDERRUN status detected 0x15-0x800. resid=0x2e fw_resid=0x2e cdb=0x12 os_underflow=0x0
    Sep 29 23:32:40 hkda2ls0005 kernel: scsi(2:0:57) UNDERRUN status detected 0x15-0x800. resid=0x58 fw_resid=0x58 cdb=0x25 os_underflow=0x0
    Sep 29 23:32:40 hkda2ls0005 kernel: scsi(2:0:57) UNDERRUN status detected 0x15-0x800. resid=0x56 fw_resid=0x56 cdb=0x12 os_underflow=0x0

and others where the residuals don't match, where the command is returned with a 'failed, yet retry-able' status:

    Sep  9 11:35:44 hkda2ls0005 kernel: scsi(3:0:93) UNDERRUN status detected 0x15-0x0. resid=0x0 fw_resid=0x73000 cdb=0x28 os_underflow=0x80000
    Sep  9 11:35:44 hkda2ls0005 kernel: scsi(3:0:0:93) Dropped frame(s) detected (73000 of 80000 bytes)...retrying command.
    Sep  9 11:35:44 hkda2ls0005 kernel: sd 3:0:0:93: SCSI error: return code = 0x00070000
    Sep  9 11:35:44 hkda2ls0005 kernel: end_request: I/O error, dev sdci, sector 15659008
    Sep  9 11:35:44 hkda2ls0005 kernel: device-mapper: multipath: Failing path 69:96.

These can happen for a variety of reasons, most of them related to hardware or cabling problems.
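
To get a quick overview of which kind of UNDERRUN a system is logging, the resid and fw_resid fields can be compared line by line. The following is only a minimal sketch: the function name classify_underruns is made up for illustration, and it assumes each message sits on a single line, as it does in /var/log/messages. It compares the two fields and nothing more, so it does not decide whether hardware is actually at fault.

```shell
# Classify qla2xxx UNDERRUN messages read from stdin.
# Hypothetical helper; assumes one complete message per line.
classify_underruns() {
    awk '
    /UNDERRUN status detected/ {
        resid = ""; fw = ""
        for (i = 1; i <= NF; i++) {
            # "resid=" is 6 characters, "fw_resid=" is 9 characters
            if ($i ~ /^resid=/)    resid = substr($i, 7)
            if ($i ~ /^fw_resid=/) fw    = substr($i, 10)
        }
        # Skip lines where either field is missing
        if (resid == "" || fw == "") next
        verdict = (resid == fw) ? "match" : "MISMATCH"
        print verdict, "resid=" resid, "fw_resid=" fw
    }'
}
```

It could be run as: classify_underruns < /var/log/messages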
These warnings have been introduced by

patches.drivers/qla2xxx-properly-handle-UNDERRUN-completion-status

in the latest kernel version, and were not logged by earlier versions.

The warnings are only visible if ql2xextended_error_logging is active. The option provides useful information in case a customer needs to debug the driver, and it is enabled by default.
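
To see whether the option is active on a running system, its value can be read from sysfs. The following is a minimal sketch: the function name show_qla_logging and its optional path argument are illustrative and not part of the driver. The default path is the sysfs path used further down in this document.

```shell
# Report the runtime setting of ql2xextended_error_logging.
# Hypothetical helper; the optional argument exists only so the function
# can be tried against a plain file instead of sysfs.
show_qla_logging() {
    param_file=${1:-/sys/module/qla2xxx/parameters/ql2xextended_error_logging}
    if [ ! -r "$param_file" ]; then
        echo "unknown (cannot read $param_file; is qla2xxx loaded?)"
        return 1
    fi
    value=$(cat "$param_file")
    if [ "$value" = "0" ]; then
        echo "extended error logging is disabled"
    else
        echo "extended error logging is enabled (value $value)"
    fi
}
```

Calling show_qla_logging without arguments reads the live module parameter.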

As long as there is no hardware/cabling problem, you can remove the extended logging module option from qla2xxx. Any substantial failure will still be logged, so operations won't be affected.

You can disable it permanently with the following procedure:

1. Open the qlogic file in the /etc/modprobe.d/ directory (creating it if it does not exist) using the following command:
   # vi /etc/modprobe.d/qlogic
2. To disable extended logging, use # to comment out the following line:
   #options qla2xxx ql2xextended_error_logging=1
3. Restart the host.

To disable it without having to restart the server, use the following command:

echo 0 > /sys/module/qla2xxx/parameters/ql2xextended_error_logging
This last procedure is not permanent: if you reboot the server, these QLogic warnings will appear again. Use the first procedure to make the change persist across server restarts.
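
Before rebooting, it can be useful to confirm that the permanent change is in place. The sketch below checks whether the configuration file still contains an uncommented line that switches the option on. The function name qla_logging_configured is hypothetical, and the check assumes the option is written as ql2xextended_error_logging=1 on an options qla2xxx line, as in the procedure above.

```shell
# Succeed if the file still has an active (uncommented) line that
# enables ql2xextended_error_logging for qla2xxx.
# Hypothetical helper; the default path is the file edited above.
qla_logging_configured() {
    conf_file=${1:-/etc/modprobe.d/qlogic}
    [ -r "$conf_file" ] || return 1
    grep -Eq '^[[:space:]]*options[[:space:]]+qla2xxx[[:space:]].*ql2xextended_error_logging=1([[:space:]]|$)' "$conf_file"
}
```

For example: qla_logging_configured && echo "extended logging will be enabled again after the next restart"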