Faulty Intel chipsets cause problems with interrupt remapping

  • 7014344
  • 20-Dec-2013
  • 20-Aug-2014

Environment

SUSE Linux Enterprise Server 11 Service Pack 2
SUSE Linux Enterprise Server 11 Service Pack 3

Situation

On systems with the Intel 5500 and 5520 chipsets (revision 0x13) and the Intel X58 chipset (revisions 0x12, 0x13, 0x22), having interrupt remapping enabled causes various problems.
The reported symptoms range from network link state flapping to partial or full loss of communication on network cards.

A common indicator is that the kernel logs messages like
"kernel: do_IRQ: x.xxx No irq handler for vector (irq -1)"
in the syslog.
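
To check a log file for this message, a small helper along the following lines can be used. This is only a sketch: the function name count_irq_errors is made up for illustration, and the pattern assumes the "x.xxx" placeholder stands for two dot-separated numbers.

```shell
# Hypothetical helper: count "No irq handler for vector" messages in a log file.
# The default path /var/log/messages is the syslog file named in this document.
count_irq_errors() {
    logfile="${1:-/var/log/messages}"
    # -c prints the number of matching lines; -E enables extended regex syntax.
    grep -cE 'do_IRQ: [0-9]+\.[0-9]+ No irq handler for vector' "$logfile"
}
```

Running "count_irq_errors /var/log/messages" prints the number of matching lines; a value of 0 means the message was not found in that file.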

Resolution

Interrupt remapping should be disabled on systems with the above-mentioned chipsets.

Intel has provided firmware updates/errata to the BIOS of the affected chipsets.
Examples:
Intel® 5520 and Intel® 5500 Chipset Specification Update
Errata
47. Intel VT-d: Receiving two identical interrupt requests in back to back
    cycles may corrupt attributes of remapped interrupt, or hang
    subsequent interrupt-remap-cache invalidation command.
and
53. Intel VT-d: In-flight remap-able interrupts not drained on interrupt
    invalidation command

Intel® X58 Express Chipset Specification Update
Errata
62. Intel VT-d: UP Workstation ONLY. Receiving two identical interrupt
    requests in back to back cycles may corrupt attributes of remapped
    interrupt, or hang subsequent interrupt-remap-cache invalidation
    command.
69. Intel® VT-d: In-flight remap-able interrupts not drained on interrupt
    invalidation command

In some deployments, however, updating firmware on systems in the field may not be feasible.
To help customers in this situation, a quirk has recently been introduced in the upstream Linux kernel.

When this code detects that the system matches one of the affected chipsets and interrupt remapping is enabled, it disables interrupt remapping in the kernel. In addition, the kernel is tainted and the message below is logged in /var/log/messages and dmesg.

This system BIOS has enabled interrupt remapping
on a chipset that contains an erratum making that
feature unstable.  To maintain system stability
interrupt remapping is being disabled.  Please
contact your BIOS vendor for an update.
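
Whether the quirk has fired on a running system can be checked by searching the kernel log for one line of the message above. The following is a minimal sketch; the function name quirk_triggered is hypothetical, and it assumes the message text appears exactly as quoted in this document.

```shell
# Hypothetical helper: reads kernel log text on stdin and succeeds (exit 0)
# if the quirk's warning about disabling interrupt remapping is present.
quirk_triggered() {
    grep -q 'interrupt remapping is being disabled'
}
```

For example, "dmesg | quirk_triggered && echo quirk active" prints "quirk active" only when the warning is found in the kernel ring buffer.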


This change is included in the SLE 11 SP3 Linux kernel version 3.0.101-0.21.1. The SLE 11 SP2 (LTSS) kernel version 3.0.101-0.7.19.1 includes a lightweight version of the same change: the kernel is tainted and the warning is printed, but interrupt remapping is not actually disabled.
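
To compare an installed kernel package version against the versions named above, a helper like the one below can serve as a starting point. It is a sketch under assumptions: version_at_least is a made-up name, it relies on the -V (version sort) option of GNU sort, and version sort is only an approximation of RPM's own version comparison.

```shell
# Hypothetical helper: succeeds (exit 0) if version $1 is equal to or newer
# than version $2, using GNU sort's version ordering.
version_at_least() {
    # The older of the two versions sorts first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, "version_at_least 3.0.101-0.40.1 3.0.101-0.21.1" succeeds, while "version_at_least 3.0.101-0.15.1 3.0.101-0.21.1" fails. The installed version to pass as the first argument can be read from the output of "rpm -q kernel-default" (the package name depends on the kernel flavor in use).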

If a system is exhibiting the symptoms described above, it is recommended to first determine whether the system is equipped with one of the faulty chipsets.
This can be done with the following command:
# /sbin/lspci -nn | grep -qE '8086:(340[36].*rev 13|3405.*rev (12|13|22))' && echo "Interrupt remapping is broken"
If it outputs "Interrupt remapping is broken", continue below. Otherwise this document does not apply.
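
To see which device matched rather than only a yes/no answer, the same pattern can be used without the -q option. The sketch below wraps it in a function; the name find_affected_chipsets is made up for illustration, and the pattern is copied unchanged from the command above.

```shell
# Hypothetical helper: reads "lspci -nn" output on stdin and prints only the
# lines for the affected vendor:device IDs and revisions.
find_affected_chipsets() {
    grep -E '8086:(340[36].*rev 13|3405.*rev (12|13|22))'
}
```

Usage: "/sbin/lspci -nn | find_affected_chipsets". No output means none of the listed chipset revisions was found.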

The quirk introduced in the Linux kernel is merely a workaround to handle broken hardware.

Contact the hardware vendor and request a firmware update that addresses the problem in order to get the root cause fixed.

If a firmware update is not an option, install the kernel version mentioned above (or later) on SLE 11 SP3 systems. For SLE 11 SP2 there will not be a kernel workaround.
On some systems it is also possible to disable interrupt remapping in the BIOS. The setting goes under different names, however; one example is "Intel VT-d".

As a temporary workaround, interrupt remapping can be disabled by adding
intremap=off
to the list of kernel command line parameters in the boot loader configuration using the YaST bootloader module.
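
After rebooting, it can be useful to confirm that the parameter actually reached the kernel. The following sketch checks a kernel command line for the exact parameter; the function name intremap_disabled is hypothetical.

```shell
# Hypothetical helper: reads a kernel command line on stdin and succeeds
# (exit 0) if it contains the exact parameter intremap=off.
intremap_disabled() {
    # Put each space-separated parameter on its own line, then require a
    # whole-line match (-x) so that similar-looking strings do not count.
    tr ' ' '\n' | grep -qx 'intremap=off'
}
```

Usage: "intremap_disabled < /proc/cmdline && echo intremap=off is active".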

Interrupt remapping is mostly useful when using PCI pass-through in KVM-based virtualization scenarios. Customers not using KVM, or using KVM without PCI pass-through, can just disable interrupt remapping and be done with it. However, customers using PCI pass-through in KVM will additionally have to update their KVM configuration to no longer make use of PCI pass-through. Failing to update the KVM configuration would cause KVM guests to no longer come up.

Cause

This is due to faulty firmware, which signals to the operating system that it supports interrupt remapping even though the chipset is unable to handle it reliably.
