race between xfs_zero_eof/direct write can cause corruption

  • 7017183
  • 22-Jan-2016
  • 25-Jan-2016

Environment

SUSE Linux Enterprise Server 11 Service Pack 3 (SLES 11 SP3)
SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)
SUSE Linux Enterprise Server 12
SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)

Situation

Under certain conditions, when a file on an XFS file system is overwritten, the newly written blocks contain only zeroes.
This happens when the file is written via direct I/O.
No errors are reported; the affected writes even return success.
Data corruption has also been seen on KVM guests with qcow2 virtual disks on XFS. The virtual disks were configured with aio=native and had not yet been extended to their maximum file size.
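
As a rough aid for spotting the symptom described above, the following sketch lists the blocks of a file that contain only zero bytes. It is not an official diagnostic tool: the function name `find_zero_blocks` and the default block size of 4096 bytes are assumptions made for illustration, and an all-zero block can also be perfectly legitimate data (for example in a sparse file), so any hit needs to be compared against what the application was expected to write.

```python
def find_zero_blocks(path, block_size=4096):
    """Return the indices of all blocks in `path` that contain only zero bytes.

    `block_size` is an assumed value; use the block size of the
    file system that holds the file.
    """
    zero_blocks = []
    with open(path, "rb") as f:
        index = 0
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            # A trailing partial block is checked over its actual length.
            if not any(chunk):
                zero_blocks.append(index)
            index += 1
    return zero_blocks
```

Calling `find_zero_blocks("/path/to/file")` returns a list of block indices; multiplying an index by the block size gives the byte offset at which to inspect the file.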

Resolution

The following kernel updates include the patch to resolve the problem:

SLES 11 SP3   3.0.101-0.47.71.1   released November 2015
SLES 11 SP4   3.0.101-68.1        released December 2015
SLES 12       3.12.51-52.31.1     released December 2015
SLES 12 SP1   3.12.51-60.20.2     released December 2015

It is recommended to update the kernel to avoid the risk of data corruption.
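
To compare an installed kernel package version with the fixed versions listed above, a small helper along the following lines could be used. This is a minimal sketch under stated assumptions: the product keys (`"SLES11-SP3"` and so on) and the function names are made up for illustration, the installed version string has to be supplied by the caller in the same `version-release` form as in the list, and the comparison is a simplified numeric one that only handles strings made of digits, dots and a single dash. It is not a full implementation of RPM version comparison.

```python
# Fixed kernel versions taken from the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
FIXED_KERNELS = {
    "SLES11-SP3": "3.0.101-0.47.71.1",
    "SLES11-SP4": "3.0.101-68.1",
    "SLES12": "3.12.51-52.31.1",
    "SLES12-SP1": "3.12.51-60.20.2",
}


def _to_tuple(dotted):
    """Turn '3.0.101' into (3, 0, 101); an empty string becomes ()."""
    return tuple(int(part) for part in dotted.split(".") if part)


def parse(version_release):
    """Split 'version-release' at the first dash and parse both halves."""
    version, _, release = version_release.partition("-")
    return _to_tuple(version), _to_tuple(release)


def has_fix(product, installed):
    """Return True if `installed` is at least the fixed version for `product`."""
    return parse(installed) >= parse(FIXED_KERNELS[product])
```

For example, `has_fix("SLES12-SP1", "3.12.51-60.20.2")` returns `True`, while a hypothetical older string such as `"3.12.49-11.1"` returns `False`.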

Cause

During a write or overwrite, a direct I/O write into the last block of the file can be lost.

The problem occurs when a direct I/O write smaller than the file system block size is issued into the last (partial) block of the file while, at the same time, another direct I/O write starting beyond EOF (end of file) is issued.
The write beyond EOF triggers zeroing of the last partial block (xfs_zero_eof). This zeroing can race with the direct I/O write into the partial block, so that the direct I/O write is lost.
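
The I/O pattern described above can be expressed as a small offset calculation. The sketch below only classifies a pair of writes; it does not perform any I/O and does not reproduce the race. All function names are hypothetical, writes are given as `(offset, length)` tuples in bytes, and two interpretations are assumptions of this sketch: "into the last partial block" is taken to mean a write that starts inside that block before EOF, and "beyond EOF" is taken to mean an offset strictly greater than the file size.

```python
def last_partial_block_start(file_size, block_size):
    """Return the start offset of the last partial block, or None if the
    file size is a multiple of the block size (no partial block)."""
    if file_size % block_size == 0:
        return None
    return (file_size // block_size) * block_size


def is_sub_block_write_in_last_block(file_size, block_size, write):
    """True if `write` is shorter than one block and starts inside the
    last partial block, before EOF (assumed interpretation)."""
    offset, length = write
    start = last_partial_block_start(file_size, block_size)
    if start is None:
        return False
    return 0 < length < block_size and start <= offset < file_size


def starts_beyond_eof(file_size, write):
    """True if `write` begins past the current end of file."""
    offset, _ = write
    return offset > file_size


def matches_race_pattern(file_size, block_size, write_a, write_b):
    """True if one write targets the last partial block with a sub-block
    size while the other starts beyond EOF."""
    for first, second in ((write_a, write_b), (write_b, write_a)):
        if (is_sub_block_write_in_last_block(file_size, block_size, first)
                and starts_beyond_eof(file_size, second)):
            return True
    return False
```

With a file size of 10752 bytes and 4096-byte blocks, the last partial block starts at offset 8192. A 512-byte write at offset 8704 issued together with a write at offset 16384 matches the pattern, whereas two writes that both lie within the existing file do not.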
