BTRFS filesystem going read-only during a balance operation

  • 7018233
  • 03-Nov-2016
  • 24-Oct-2018

Environment

SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2)

Situation

A BTRFS filesystem goes read-only and logs messages such as the following:

 ------------[ cut here ]------------
 BTRFS: error (device vda3) in __btrfs_cow_block:1163: errno=-2 No such entry
 BTRFS info (device vda3): forced readonly
 BTRFS error (device vda3): cleaner transaction attach returned -30
 BTRFS warning (device vda3): page private not zero on page 76709888
 BTRFS warning (device vda3): page private not zero on page 76713984
 BTRFS warning (device vda3): page private not zero on page 76718080
 ------------[ cut here ]------------


At the same time, a kernel WARNING appears in the logs:
 ------------[ cut here ]------------
 WARNING: CPU: 0 PID: 1596 at ../fs/btrfs/ctree.c:1163 __btrfs_cow_block+0x3d8/0x5c0 [btrfs]()
 BTRFS: Transaction aborted (error -2)
 Supported: Yes
 CPU: 0 PID: 1596 Comm: btrfs Not tainted 4.4.21-68-default #1
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014
  0000000000000000 ffffffff8130d890 ffff88003364b868 ffffffffa03a264f
  ffffffff8107c211 ffff88003c169d70 ffff88003364b8b8 ffff88003c348020
  ffff88003b90ab00 ffff88003655c800 ffffffff8107c28c ffffffffa03a51e8
 Call Trace:
  [<ffffffff81019a59>] dump_trace+0x59/0x310
  [<ffffffff81019dfa>] show_stack_log_lvl+0xea/0x170
  [<ffffffff8101ab81>] show_stack+0x21/0x40
  [<ffffffff8130d890>] dump_stack+0x5c/0x7c
  [<ffffffff8107c211>] warn_slowpath_common+0x81/0xb0
  [<ffffffff8107c28c>] warn_slowpath_fmt+0x4c/0x50
  [<ffffffffa02fbb48>] __btrfs_cow_block+0x3d8/0x5c0 [btrfs]
  [<ffffffffa02fbebf>] btrfs_cow_block+0x10f/0x1d0 [btrfs]
  [<ffffffffa0374393>] do_relocation+0x3f3/0x4d0 [btrfs]
  [<ffffffffa0376bf0>] relocate_tree_blocks+0x590/0x5c0 [btrfs]
  [<ffffffffa0378e3f>] relocate_block_group+0x52f/0x900 [btrfs]
  [<ffffffffa03793aa>] btrfs_relocate_block_group+0x19a/0x290 [btrfs]
  [<ffffffffa034c5ea>] btrfs_relocate_chunk.isra.37+0x4a/0xe0 [btrfs]
  [<ffffffffa034d18b>] btrfs_shrink_device+0x19b/0x530 [btrfs]
  [<ffffffffa034d602>] __btrfs_balance+0xe2/0xb80 [btrfs]
  [<ffffffffa034e380>] btrfs_balance+0x2e0/0x600 [btrfs]
  [<ffffffffa035831c>] btrfs_ioctl_balance+0x3ec/0x520 [btrfs]
  [<ffffffffa035c442>] btrfs_ioctl+0x562/0x2460 [btrfs]
  [<ffffffff8120cd6d>] do_vfs_ioctl+0x2cd/0x4a0
  [<ffffffff8120cfb4>] SyS_ioctl+0x74/0x80
  [<ffffffff815e142e>] entry_SYSCALL_64_fastpath+0x12/0x6d
 DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x12/0x6d
------------[ cut here ]------------


Resolution

Workaround:
As an immediate workaround, remount the filesystem with the skip_balance option so that the interrupted balance is not resumed:
mount -o remount,rw,skip_balance <mountpoint>
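The workaround can be wrapped in a small script. This is a sketch, assuming it runs as root on the affected host; the function name apply_workaround is hypothetical. Besides the remount above, it calls btrfs balance status to show the balance that skip_balance leaves paused. The commented-out btrfs balance cancel line is an optional extra step, not part of the workaround above, that discards the paused balance so it does not resume on a later mount without skip_balance.

```shell
#!/bin/sh
# Sketch of the workaround for a BTRFS filesystem that went read-only
# during a balance. Must run as root; pass the mount point as the only argument.
# apply_workaround is a hypothetical helper name.

apply_workaround() {
    mnt="$1"
    # Remount read-write; skip_balance keeps the interrupted balance
    # from resuming and leaves it in a paused state instead.
    mount -o remount,rw,skip_balance "$mnt" || return 1
    # Show the state of the (now paused) balance.
    btrfs balance status "$mnt"
    # Optional, not part of the workaround above: discard the paused
    # balance so it does not resume on a later mount without skip_balance.
    # btrfs balance cancel "$mnt"
}

# Only act when a mount point is given.
if [ "$#" -eq 1 ]; then
    apply_workaround "$1"
fi
```

Usage: run the script with the affected mount point, for example as sh workaround.sh /srv, where the script name and path are placeholders.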

Solution:
The issue is already fixed for SLES 12 SP2 via Linux Kernel 87 (kernel version 4.4.38-93.1), which includes the following change:

* btrfs: fix endless loop in balancing block groups (bsc#1006804).
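To check whether a running system already has the fixed kernel, its release string can be compared with 4.4.38-93. The sketch below assumes SLES 12 SP2 release strings of the form <version>-<release>-<flavour> (for example 4.4.21-68-default, as in the WARNING above), where uname -r typically omits the trailing .1 of the package version. It relies on GNU coreutils sort -V for version ordering; the function name kernel_has_fix is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a SLES 12 SP2 kernel release string
# is at or above 4.4.38-93, the kernel that carries the fix (bsc#1006804).
# Expects a flavour suffix such as "-default", as printed by uname -r.
# Relies on GNU coreutils 'sort -V' for version ordering.

FIXED="4.4.38-93"

kernel_has_fix() {
    # Strip the flavour suffix, e.g. "4.4.21-68-default" -> "4.4.21-68"
    ver="${1%-*}"
    # Version-sort both strings; if FIXED sorts first (or they are equal),
    # then ver is at least FIXED.
    lowest=$(printf '%s\n%s\n' "$FIXED" "$ver" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED" ]
}

# Check the release given as an argument, or the running kernel by default.
release="${1:-$(uname -r)}"
if kernel_has_fix "$release"; then
    echo "$release: includes the fix"
else
    echo "$release: kernel update required"
fi
```

With the kernel from the WARNING above, kernel_has_fix 4.4.21-68-default fails, so that system still needs the kernel update.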

 

Cause

Background information:
This issue can be triggered _only_ on a filesystem that has almost no free space left.

Under rare conditions, balancing the filesystem tries to relocate an extent, and the relocation fails due to lack of space. The error is then unconditionally overwritten and success is returned. Later, the supposedly relocated extent is missing because it was never actually relocated, and as a result the filesystem is forced read-only.

None of the filesystems affected by this issue suffer corruption.
This is purely a runtime issue and no repair is required.

However, once the issue has been encountered, the user action described above *is* required to avoid hitting it again immediately on remount.
