Getting stale NFS file handle errors after cluster failover

  • 3714483
  • 21-Sep-2006
  • 08-Nov-2012

Environment

Novell SUSE Linux Enterprise Server 10
High Availability Release 2 Cluster

Situation

Error: "Cannot open: Stale NFS file handle"
An NFS resource in a High Availability (HA) cluster fails after it has been migrated to another node

m20:~ # crm_mon -1
============
Last updated: Thu Sep 21 13:04:03 2006
Current DC: m20 (f56c650f-1047-453c-907c-859e9c6cb598)
2 Nodes configured.
3 Resources configured.
============

Node: m20 (f56c650f-1047-453c-907c-859e9c6cb598): online
Node: m12 (323e7b1b-1545-4e26-a9a5-5f8e65871752): online

Resource Group: NFS
m55_ip (heartbeat::ocf:IPaddr): Started m20
nfsvg_lvm (heartbeat::ocf:LVM): Started m20
nfs_share_fs (heartbeat::ocf:Filesystem): Started m20
nfsserver (lsb:nfsserver): Started m20


Resolution

Export the directories with the fsid option, which assigns a fixed file system identification number to each export. On each node that will host the NFS resource, modify the /etc/exports file and add the fsid option to each export. For example, change from this:

/exports/data *(rw,root_squash,sync)

to this:

/exports/data *(rw,root_squash,sync,fsid=25)

The /etc/exports file should be the same on each node hosting the NFS resource.
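
As a quick sanity check, the following minimal sketch lists exports that still lack an fsid option. It is an illustration only, not a Novell tool: the function name exports_missing_fsid, the /exports/home path, and fsid=26 are hypothetical, and the parser assumes simple one-line entries without quoted paths or backslash continuations.

```python
import re


def exports_missing_fsid(exports_text):
    """Return export paths that have a client entry without an fsid= option."""
    missing = []
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        path, clients = fields[0], fields[1:]

        def has_fsid(client):
            # Options are the comma-separated list inside the parentheses.
            match = re.search(r"\(([^)]*)\)", client)
            options = match.group(1).split(",") if match else []
            return any(opt.strip().startswith("fsid=") for opt in options)

        # An export with no client list at all cannot carry an fsid either.
        if not clients or not all(has_fsid(c) for c in clients):
            missing.append(path)
    return missing


# Sample content; /exports/home and fsid=26 are made up for illustration.
SAMPLE = """\
/exports/data *(rw,root_squash,sync)
/exports/home *(rw,root_squash,sync,fsid=26)
"""

print(exports_missing_fsid(SAMPLE))  # prints ['/exports/data']
```

To check a real node, pass the contents of /etc/exports to the function instead of SAMPLE, and run the same check on every node that can host the NFS resource.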

Additional Information

The device major/minor numbers of the exported file system are embedded in the NFS file handle. This creates a problem when an NFS export fails over or is moved to another node in a cluster: the major/minor numbers change when the resource is exported on the new node, so the file handles that clients already hold no longer match, and the client sees a "Stale NFS file handle" error. The embedded number must therefore stay the same on every node. The fsid export option specifies a fixed number that is used in the file handle instead of the major/minor numbers.
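
To see the device numbers in question, a small sketch such as the following can be run on each node while the resource is active there. The function name device_numbers is hypothetical, and the script only reports what stat returns for a path; it does not inspect the NFS file handle itself.

```python
import os
import sys


def device_numbers(path):
    """Return (major, minor) of the device that holds `path`."""
    dev = os.stat(path).st_dev
    return os.major(dev), os.minor(dev)


if __name__ == "__main__":
    # Pass the export directory (for example /exports/data) as the first
    # argument; the current directory is used when none is given.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    major, minor = device_numbers(target)
    print(f"{target}: major={major} minor={minor}")
```

If the values differ between nodes for the same export, clients would see stale handles after a failover unless fsid is set.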

WARNING: You need to make sure the fsid number is unique across all exported file systems.
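
The uniqueness requirement can be checked with a sketch along these lines. It is an assumption-laden illustration: duplicate_fsids is a hypothetical helper, the /exports/home and /exports/scratch paths are made up, and the parsing assumes simple one-line entries in /etc/exports.

```python
import re
from collections import defaultdict


def duplicate_fsids(exports_text):
    """Map each fsid value used by more than one export path to those paths."""
    paths_by_fsid = defaultdict(set)
    for raw in exports_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        path = line.split()[0]
        # Collect every fsid=<value> that appears in the line's option lists.
        for value in re.findall(r"fsid=([^,)\s]+)", line):
            paths_by_fsid[value].add(path)
    return {
        fsid: sorted(paths)
        for fsid, paths in paths_by_fsid.items()
        if len(paths) > 1
    }


# Sample content; only /exports/data with fsid=25 comes from this document.
SAMPLE = """\
/exports/data *(rw,root_squash,sync,fsid=25)
/exports/home *(rw,root_squash,sync,fsid=25)
/exports/scratch *(rw,root_squash,sync,fsid=26)
"""

print(duplicate_fsids(SAMPLE))  # prints {'25': ['/exports/data', '/exports/home']}
```

An empty result means no fsid value is shared between different export paths. The same path exported to several clients with one fsid is not reported.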

For additional information, see:
http://linux-ha.org/HaNFS
exports(5)