Cannot remount a datastore after an unplanned PDL

Symptoms

After a storage device is unexpectedly unpresented from the storage array, you cannot mount it again.

A virtual machine was running when the storage device went offline.

The ESXi 5.0 host cannot mount the storage after the LUN is back online.

In the vmkernel log file, you see entries similar to:

2012-02-13T22:47:44.243Z cpu36:5590)Vol3: 1665: Error refreshing FD resMeta: Device is permanently unavailable
2012-02-13T22:47:44.281Z cpu34:5590)VC: 1449: Device rescan time 165 msec (total number of devices 75)
2012-02-13T22:47:44.281Z cpu34:5590)VC: 1452: Filesystem probe time 504 msec (devices probed 48 of 75)
2012-02-13T22:47:44.406Z cpu38:5590)ScsiDevice: 4592: naa.6006016058201700354179be0c6fdf11 device :Open count > 0, cannot be brought online
2012-02-13T22:47:44.654Z cpu34:5590)Vol3: 647: Couldn't read volume header from control: Invalid handle
2012-02-13T22:47:44.654Z cpu34:5590)FSS: 4333: No FS driver claimed device 'control': Not supported
2012-02-13T22:47:45.008Z cpu38:5590)ScsiDeviceIO: 2316: Cmd(0x4124c0ea2e80) 0x28, CmdSN 0x70509 to dev "naa.6006016058201700354179be0c6fdf11" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
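The entry that matters most for this issue is the ScsiDevice line reporting "Open count > 0, cannot be brought online". As a rough sketch, the affected device IDs can be pulled out of the log with a small filter. The function name pdl_blocked_devices is hypothetical, and the pattern assumes the message wording shown in the entries above.

```shell
#!/bin/sh
# Hypothetical helper: print each device ID that the vmkernel log reports as
# still open and therefore unable to come back online.
# Reads log text on stdin; assumes the message format shown in this article.
pdl_blocked_devices() {
    sed -n 's/.* \(naa\.[0-9a-f]*\) device :Open count > 0, cannot be brought online.*/\1/p' |
        sort -u
}

# Possible usage on the host (assumes the default ESXi 5.x log location):
#   pdl_blocked_devices < /var/log/vmkernel.log
```

Each ID is printed once, even if the message repeats, so the output can be passed to the commands in the Resolution section.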

Cause

This is expected behaviour because the I/O on the LUN did not terminate gracefully: a world still holds the device open, so the device cannot be brought online. To properly remove a datastore, see Unpresenting a LUN in ESXi 5.x (2004605).

Resolution

To resolve this issue:

Run this command to see the world that has the device open for the LUN:

#esxcli storage core device world list -d naa-id
For example:

#esxcli storage core device world list -d naa.6006016058201700354179be0c6fdf11
You see output similar to:

Device                                World ID  Open Count  World Name
------------------------------------  --------  ----------  ----------
naa.6006016058201700354179be0c6fdf11      2060           1  idle0
If a VMFS volume is using the device indirectly, the world name includes the string idle0. If a virtual machine uses the device as an RDM, the virtual machine World ID is displayed. If any other process is using the raw device, the corresponding information is displayed.
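When several worlds hold the device open, it can help to reduce the listing to the columns of interest. The following is a minimal sketch, assuming the four-column layout shown in the sample output above and world names without spaces. The function name worlds_holding_device is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "world-id open-count world-name" for each world
# that has the device open.
# Reads the output of "esxcli storage core device world list -d <naa-id>"
# on stdin; assumes the column layout shown in this article.
worlds_holding_device() {
    # Skip the header row and the dashed separator, keep the data rows.
    awk '$1 != "Device" && $1 !~ /^-+$/ && NF >= 4 { print $2, $3, $4 }'
}

# Possible usage on the host:
#   esxcli storage core device world list -d naa.6006016058201700354179be0c6fdf11 |
#       worlds_holding_device
```

A world name of idle0 points to a VMFS volume using the device, while any other name identifies the virtual machine or process to investigate.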
Run this command to list all virtual machines running on the ESXi 5.0 host and identify the virtual machine registered on that LUN:

#esxcli vm process list
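On a host with many virtual machines, the listing can be narrowed to those whose configuration file lives on the affected datastore. This sketch assumes that each entry in the output contains "World ID:", "Display Name:" and "Config File:" lines in that order, which should be checked against the actual output on the host. The function name vms_on_datastore and the path used in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "world-id display-name" for every running
# virtual machine whose Config File path contains the given text.
# Reads the output of "esxcli vm process list" on stdin.
# Assumes "World ID:" and "Display Name:" appear before "Config File:"
# within each entry.
vms_on_datastore() {
    awk -v ds="$1" '
        /^ *World ID:/     { wid = $NF }
        /^ *Display Name:/ { sub(/^ *Display Name: */, ""); name = $0 }
        /^ *Config File:/  { if (index($0, ds) > 0) print wid, name }
    '
}

# Possible usage on the host (the volume path is a placeholder):
#   esxcli vm process list | vms_on_datastore /vmfs/volumes/<volume-name-or-uuid>/
```

The first column of each printed line is the World ID to pass to the kill command.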
Run this command to kill the virtual machine, identified by its World ID:

#esxcli vm process kill --type=force --world-id=World_ID
For example:

#esxcli vm process kill --type=force --world-id=12131
Rescan the storage using this command:

#esxcfg-rescan -u vmhba#
Run this command to see the device state:

#esxcli storage core device list -d naa-id
If the issue persists, reboot the ESXi 5.0 host where the virtual machine was registered.
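To decide whether a reboot is needed, the device state can be checked after the rescan. The sketch below assumes that the device listing contains a line beginning with "Status:" and simply prints its value. The function name device_status is hypothetical, and the exact status values should be confirmed on the host.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Status:" field for a device.
# Reads the output of "esxcli storage core device list -d <naa-id>" on stdin.
# Assumes the listing contains a "Status:" line.
device_status() {
    awk '/^ *Status:/ { print $2; exit }'
}

# Possible usage on the host:
#   esxcli storage core device list -d naa.6006016058201700354179be0c6fdf11 |
#       device_status
```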

Reference

http://kb.vmware.com
