Cannot remount a datastore after an unplanned Permanent Device Loss (PDL)


Symptoms

  • After a storage device is unexpectedly unpresented from the storage array, you are unable to mount it again
  • A virtual machine was running when the storage device went offline
  • The ESXi 5.0 host cannot mount the storage after the LUN is back online
  • In the vmkernel log file, you see entries similar to:

    2012-02-13T22:47:44.243Z cpu36:5590)Vol3: 1665: Error refreshing FD resMeta: Device is permanently unavailable
    2012-02-13T22:47:44.281Z cpu34:5590)VC: 1449: Device rescan time 165 msec (total number of devices 75)
    2012-02-13T22:47:44.281Z cpu34:5590)VC: 1452: Filesystem probe time 504 msec (devices probed 48 of 75)
    2012-02-13T22:47:44.406Z cpu38:5590)ScsiDevice: 4592: naa.6006016058201700354179be0c6fdf11 device :Open count > 0, cannot be brought online
    2012-02-13T22:47:44.654Z cpu34:5590)Vol3: 647: Couldn't read volume header from control: Invalid handle
    2012-02-13T22:47:44.654Z cpu34:5590)FSS: 4333: No FS driver claimed device 'control': Not supported
    2012-02-13T22:47:45.008Z cpu38:5590)ScsiDeviceIO: 2316: Cmd(0x4124c0ea2e80) 0x28, CmdSN 0x70509 to dev "naa.6006016058201700354179be0c6fdf11" failed H:0x1 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0.
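The log signatures above can be searched for directly. The following is a minimal sketch, assuming the log is at /var/log/vmkernel.log (the usual location on ESXi 5.x; adjust the path if your host differs). The function name find_pdl_entries is hypothetical, and the two patterns are taken only from the sample entries shown above, so other PDL-related messages may not be matched.

```shell
# Hypothetical helper: print vmkernel log lines that match the PDL
# signatures seen in the sample entries above.
# Usage: find_pdl_entries [logfile]
# The default log path is an assumption; pass another path if needed.
find_pdl_entries() {
    logfile="${1:-/var/log/vmkernel.log}"
    grep -E 'Device is permanently unavailable|Open count > 0, cannot be brought online' "$logfile"
}
```

Running find_pdl_entries with no argument on the affected host should list any matching entries; no output means neither signature was found in that file.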

Cause

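This is expected behaviour because the I/O on the LUN did not terminate gracefully. A world still holds the device open (note the "Open count > 0, cannot be brought online" entry in the log), so the host cannot bring the device back online. To properly remove a datastore, see Unpresenting a LUN in ESXi 5.x (2004605).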

Resolution

To resolve this issue:
  1. Run this command to list the worlds that have the LUN's device open:

    #esxcli storage core device world list -d naa-id
    For example:

    #esxcli storage core device world list -d naa.6006016058201700354179be0c6fdf11
    You see output similar to:

    Device                                World ID  Open Count  World Name
    ------------------------------------  --------  ----------  ----------
    naa.6006016058201700354179be0c6fdf11      2060           1  idle0
    If a VMFS volume is using the device indirectly, the world name includes the string idle0. If a virtual machine uses the device as an RDM, the virtual machine World ID is displayed. If any other process is using the raw device, the corresponding information is displayed.
  2. Run this command to list all virtual machines running on the ESXi 5.0 host, then identify the virtual machine registered on that LUN and note its World ID:

    #esxcli vm process list
  3. Run this command to kill the virtual machine by its World ID:

    #esxcli vm process kill --type=force --world-id=WorldID
    For example:

    #esxcli vm process kill --type=force --world-id=12131
  4. Rescan the storage using this command:

    #esxcfg-rescan -u vmhba#
  5. Run this command to see the device state:

    #esxcli storage core device list -d naa-id
  6. If the issue persists, reboot the ESXi 5.0 host where the virtual machine was registered.
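Step 1 of the procedure above can be narrowed to the worlds that actually block the remount. The following is a minimal sketch, assuming the column layout shown in the sample output (Device, World ID, Open Count, World Name) and a single-word world name. The function name list_open_worlds is hypothetical. It only reads text on standard input and never kills anything itself.

```shell
# Hypothetical helper: read the output of
#   esxcli storage core device world list -d <naa-id>
# on stdin and print "WorldID WorldName" for every world whose
# Open Count is greater than zero. Header and separator lines are
# skipped because their second and third fields are not numeric.
list_open_worlds() {
    awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 { print $2, $4 }'
}

# Example use on the host, with the device ID from the example above:
#   esxcli storage core device world list -d naa.6006016058201700354179be0c6fdf11 | list_open_worlds
```

For the sample output shown in step 1, this prints "2060 idle0". The sketch deliberately stops before the kill step: confirm each World ID against the output of esxcli vm process list before forcing a virtual machine down.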

Reference

http://kb.vmware.com
