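Running Replication from vCenter causes the entire ESXi host to PSOD (purple screen of death).
I have been working with Dell and VMware for over a month and neither has been able to find the cause.
Dell went through the logs without finding anything, then replaced the CPU, motherboard, and RAID card, and the problem still persists.
My test results so far: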
Hardware,BIOS,vCenter,Replication,ESXi,Guest OS,Guest VMTools,Replication result
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Windows7 Pro,installed,Pass
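To see which factor tracks the failures, the matrix above can be tallied per column. This is only a rough sketch: the rows are copied from the table, the last column is named "Result" here for brevity, and `fail_counts` is a hypothetical helper name, not part of any VMware or Dell tool.

```python
import csv
import io
from collections import defaultdict

# Rows copied from the test matrix above; "Result" is a shortened name
# chosen here for the last column (an assumption for this sketch).
MATRIX = """\
Hardware,BIOS,vCenter,Replication,ESXi,Guest OS,Guest VMTools,Result
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Windows7 Pro,installed,Pass
"""


def fail_counts(columns, text=MATRIX):
    """Group rows by the given columns and return {key: (fails, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(io.StringIO(text)):
        key = tuple(row[name] for name in columns)
        counts[key][0] += int(row["Result"] == "Fail")
        counts[key][1] += 1
    return {key: tuple(pair) for key, pair in counts.items()}


# Failures grouped by guest OS and VMware Tools state.
for key, (fails, total) in fail_counts(["Guest OS", "Guest VMTools"]).items():
    print(key, f"{fails}/{total} failed")
```

Grouping by `["BIOS"]` or `["ESXi"]` instead shows failures on every version tested, while the guest OS plus Tools grouping separates the Fail and Pass rows cleanly. With only eight runs this is a hint, not proof.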
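What I am fairly sure of so far: when the guest OS is Server 2008 R2 Standard SP1 with VMware Tools installed,
running vCenter Replication has a high probability of causing a PSOD.
However, a Windows 7 Pro guest with VMware Tools installed runs fine.
All ISOs were downloaded from MSDN, so they should not be the problem.
Detailed version numbers: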
VMware vCenter Server Appliance 5.5.0.30200 Build 3255668
VMware vCenter Server Appliance 5.5.0.20000 Build 2063318
vSphere Replication Appliance 5.8.1.10254 Build 2915556
vSphere Replication Appliance 5.5.1.6 Build 3570689
VMware ESXi (Dell OEM) 5.5.0 U3b 3568722-A05
VMware ESXi (Dell OEM) 5.5.0 U2 2718005-A05
VMware Tools for Windows Version 10.0.0, Build-3000743 (Update 3)
VMware Tools for Windows Version 9.4.12, Build-2627939 (Update 2)
Windows Server 2008 R2 with Service Pack 1 (x64) - DVD (Chinese-Taiwan)
ISO, Chinese - Taiwan, release date: 2011/2/23
3173 MB
File name: tw_windows_server_2008_r2_with_sp1_x64_dvd_617595.iso
Language: Chinese - Taiwan
SHA1: 39B0C55C9CD3EBDF545CA7B940B5F20957D0986D
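To rule out a corrupted download rather than assume it, the ISO can be hashed and compared against the SHA1 listed above. A minimal sketch using only the Python standard library; `sha1_of_file` and `matches_expected` are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib

# SHA1 value as listed for the ISO above.
EXPECTED_SHA1 = "39B0C55C9CD3EBDF545CA7B940B5F20957D0986D"


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Hash the file in chunks so a 3 GB ISO is never loaded into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare case-insensitively, since tools differ in hex letter case."""
    return sha1_of_file(path) == expected.strip().upper()
```

Calling `matches_expected("tw_windows_server_2008_r2_with_sp1_x64_dvd_617595.iso")` should return True if the downloaded file is intact.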
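Have you looked at the saved core dump files to see whether the cause can be analyzed from them?
Is your R630 using the onboard Broadcom NICs? Have you tried updating the Broadcom driver in ESXi to the latest version: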
Download VMware ESXi 5.5 Driver CD for Broadcom NetXtreme II Network/iSCSI/FCoE Driver Set
Or try switching to an Intel NIC? Broadcom NICs often cause problems in virtualized environments...
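There is also an Intel X520DA2 installed; the problem still occurs with it removed.
I have already swapped the Broadcom driver. The version bundled with the Dell OEM ESXi image is newer,
and I also tested the version provided by VMware: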
net-tg3-3.137l.v55.1-1OEM.550.0.0.1331820.x86_64.vib
I have also tried various firmware and driver versions for the H730 Mini RAID card,
but the problem still persists.
After a PSOD, the iDRAC log shows:
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
CPU9000 An OEM diagnostic event occurred.
CPU0704 CPU 1 machine check error detected.
UEFI0078 One or more Machine Check errors occurred in the previous boot.
PST0090 A problem was detected related to the previous server boot.
SYS1003 System CPU Resetting.
PWR2262 The Intel Management Engine has reported an internal system error.
SYS1003 System CPU Resetting.
RAC0703 Requested system hardreset.
PWR2262 The Intel Management Engine has reported an internal system error.
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
PCI1318 A fatal error was detected on a component at bus 3 device 0 function 0.
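The log above is easier to read once the message IDs are counted and the PCI locations pulled out. A rough sketch, assuming each line starts with a message ID followed by free text, as in the lines pasted here; `summarize` is a hypothetical helper, not an iDRAC or racadm feature.

```python
import re
from collections import Counter

# Log lines copied from the iDRAC output above.
IDRAC_LOG = """\
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
CPU9000 An OEM diagnostic event occurred.
CPU0704 CPU 1 machine check error detected.
UEFI0078 One or more Machine Check errors occurred in the previous boot.
PST0090 A problem was detected related to the previous server boot.
SYS1003 System CPU Resetting.
PWR2262 The Intel Management Engine has reported an internal system error.
SYS1003 System CPU Resetting.
RAC0703 Requested system hardreset.
PWR2262 The Intel Management Engine has reported an internal system error.
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
PCI1318 A fatal error was detected on a component at bus 3 device 0 function 0.
"""

# Assumed line shape: "<LETTERS><DIGITS> <message text>".
MESSAGE = re.compile(r"^([A-Z]+\d+)\s+(.*)$")
PCI_LOCATION = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def summarize(log_text=IDRAC_LOG):
    """Return (message ID counts, PCI bus/device/function counts)."""
    ids = Counter()
    pci = Counter()
    for line in log_text.splitlines():
        match = MESSAGE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        ids[match.group(1)] += 1
        location = PCI_LOCATION.search(match.group(2))
        if location:
            pci[tuple(int(part) for part in location.groups())] += 1
    return ids, pci


ids, pci = summarize()
print(ids.most_common())
print(pci)
```

In this excerpt the fatal PCI errors point at two locations, bus 0 device 1 function 0 (twice) and bus 3 device 0 function 0 (once). Working out which physical devices those are on this host would narrow down what to inspect next.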