iT邦幫忙


vCenter Replication PSOD problem

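Running Replication from vCenter causes the entire ESXi host to PSOD (purple screen of death).
I have been working with Dell and VMware for over a month, and neither has been able to find the cause.
Dell went through the logs without success, then replaced the CPU, motherboard, and RAID card in turn, and the problem still persists.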

http://ithelp.ithome.com.tw/upload/images/20160518/20096194V2Q1kkidI3.jpg

My test results so far:

Hardware,BIOS,vCenter,Replication,ESXi,Guest OS,Guest VMTools,Replication result
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Windows7 Pro,installed,Pass

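What I am fairly sure of so far: when the guest OS is Server 2008R2 Standard SP1 with VMware Tools installed,
running vCenter Replication has a high probability of causing a PSOD.
Yet a Windows 7 Pro guest with VMware Tools installed works fine.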
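The conclusion above can be checked mechanically against the test matrix. The following is a minimal Python sketch, not part of the original troubleshooting: it copies the eight rows from the table, names the last column `Replication result` (an assumption about the header), and the helper names `outcomes_by_factor` and `fail_signatures` are hypothetical.

```python
import csv
import io
from collections import defaultdict

RESULT = "Replication result"  # assumed name for the outcome column

# The eight test rows from the post, unchanged.
MATRIX = """\
Hardware,BIOS,vCenter,Replication,ESXi,Guest OS,Guest VMTools,Replication result
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.2,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.8.1,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u3,5.5.1.6,5.5.0 u3b,Server 2008R2,not installed,Pass
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Server 2008R2,installed,Fail
Dell R630,2.0.1,5.5 u2,5.5.1.6,5.5.0 u2,Windows7 Pro,installed,Pass
"""


def outcomes_by_factor(text):
    """Map each column to {value: set of outcomes observed with that value}."""
    summary = defaultdict(lambda: defaultdict(set))
    for row in csv.DictReader(io.StringIO(text)):
        for column, value in row.items():
            if column != RESULT:
                summary[column][value].add(row[RESULT])
    return summary


def fail_signatures(text):
    """Return the distinct (Guest OS, Guest VMTools) pairs seen in failing rows."""
    rows = csv.DictReader(io.StringIO(text))
    return {(r["Guest OS"], r["Guest VMTools"]) for r in rows if r[RESULT] == "Fail"}


if __name__ == "__main__":
    for column, values in outcomes_by_factor(MATRIX).items():
        for value, results in values.items():
            print(f"{column:14} {value:14} {sorted(results)}")
    print("Failing guest configurations:", fail_signatures(MATRIX))
```

In this data, every BIOS, vCenter, Replication, and ESXi version appears with both Pass and Fail, so none of them separates the outcomes on its own; all failing rows share the same guest OS and VMware Tools combination. Eight rows are a small sample, so this only narrows the search and does not prove a cause.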

All ISOs were downloaded from MSDN, so they should not be the problem.

Detailed version numbers:
VMware vCenter Server Appliance 5.5.0.30200 Build 3255668
VMware vCenter Server Appliance 5.5.0.20000 Build 2063318

vSphere Replication Appliance 5.8.1.10254 Build 2915556
vSphere Replication Appliance 5.5.1.6 Build 3570689

VMware ESXi (Dell OEM) 5.5.0 U3b 3568722-A05
VMware ESXi (Dell OEM) 5.5.0 U2 2718005-A05

VMware Tools for Windows Version 10.0.0, Build-3000743 (Update 3)
VMware Tools for Windows Version 9.4.12, Build-2627939 (Update 2)

Windows Server 2008 R2 with Service Pack 1 (x64) - DVD (Chinese-Taiwan)
ISO, Chinese - Taiwan, release date: 2011/2/23
3173 MB
File name: tw_windows_server_2008_r2_with_sp1_x64_dvd_617595.iso
Language: Chinese - Taiwan
SHA1: 39B0C55C9CD3EBDF545CA7B940B5F20957D0986D
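To rule out a corrupted download, the local ISO can be compared against the SHA-1 quoted above. This is a minimal Python sketch using the standard `hashlib` module; the function names `sha1_of_file` and `matches_expected` are hypothetical, and the ISO path in the usage note is an assumption about where the file was saved.

```python
import hashlib

# SHA-1 of the ISO as quoted in the post.
EXPECTED_SHA1 = "39B0C55C9CD3EBDF545CA7B940B5F20957D0986D"


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the uppercase hex SHA-1 of a file, reading in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected=EXPECTED_SHA1):
    """Compare the file's SHA-1 with the expected value, ignoring case and whitespace."""
    return sha1_of_file(path) == expected.strip().upper()
```

For example, `matches_expected("tw_windows_server_2008_r2_with_sp1_x64_dvd_617595.iso")` should return `True` if the file in the current directory is identical to the published image.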

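hope000 iT邦 Novice Level 5 ‧ 2016-05-18 16:38:21
Update:
I just tested again, installing 2008R2 Enterprise SP1 from the same ISO
and installing VMware Tools; so far Replication runs without any problem.

Could there be a compatibility issue between 2008R2 Standard and VMware Tools?!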

2 answers

窮嘶發發發
iT邦 Expert Level 1 ‧ 2016-05-18 10:36:06

Have you looked at the saved core dump files to see whether the cause can be determined from them?

hope000 iT邦 Novice Level 5 ‧ 2016-05-18 10:49:50

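No core dump was generated, which even VMware technical support found very strange.
I have tried various versions of vCenter (VC) and vSphere Replication (VR).

Dell has also tried many versions of the BIOS, drivers, and firmware.
Nothing resolves it (sweat).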

raytracy
iT邦 Master Level 1 ‧ 2016-05-18 11:25:16

Is your R630 using the onboard Broadcom NICs? Have you tried updating the Broadcom driver in ESXi to the latest version:
Download VMware ESXi 5.5 Driver CD for Broadcom NetXtreme II Network/iSCSI/FCoE Driver Set

Or try switching to an Intel NIC? Broadcom NICs often cause problems in virtualized environments...

hope000 iT邦 Novice Level 5 ‧ 2016-05-18 11:42:40

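The server also has an Intel X520DA2 card; removing it does not make the problem go away.

I have already swapped the Broadcom driver. The version bundled in the Dell OEM ESXi image is newer,
and I also tested the version provided by VMware:
net-tg3-3.137l.v55.1-1OEM.550.0.0.1331820.x86_64.vib

I have also tried various firmware and driver versions for the H730 Mini RAID card,
but the problem still persists.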

After a PSOD, the iDRAC log shows:

CPU9000 An OEM diagnostic event occurred.

PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.

CPU9000 An OEM diagnostic event occurred.

CPU0704 CPU 1 machine check error detected.

UEFI0078 One or more Machine Check errors occurred in the previous boot.

PST0090 A problem was detected related to the previous server boot.

SYS1003 System CPU Resetting.

PWR2262 The Intel Management Engine has reported an internal system error.

SYS1003 System CPU Resetting.

RAC0703 Requested system hardreset.

PWR2262 The Intel Management Engine has reported an internal system error.

CPU9000 An OEM diagnostic event occurred.

PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.

PCI1318 A fatal error was detected on a component at bus 3 device 0 function 0.
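The log entries above can be summarized to see which message IDs repeat and which PCI locations are named. This is a minimal Python sketch rather than an iDRAC tool: the function names are hypothetical, and it assumes the bus/device/function numbers in the messages are decimal.

```python
import re
from collections import Counter

# iDRAC entries copied from the post, in the order shown.
IDRAC_LOG = """\
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
CPU9000 An OEM diagnostic event occurred.
CPU0704 CPU 1 machine check error detected.
UEFI0078 One or more Machine Check errors occurred in the previous boot.
PST0090 A problem was detected related to the previous server boot.
SYS1003 System CPU Resetting.
PWR2262 The Intel Management Engine has reported an internal system error.
SYS1003 System CPU Resetting.
RAC0703 Requested system hardreset.
PWR2262 The Intel Management Engine has reported an internal system error.
CPU9000 An OEM diagnostic event occurred.
PCI1318 A fatal error was detected on a component at bus 0 device 1 function 0.
PCI1318 A fatal error was detected on a component at bus 3 device 0 function 0.
"""

PCI_PATTERN = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def message_counts(log_text):
    """Count message IDs, taken as the first token of each non-empty line."""
    return Counter(line.split()[0] for line in log_text.splitlines() if line.strip())


def pci_addresses(log_text):
    """Return distinct bus:device.function addresses in first-seen order.

    Numbers are assumed to be decimal and are formatted as hex, lspci style.
    """
    found = []
    for match in PCI_PATTERN.finditer(log_text):
        bus, device, function = (int(group) for group in match.groups())
        address = f"{bus:02x}:{device:02x}.{function:x}"
        if address not in found:
            found.append(address)
    return found


if __name__ == "__main__":
    print(message_counts(IDRAC_LOG).most_common())
    print(pci_addresses(IDRAC_LOG))
```

The resulting addresses can then be compared with the `lspci` output on the ESXi host to see which devices sit at those locations.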
