Having toured the other folders, I finally have some idea of how to tackle Additional_Logs. Let me first list the keywords used so far, then scan with them one by one.
grep -i ierr *
grep -i voltage *
grep -i correctable *
root@mynb:/mnt/d/Difficult_Company/Additional_Logs# grep -i ierr *
root@mynb:/mnt/d/Difficult_Company/Additional_Logs#
Good: no file in here contains IERR.
root@mynb:/mnt/d/Difficult_Company/Additional_Logs# grep -i voltage *
107m23.log: Sensor Type : Voltage
107m23.log: Sensor Type : Voltage
11m188.log: Sensor Type : Voltage
11m188.log: Sensor Type : Voltage
11m188.log: Sensor Type : Voltage
...
...
So a few machines did slip through the net in this folder. Let's count how many.
root@mynb:/mnt/d/Difficult_Company/Additional_Logs# grep -i voltage * | uniq | wc -l
3
root@mynb:/mnt/d/Difficult_Company/Additional_Logs#
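A side note on the count above: `uniq | wc -l` yields the number of files only because every matching line within a file is identical and adjacent. As a minimal sketch (the helper name `count_files_with` is my own invention, not part of the original workflow), grep's `-l` option prints each matching file name once, so the count no longer depends on what the matching lines look like:

```shell
# count_files_with KEYWORD FILE...
# Prints how many of the given files contain KEYWORD (case-insensitive).
count_files_with() {
    keyword=$1
    shift
    # -i ignores case; -l prints each matching file name once;
    # -e and -- keep odd keywords or file names from being read as options.
    grep -il -e "$keyword" -- "$@" | wc -l | tr -d ' '
}
```

Inside Additional_Logs this would be called as `count_files_with voltage *`.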
root@mynb:/mnt/d/Difficult_Company/Additional_Logs# grep "Uncorrectable" * | uniq
15m178.log: Description : Uncorrectable machine check exception
Nice! Only one machine has a UMCE (uncorrectable machine check exception).
root@mynb:/mnt/d/Difficult_Company/Additional_Logs# grep -i correctable *| uniq
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC
107m23.log: Description : Correctable ECC logging limit reached
107m23.log: Description : Correctable memory error logging disabled
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC logging limit reached
11m188.log: Description : Correctable memory error logging disabled
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
11m188.log: Description : Correctable ECC
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Uncorrectable machine check exception
15m178.log: Description : Correctable machine check error
15m178.log: Description : Correctable machine check error
15m178.log: Description : Correctable ECC
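Note that `-i correctable` also matches the "Uncorrectable" lines, which is why 15m178.log shows up again here. After excluding that machine, which is already known to have UMCE, only 2 machines are left with CECC (correctable ECC). This folder really is just here to make trouble.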
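The manual exclusion above can also be sketched in shell. This is only an illustration under my own assumptions: the function names are hypothetical, and "correctable only" is taken to mean a file that has the whole word "correctable" somewhere and the string "uncorrectable" nowhere. grep's `-w` option does the whole-word part, since the "n" in front of "correctable" inside "Uncorrectable" prevents a word match.

```shell
# files_with_correctable FILE...
# Lists files that contain "correctable" as a whole word.
files_with_correctable() {
    grep -ilw -e correctable -- "$@"
}

# correctable_only FILE...
# Lists files that have correctable errors and no uncorrectable ones.
correctable_only() {
    for f in "$@"; do
        if grep -qiw -e correctable -- "$f" &&
           ! grep -qi -e uncorrectable -- "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

`correctable_only * | wc -l` would then give the CECC count with the UMCE machine already left out.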
IERR: 0
voltage: 3
UMCE: 1
CECC: 2
UECC: 0
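What's more, after carefully comparing the file names, these six errors actually come from just 3 files.
Only 3 files with errors out of more than a thousand logs: a small mercy, I suppose.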
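Comparing file names by eye works for a handful of hits. As a rough sketch (the function name is hypothetical, and the keyword list is just the three keywords used in this post), a single grep with alternation can list every file that matches any of them, each file once:

```shell
# files_with_any_error FILE...
# Lists each file that matches at least one of the keywords.
files_with_any_error() {
    # -E enables the | alternation; "correctable" also covers "uncorrectable".
    grep -ilE -e 'ierr|voltage|correctable' -- "$@"
}
```

Piping the result through `wc -l` gives the number of distinct files with any error.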
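That said, errors on this scale still don't seem reasonable however I look at them. As a last step, let's find out which time ranges each category of error falls in, so that we can later discuss with the customer whether something they did at particular points in time triggered these widespread errors.
Let's use the Voltage data as a demonstration.
Start from the grep command used to catch Voltage, and combine it with grep's before/after context options (-B/-A) to find the timestamp of each related error message.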
root@mynb:/mnt/d/Difficult_Company/Voltage# grep -i voltage -B 3 -A 4 *
203m6.log- Timestamp : 11/11/2019 05:23:11
203m6.log- Generator ID : 0020
203m6.log- EvM Revision : 04
203m6.log: Sensor Type : Voltage
203m6.log- Sensor Number : d7
203m6.log- Event Type : Threshold
203m6.log- Event Direction : Assertion Event
203m6.log- Event Data : 526464
--
203m6.log- Timestamp : 11/11/2019 05:23:13
203m6.log- Generator ID : 0020
203m6.log- EvM Revision : 04
203m6.log: Sensor Type : Voltage
203m6.log- Sensor Number : d7
203m6.log- Event Type : Threshold
203m6.log- Event Direction : Deassertion Event
203m6.log- Event Data : 526564
...
...
Next, write these timestamps to a txt file, then split the text with the Excel Text Import Wizard mentioned earlier.
root@mynb:/mnt/d/Difficult_Company/Voltage# grep -i voltage -B 3 -A 4 * | grep -i timestamp > voltage_time.txt
root@mynb:/mnt/d/Difficult_Company/Voltage# less voltage_time.txt
114m78.log- Timestamp : 10/16/2019 15:41:16
114m78.log- Timestamp : 10/16/2019 15:41:18
114m80.log- Timestamp : 07/29/2019 10:39:27
114m80.log- Timestamp : 07/29/2019 10:39:29
114m80.log- Timestamp : 09/07/2019 13:55:47
114m80.log- Timestamp : 09/07/2019 13:55:50
114m80.log- Timestamp : 10/17/2019 04:17:40
114m80.log- Timestamp : 10/17/2019 04:17:42
114m80.log- Timestamp : 10/21/2019 19:59:44
114m80.log- Timestamp : 10/21/2019 19:59:46
114m89.log- Timestamp : 04/18/2019 02:59:37
114m89.log- Timestamp : 04/18/2019 03:00:32
114m89.log- Timestamp : 04/18/2019 03:00:40
114m89.log- Timestamp : 04/18/2019 03:02:01
114m89.log- Timestamp : 04/18/2019 03:02:11
114m89.log- Timestamp : 04/18/2019 03:02:13
...
...
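The `-B 3` approach depends on the Timestamp line sitting within three lines before the Sensor Type line. As an alternative sketch, assuming only that each record's Timestamp line comes somewhere before its Sensor Type line (which is what the excerpts above show), awk can remember the most recent timestamp and print it when a Voltage line appears. The function name `voltage_timestamps` is my own.

```shell
# voltage_timestamps FILE...
# Prints "FILE DATE TIME" for every "Sensor Type : Voltage" line.
voltage_timestamps() {
    for f in "$@"; do
        awk -v file="$f" '
            # Remember the date and time of the latest Timestamp line.
            /Timestamp/ { ts = $(NF-1) " " $NF }
            # On a Voltage sensor line, emit the remembered timestamp.
            tolower($0) ~ /sensor type.*voltage/ { print file, ts }
        ' "$f"
    done
}
```

Running `voltage_timestamps *.log > voltage_time.txt` from a different output directory would avoid scanning the output file on a second run.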
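When importing the data above, split it by fixed width in Excel. I recommend separating the date from the time in the timestamp; sorting on the date alone is easier.
The data after splitting looks like this:
Then use Excel's filter and sort functions to look at the time span of the problem. To my surprise, it really does span more than a full year.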
16m214.log- Timestamp : 2018/9/29 18:10:40
16m214.log- Timestamp : 2018/9/29 18:10:42
...
216m21.log- Timestamp : 2019/11/14 11:41:58
216m21.log- Timestamp : 2019/11/14 11:42:06
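If Excel isn't at hand, the same sort can be approximated in the shell. This is a sketch under two assumptions: the lines in voltage_time.txt look like the ones shown earlier (file name, then a `MM/DD/YYYY` date and a time as the last two fields), and the function name `sort_by_time` is made up for this example. It rewrites each date as `YYYY-MM-DD` so that a plain text sort is also a chronological sort.

```shell
# sort_by_time FILE
# Prints "YYYY-MM-DD HH:MM:SS LOGFILE" lines, oldest first.
sort_by_time() {
    awk '{
        split($(NF-1), d, "/")     # d[1]=month, d[2]=day, d[3]=year
        file = $1
        sub(/-$/, "", file)        # drop the "-" grep appends on context lines
        printf "%s-%s-%s %s %s\n", d[3], d[1], d[2], $NF, file
    }' "$1" | LC_ALL=C sort
}
```

`sort_by_time voltage_time.txt | sed -n '1p;$p'` then shows the earliest and the latest event, which is enough to read off the time span.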
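It looks like this customer has been suffering for a long time; there's a tough battle ahead of us internally!
Apply the same recipe to the remaining folders, consolidate everything into one Excel file, and you can take that data into the meeting and wrap it up quickly!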
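Repeating the recipe per folder can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `extract_times` is a hypothetical name, the logs are assumed to end in `.log`, and the `-B 3 -A 4` window is simply the one that worked for the Voltage records, so other error types may need a different window.

```shell
# extract_times DIR KEYWORD OUTFILE
# Writes the Timestamp lines surrounding each KEYWORD hit in DIR/*.log to OUTFILE.
extract_times() {
    dir=$1 keyword=$2 out=$3
    # The subshell keeps the cd local; -H forces the file name prefix
    # even when the folder holds a single log.
    ( cd "$dir" &&
      grep -iH -B 3 -A 4 -e "$keyword" -- *.log | grep -i timestamp ) > "$out"
}
```

For the folder used in this post that would be `extract_times /mnt/d/Difficult_Company/Voltage voltage voltage_time.txt`; the other folders and keywords would be substituted accordingly.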
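Otherwise, if the bosses and everyone else only start reading these logs and discussing countermeasures during the meeting, it's hard to imagine how late it would run... right?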