Python for malware static analysis (1)

2019 iT 邦幫忙 Ironman Contest

Day 29

Self-Challenge Group

A First Look at Natural Language Technology and AI/ML, series part 29

smichelle19

Team: InfoSec Horadrim

2018-11-13 22:15:25

I came across a slide deck shared online; below is a summary of the parts I found interesting.

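With fuzzy hashing (ssdeep), you can compute a similarity score between two files.

Limitation: if two malware samples are the same, but junk data has been stuffed into a static section of one sample's executable, the ssdeep comparison returns 0 and treats them as different files.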
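
A minimal sketch of the comparison described above, assuming the third-party `ssdeep` Python bindings are installed. The helper name `ssdeep_similarity` is hypothetical; `ssdeep.hash_from_file` and `ssdeep.compare` come from those bindings, and `compare` returns a match score from 0 to 100.

```python
def ssdeep_similarity(path_a, path_b):
    """Return the ssdeep match score (0-100) between two files."""
    # Imported lazily so this module still loads where the bindings are missing.
    import ssdeep  # third-party python-ssdeep bindings (assumed installed)

    sig_a = ssdeep.hash_from_file(path_a)
    sig_b = ssdeep.hash_from_file(path_b)
    return ssdeep.compare(sig_a, sig_b)
```

For example, `ssdeep_similarity("sample_a.exe", "sample_b.exe")` (made-up file names) would give the score for a pair of samples; a result of 0 is the "different files" case mentioned in the limitation.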

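Python functions:
Imphash = md5(import table of the PE): reports only whether two samples are identical or not
ImpFuzzy = ssdeep(import table of the PE): reports a similarity percentage
Limitation: if two malware samples have the same symbols but use them in a different order, the first method reports them as different, and the second reports a similarity of 40%.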
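
The order sensitivity can be reproduced with a simplified stand-in for imphash. This is only a sketch under stated assumptions: it takes the MD5 of a comma-joined list of lowercase `library.function` names, which is roughly what `get_imphash()` in `pefile` does, and the import lists are made up. `difflib.SequenceMatcher` stands in for ssdeep purely to show a graded score, so it will not reproduce the 40% figure.

```python
import hashlib
from difflib import SequenceMatcher


def simple_imphash(imports):
    """MD5 over the ordered, comma-joined import names (simplified imphash)."""
    joined = ",".join(name.lower() for name in imports)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def import_similarity(imports_a, imports_b):
    """Graded similarity in percent; a stand-in for ImpFuzzy, not ssdeep."""
    a = ",".join(name.lower() for name in imports_a)
    b = ",".join(name.lower() for name in imports_b)
    return round(SequenceMatcher(None, a, b).ratio() * 100)


# Made-up import tables: same symbols, different order.
sample_a = ["kernel32.CreateFileA", "kernel32.WriteFile", "ws2_32.connect"]
sample_b = ["ws2_32.connect", "kernel32.CreateFileA", "kernel32.WriteFile"]

# The exact hash flips from "same" to "different" on reordering alone.
print(simple_imphash(sample_a) == simple_imphash(sample_b))  # False
# The graded score drops below 100 but stays above 0.
print(import_similarity(sample_a, sample_b))
```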

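Graph flow comparison
Polichombr and Machoc: compute a fuzzy hash from the instructions of the control flow graph.
r2graphity draws static call graphs.
[Figure: static call graph drawn by r2graphity]

Challenge: high computational cost. Comparing every pair of signatures has O(n^2) complexity (n = number of signatures).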
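
To see where the quadratic cost comes from: comparing all signatures against each other needs a score for every pair, which is n(n-1)/2 comparisons. The sketch below counts them with `itertools.combinations`. The names `pairwise_scores` and `shared_chunks` are hypothetical, and the signatures are placeholder strings, not real Machoc hashes.

```python
from itertools import combinations


def pairwise_scores(signatures, compare):
    """Score every unordered pair of signatures: n * (n - 1) / 2 calls."""
    return {
        (i, j): compare(signatures[i], signatures[j])
        for i, j in combinations(range(len(signatures)), 2)
    }


def shared_chunks(sig_a, sig_b):
    """Toy comparison: fraction of ':'-separated chunks the two share."""
    a, b = set(sig_a.split(":")), set(sig_b.split(":"))
    return len(a & b) / len(a | b)


# Placeholder signatures, not real hashes.
sigs = ["aa:bb:cc", "aa:bb:dd", "ee:ff:gg", "aa:cc:dd"]
scores = pairwise_scores(sigs, shared_chunks)
print(len(scores))  # 6 pairs for n = 4
```

With 10,000 signatures the same loop would already make about 50 million comparisons.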

Suggested clustering algorithms: DBSCAN, K-means
Features: [size of file, number of sections, median of entropy, number of imports, number of exports]
Normalized features: [size of file / max(size of all files), number of sections / max(number of sections of all files), median of entropy / max(median of entropy of all files), number of imports / max(number of imports of all files), number of exports / max(number of exports of all files)]
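
The normalization above divides each feature by its maximum over all files, so every value ends up between 0 and 1. A minimal sketch in plain Python follows; the function name `normalize_by_max` is hypothetical and the feature rows are made-up numbers.

```python
def normalize_by_max(rows):
    """Divide each column by its maximum over all files."""
    col_max = [max(col) for col in zip(*rows)]
    return [
        # Guard against a column whose maximum is 0 (e.g. no file has exports).
        [value / m if m else 0.0 for value, m in zip(row, col_max)]
        for row in rows
    ]


# Made-up feature rows:
# [size of file, number of sections, median of entropy, imports, exports]
features = [
    [120_000, 4, 6.1, 80, 0],
    [480_000, 8, 7.6, 20, 2],
    [240_000, 4, 3.8, 40, 1],
]

for row in normalize_by_max(features):
    print(row)
```

The normalized rows could then be handed to a clustering implementation, for example `DBSCAN` or `KMeans` from scikit-learn; parameters such as `eps` or the number of clusters would need tuning for the sample set.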

References
https://2018.pass-the-salt.org/files/talks/14-python-and-ml.pdf