Day 18: Detection and Log Parsing in Practice

2025 iThome Ironman Challenge

Self-Challenge category

"Learning Cybersecurity with Generative AI and Other Tools" series, part 18

17th Ironman Challenge

Chinnn

2025-10-14 15:35:43


Today's goals (3 items)

Understand common log sources and formats (web access log, syslog, auth log), and know which fields to capture as IOCs (IP, timestamp, request, status, user-agent).

Implement a simple log parser in Python that counts the top IPs, finds 4xx/5xx responses, and detects suspicious user-agents or bursts of failed logins.

Produce 2–3 draft detection rules that can go straight into a SIEM/EDR, along with test output.
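The access-log script later in this post does not touch auth logs, so here is a minimal sketch of pulling the same kind of IOC fields out of failed SSH logins. It assumes OpenSSH-style `Failed password` messages in a syslog-format auth log; the `FAILED_RE` pattern, the `count_failed_logins` helper, and the sample lines are illustrative and not part of the original script.

```python
import re
from collections import Counter

# Assumed format: OpenSSH "Failed password" messages in a syslog-style auth log.
FAILED_RE = re.compile(
    r'Failed password for (?:invalid user )?(?P<user>\S+) '
    r'from (?P<ip>\S+) port (?P<port>\d+)'
)

def count_failed_logins(lines):
    """Return a Counter mapping source IP -> number of failed SSH logins."""
    failures = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failures[m.group('ip')] += 1
    return failures

# Made-up sample lines (documentation IP ranges), for illustration only.
SAMPLE = [
    "Oct 14 15:30:01 lab sshd[1001]: Failed password for root from 203.0.113.5 port 40122 ssh2",
    "Oct 14 15:30:03 lab sshd[1001]: Failed password for invalid user admin from 203.0.113.5 port 40124 ssh2",
    "Oct 14 15:30:09 lab sshd[1002]: Accepted password for alice from 198.51.100.7 port 51000 ssh2",
]

if __name__ == "__main__":
    for ip, n in count_failed_logins(SAMPLE).most_common():
        print(ip, n)
```

Counting per IP is the simplest aggregation; a time window, as in the 404 check later in the post, could be layered on top by also parsing the timestamp.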

Today's implementation steps (follow along)

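Prepare a sample log (for example, an Apache access log in combined format, which the regex below expects because it also captures referer and user-agent), or use your own access.log (lab environments only).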

Run the Python script below (with access.log in the same directory) to produce a simple analysis report.

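Review the output: top IPs, top URLs, the list of 4xx/5xx responses, and signs of scanning or attack (many 404s in a short time, repeated errors from the same IP, and so on).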

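Based on the results, draft 2–3 detection rules (for example: more than 100 404s from the same IP within 5 minutes → likely scanning) and write them into the report.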
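If no lab log is at hand for the first step, a synthetic one can be generated. The sketch below is an assumption-laden helper, not part of the original workflow: `make_sample_lines`, the IP addresses, the paths, the `+0800` offset, and the user-agent strings are all made up. It emits combined-format lines with one normal request plus a burst of 25 404s from a single address, which is enough to trigger the 404 check in the parser. Note that `%b` in `strftime` depends on the locale; the default C locale yields English month names such as `Oct`.

```python
from datetime import datetime, timedelta

def make_sample_lines(start=datetime(2025, 10, 14, 15, 0, 0)):
    """Build synthetic combined-format lines: normal traffic plus a 404 burst."""
    fmt = '{ip} - - [{ts} +0800] "{req}" {status} {size} "-" "{ua}"'
    lines = []
    # One ordinary browser request.
    lines.append(fmt.format(
        ip="198.51.100.7", ts=start.strftime("%d/%b/%Y:%H:%M:%S"),
        req="GET /index.html HTTP/1.1", status=200, size=1024,
        ua="Mozilla/5.0"))
    # 25 requests for missing paths, 2 seconds apart, from one "scanner" IP.
    for k in range(25):
        ts = start + timedelta(seconds=2 * k)
        lines.append(fmt.format(
            ip="203.0.113.5", ts=ts.strftime("%d/%b/%Y:%H:%M:%S"),
            req=f"GET /admin{k}.php HTTP/1.1", status=404, size=209,
            ua="sqlmap/1.7"))
    return lines

if __name__ == "__main__":
    # Print to stdout; redirect to access.log in a lab directory to use it.
    print("\n".join(make_sample_lines()))
```

Saved under a hypothetical name such as make_sample.py, it could be run as `python make_sample.py > access.log` before starting the parser.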

Python example: a simple Apache log parser (runnable as-is)

# log_parser.py
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

LOG = "access.log"
pattern = re.compile(r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<req>[^"]+)" (?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"')

def parse_time(ts):
    return datetime.strptime(ts.split()[0], "%d/%b/%Y:%H:%M:%S")

ips = Counter()
urls = Counter()
statuses = Counter()
ua_counter = Counter()
ip_events = defaultdict(list)

with open(LOG, encoding='utf-8', errors='ignore') as f:
    for line in f:
        m = pattern.search(line)
        if not m:
            continue
        ip = m.group('ip')
        time = parse_time(m.group('time'))
        req = m.group('req')
        status = m.group('status')
        ua = m.group('ua')

        ips[ip] += 1
        try:
            method, path, proto = req.split()
            urls[path] += 1
        except ValueError:
            # malformed request line (not exactly 3 parts): keep the raw string
            path = req
        statuses[status] += 1
        ua_counter[ua] += 1
        ip_events[ip].append((time, status, path))

# quick report
print("Top 10 IPs:")
for ip, c in ips.most_common(10):
    print(ip, c)

print("\nTop 10 URLs:")
for u, c in urls.most_common(10):
    print(u, c)

print("\nTop statuses:")
for s, c in statuses.most_common():
    print(s, c)

# simple anomaly: many 404s in short window
print("\nSuspicious IPs (>=20 x 404 within 5 minutes):")
for ip, events in ip_events.items():
    # get only 404 events times
    times = sorted([t for (t, st, p) in events if st == '404'])
    if len(times) < 20:
        continue
    # sliding window check
    i = 0
    for j in range(len(times)):
        while times[j] - times[i] > timedelta(minutes=5):
            i += 1
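        if j - i + 1 >= 20:
            # at least 20 404s fall inside one 5-minute window
            print(ip, j - i + 1, "x 404 between", times[i], "and", times[j])
            break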

Three example detection rules (drafts) that can go straight into a SIEM/EDR

404 flood scan: the same IP generates ≥20 HTTP 404s within 5 minutes → flag as scanning/crawler behavior.

Burst of failed logins: more than 10 401/403 responses from the same source within 10 minutes → suspected brute-force attempt.

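Suspicious User-Agent: detect the UAs of common scanning tools (such as sqlmap or nikto), an empty UA, or other obviously suspicious strings → raise as a high-severity alert.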
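The three draft rules can be prototyped in Python before translating them into any SIEM query language. This is a rough sketch under stated assumptions: `burst_detected`, `is_suspicious_ua`, `evaluate`, the rule names, and the `SUSPICIOUS_UA` list are hypothetical, and events are assumed to arrive as `(ip, time, status, ua)` tuples with `status` as a string, as in the parser.

```python
from datetime import datetime, timedelta

# Illustrative substrings only; a real deployment would maintain its own list.
SUSPICIOUS_UA = ("sqlmap", "nikto")

def burst_detected(times, threshold, window):
    """True if at least `threshold` timestamps fall within some span of `window`."""
    times = sorted(times)
    i = 0
    for j in range(len(times)):
        # Shrink the window from the left until it spans at most `window`.
        while times[j] - times[i] > window:
            i += 1
        if j - i + 1 >= threshold:
            return True
    return False

def is_suspicious_ua(ua):
    """Flag empty or placeholder user-agents and known scanner names."""
    ua = ua.strip().lower()
    return ua in ("", "-") or any(s in ua for s in SUSPICIOUS_UA)

def evaluate(events):
    """events: iterable of (ip, time, status, ua). Returns {ip: set of rule names}."""
    by_ip = {}
    for ip, t, status, ua in events:
        rec = by_ip.setdefault(ip, {"404": [], "auth": [], "ua": False})
        if status == "404":
            rec["404"].append(t)
        if status in ("401", "403"):
            rec["auth"].append(t)
        if is_suspicious_ua(ua):
            rec["ua"] = True
    alerts = {}
    for ip, rec in by_ip.items():
        hits = set()
        if burst_detected(rec["404"], 20, timedelta(minutes=5)):
            hits.add("404_scan")
        # "more than 10" failures means at least 11
        if burst_detected(rec["auth"], 11, timedelta(minutes=10)):
            hits.add("auth_bruteforce")
        if rec["ua"]:
            hits.add("suspicious_ua")
        if hits:
            alerts[ip] = hits
    return alerts
```

Keeping the window check in one function means all time-based rules share the same logic, so the thresholds stay the only thing to tune per rule.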

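Day 18 is done. Today focused on log parsing and detection: I wrote a simple Apache log parser in Python that quickly produces the top IPs, top URLs, and a list of suspicious IPs with many 404s in a short window. Working from real logs, I drafted three actionable SIEM rules (404 flood, brute force, suspicious User-Agent), and I plan to switch the parser's output to JSON for SIEM import. The next step is to have AI help convert the rules into Splunk/Elastic query syntax and optimize the parser's performance.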
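As a starting point for the JSON output, each parsed line could be emitted as one JSON object per line. The sketch below is one possible shape, not a required SIEM schema: the field names are simply the regex group names from the parser, and `line_to_json` is a hypothetical helper. The pattern is repeated so the block runs on its own.

```python
import json
import re

# Same combined-log pattern as the parser, repeated so this sketch is self-contained.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<req>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<ua>[^"]*)"'
)

def line_to_json(line):
    """Return a JSON object string for one log line, or None if it does not match."""
    m = PATTERN.search(line)
    if not m:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # "-" means no body was sent; map anything non-numeric to 0 so the field stays an integer.
    rec["size"] = int(rec["size"]) if rec["size"].isdigit() else 0
    return json.dumps(rec, ensure_ascii=False)

if __name__ == "__main__":
    # Made-up sample line, for illustration only.
    sample = ('203.0.113.5 - - [14/Oct/2025:15:00:00 +0800] '
              '"GET /x HTTP/1.1" 404 - "-" "sqlmap/1.7"')
    print(line_to_json(sample))
```

One object per line (JSON Lines) is a common choice for log shippers because each record can be ingested independently.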