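Today is day eight, and with the Olympics in full swing I wanted to write an LSTM that predicts how likely my favorite team is to win the championship. Here is the code: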
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
# Assume we have a DataFrame of historical match data
# e.g. year, team, opponent, score, opponent_score, win, ... and so on
data = pd.read_csv('olympic_team_data.csv')
# Preprocess first: convert string columns to numeric codes
data['team'] = pd.Categorical(data['team']).codes
data['opponent'] = pd.Categorical(data['opponent']).codes
# Select the features and the label we care about
features = data[['year', 'team', 'opponent', 'score', 'opponent_score']]
labels = data['win']  # assume 'win' marks whether the team won the title (1 = won, 0 = did not)
# Normalize the data
scaler = MinMaxScaler()
features_scaled = scaler.fit_transform(features)
# Reshape the data into the 3D input an LSTM expects: (samples, timesteps, features)
X = []
y = []
timesteps = 10  # number of LSTM time steps
for i in range(timesteps, len(features_scaled)):
    X.append(features_scaled[i-timesteps:i])
    y.append(labels.iloc[i])
X, y = np.array(X), np.array(y)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Build the LSTM model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=25, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
# Evaluate the model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Model accuracy: {accuracy * 100:.2f}%")
# Predict the championship probability, then threshold at 0.5 to get class labels
predictions = model.predict(X_test)
predictions = (predictions > 0.5).astype(int)
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
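sklearn supplies train_test_split, which splits the data into training and test sets, and tensorflow.keras supplies the Sequential model (layers stacked in order), the LSTM layer, and so on.
data = pd.read_csv('olympic_team_data.csv')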
Here we assume you have a CSV file, olympic_team_data.csv, containing the teams' historical match data.
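Since the CSV file is only assumed, it can help to smoke-test the script on a synthetic table with the same columns. The sketch below is not real Olympic data: the team names, the value ranges, and the rule used to fill `win` are all placeholders chosen for illustration, and `make_synthetic_matches` is a hypothetical helper.

```python
import io

import numpy as np
import pandas as pd


def make_synthetic_matches(n_rows=200, seed=0):
    """Build a random table with the columns the script expects (placeholder values only)."""
    rng = np.random.default_rng(seed)
    teams = np.array(["A", "B", "C", "D"])  # placeholder team names
    team_idx = rng.integers(0, len(teams), size=n_rows)
    # Offset by 1..3 so the opponent index always differs from the team index.
    opp_idx = (team_idx + rng.integers(1, len(teams), size=n_rows)) % len(teams)
    score = rng.integers(0, 6, size=n_rows)
    opponent_score = rng.integers(0, 6, size=n_rows)
    return pd.DataFrame({
        "year": np.sort(rng.choice(np.arange(1996, 2025, 4), size=n_rows)),
        "team": teams[team_idx],
        "opponent": teams[opp_idx],
        "score": score,
        "opponent_score": opponent_score,
        # Placeholder rule: the real meaning of 'win' depends on your dataset.
        "win": (score > opponent_score).astype(int),
    })


synthetic = make_synthetic_matches()

# Round-trip through an in-memory CSV to mimic pd.read_csv on a file.
csv_buffer = io.StringIO()
synthetic.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
data_check = pd.read_csv(csv_buffer)
print(data_check.head())
```

To run the full script against it, write the frame to disk with `synthetic.to_csv('olympic_team_data.csv', index=False)`.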
data['team'] = pd.Categorical(data['team']).codes
data['opponent'] = pd.Categorical(data['opponent']).codes
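The string columns (team and opponent) are converted to numeric codes, because an LSTM cannot process string data directly.
features = data[['year', 'team', 'opponent', 'score', 'opponent_score']]
labels = data['win']  # assume 'win' marks whether the team won the title (1 = won, 0 = did not)
The label is win, which indicates whether the team won the championship.
scaler = MinMaxScaler()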
features_scaled = scaler.fit_transform(features)
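To make the two preprocessing steps above concrete, here is a small sketch on a toy frame. The values are made up; the point is only to show which codes `pd.Categorical` assigns and that `MinMaxScaler` applies `(x - min) / (max - min)` to each column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy frame with made-up values, only to show what the two steps do.
toy = pd.DataFrame({
    "team": ["Japan", "USA", "France", "USA"],
    "score": [2.0, 4.0, 0.0, 1.0],
})

# Categorical codes follow the sorted order of the unique labels:
# France -> 0, Japan -> 1, USA -> 2.
toy["team"] = pd.Categorical(toy["team"]).codes

scaler = MinMaxScaler()
scaled = scaler.fit_transform(toy[["team", "score"]])

# The same result computed by hand: (x - min) / (max - min) per column.
values = toy[["team", "score"]].to_numpy(dtype=float)
manual = (values - values.min(axis=0)) / (values.max(axis=0) - values.min(axis=0))

print(toy["team"].tolist())
print(scaled)
```

Note that integer codes put teams in an arbitrary numeric order. If that turns out to matter for your data, one-hot encoding is a common alternative.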
X = []
y = []
timesteps = 10  # number of LSTM time steps
for i in range(timesteps, len(features_scaled)):
    X.append(features_scaled[i-timesteps:i])
    y.append(labels.iloc[i])
X, y = np.array(X), np.array(y)
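X is the input for the LSTM and y is the corresponding label (whether the team won the title). A loop builds both: each sample in X holds the previous 10 rows of features, and its label in y is the win value of the row that follows. X is therefore a three-dimensional array of shape (samples, timesteps, features), and y is a one-dimensional array of labels.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)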
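The windowing loop and the split can be checked on placeholder data to confirm the shapes. The sketch below also shows one variation that is not in the original code: because consecutive windows overlap by 9 rows, a shuffled split can put near-identical windows in both the training and test sets, so passing `shuffle=False` keeps the split chronological. `make_windows` is a hypothetical helper name and the data is random.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def make_windows(features, labels, timesteps):
    """Stack sliding windows of length `timesteps`; each label is the row right after its window."""
    X = np.stack([features[i - timesteps:i] for i in range(timesteps, len(features))])
    y = np.asarray(labels[timesteps:])
    return X, y


# Placeholder data: 100 rows, 5 features, random binary labels.
rng = np.random.default_rng(0)
features_demo = rng.random((100, 5))
labels_demo = rng.integers(0, 2, size=100)

X_demo, y_demo = make_windows(features_demo, labels_demo, timesteps=10)
print(X_demo.shape, y_demo.shape)  # (90, 10, 5) (90,)

# shuffle=False keeps row order, so the test set is the last 20% of the windows.
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, shuffle=False)
print(X_tr.shape, X_te.shape)  # (72, 10, 5) (18, 10, 5)
```

Whether a chronological split is the right choice depends on how the rows in your CSV are ordered; it assumes they are sorted by time.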
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(units=25, activation='relu'))
model.add(Dense(units=1, activation='sigmoid'))
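units=50 means the layer has 50 memory units. return_sequences=True makes the layer return the full sequence (needed when stacking LSTM layers), while return_sequences=False returns only the output of the last time step. The first Dense layer has 25 neurons with ReLU activation; the final Dense layer has 1 neuron with sigmoid activation, so its output lies between 0 and 1, which suits binary classification.
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])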
model.fit(X_train, y_train, epochs=50, batch_size=32, validation_data=(X_test, y_test))
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Model accuracy: {accuracy * 100:.2f}%")
predictions = model.predict(X_test)
predictions = (predictions > 0.5).astype(int)
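The last layer's sigmoid, the binary cross-entropy loss, and the 0.5 threshold used above can be reproduced in a few lines of NumPy. This is only an illustration with made-up numbers, not the Keras internals; the `eps` clipping is an assumption added for numerical safety.

```python
import numpy as np


def sigmoid(z):
    """Map any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))


logits = np.array([-2.0, 0.0, 3.0])  # made-up pre-activation values
y_true = np.array([0, 1, 1])

probs = sigmoid(logits)              # what model.predict would return
classes = (probs > 0.5).astype(int)  # same threshold as in the script

print(probs, classes, binary_crossentropy(y_true, probs))
```

If the goal is a championship probability rather than a yes/no label, keep the raw output of `model.predict` and skip the thresholding step.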
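This LSTM model reads past data and predicts the probability that an Olympic team wins the championship. Of course, its performance depends on the quality of the training data and the choice of features, and the model itself can be tuned further, for example by changing the number of LSTM layers or units.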
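As a rough guide to what changing the number of layers or units costs, the trainable parameter count of the network above can be worked out by hand. The sketch assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights, and a bias) and the 5 input features used in this post; the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights, and a bias vector."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """One weight per input-output pair plus one bias per output."""
    return input_dim * units + units


n_features = 5  # year, team, opponent, score, opponent_score

# Dropout layers have no trainable parameters, so they are not counted.
total = (
    lstm_params(n_features, 50)
    + lstm_params(50, 50)
    + dense_params(50, 25)
    + dense_params(25, 1)
)
print(total)
```

Doubling `units` roughly quadruples the recurrent weight count, which is worth keeping in mind when the training set is small.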