
2024 iThome Ironman Contest (鐵人賽), DAY 21

Spatiotemporal Sequence Analysis: Key Chip Analysis series, post 21

Converting the image into spatial features and retraining the ConvLSTM model on the spatiotemporal data!

Yesterday I realized that the model input should combine the 2D map (spatial data) with the broker net buy/sell "chip" data (time-series data).

Also, the data had never been split into training and test sets.

Concrete steps:

  1. Expand the time-series data to the same shape as the image.
  2. Reshape the image data and tile it across the time steps.
  3. Concatenate the time-series data with the image data (see the toy-shape sketch after the imports below).
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
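
Before loading the real image, here is a minimal toy-shape sketch of steps 1 to 3 above. It only tracks array shapes with random placeholder data (X_toy and img_toy are made-up names, not part of the actual pipeline); the real arrays come from the Excel file and broker_locations.png below.

import numpy as np

samples, time_steps, features = 8, 3, 4   # toy sequence data (the real run has 808 samples, 4 chip features)
height, width = 100, 100                  # toy image size, matching the resized map

X_toy = np.random.rand(samples, time_steps, features)
img_toy = np.random.rand(height, width)

# Step 1: broadcast each feature vector across every pixel position
X_exp = np.repeat(X_toy[:, :, np.newaxis, np.newaxis, :], height, axis=2)
X_exp = np.repeat(X_exp, width, axis=3)   # -> (8, 3, 100, 100, 4)

# Step 2: tile the single image across every sample and time step
img_exp = np.tile(img_toy[np.newaxis, np.newaxis, :, :, np.newaxis],
                  (samples, time_steps, 1, 1, 1))  # -> (8, 3, 100, 100, 1)

# Step 3: concatenate along the channel axis
combined = np.concatenate((X_exp, img_exp), axis=-1)
print(combined.shape)  # (8, 3, 100, 100, 5)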

Take out that politically correct scatter map of Taiwan broker branch locations (I named it broker_locations.png),
read it in, and resize it to something the model can accept.

# Read the saved image and convert it to a data matrix
image = Image.open('broker_locations.png')
image = image.convert('L')  # convert to grayscale
image = image.resize((100, 100), Image.Resampling.LANCZOS)  # shrink the image

data_matrix = np.array(image).astype(np.float32)
channels = 1

print("Reduced data matrix shape:", data_matrix.shape)

Reduced data matrix shape: (100, 100)

Read in the chip data the same way as before.

# Read in the data table (pandas is already imported above)
dataOri = pd.read_excel('Merged_to_train_ALL_trans_New.xlsx')
df = pd.DataFrame(dataOri)

# Convert columns to numeric; errors='coerce' turns non-convertible values into NaN
for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Use the broker code (券商代碼) as the index
df.set_index('券商代碼', inplace=True)

In between I went through some trial and error to check shapes; those lines all ended up as comments.

# Build the time-series sequences
def create_sequences(data, time_steps=3):
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i+time_steps, :])  # use all columns (the slice used to be :-1)
        y.append(data[i+time_steps, -1])   # use the last column as the target
    return np.array(X), np.array(y)

data = df.values

# Build the input and output data
X, y = create_sequences(data, time_steps=3)


# Print shapes of X and y
print("X shape before reshaping:", X.shape)
print("y shape:", y.shape)

# Step 2: prepare the model input
# Here there are 808 samples with a time step of 3
samples, time_steps, features = X.shape
width = data_matrix.shape[1]
height = data_matrix.shape[0]
#channels = 1

X shape before reshaping: (808, 3, 4)
y shape: (808,)

Sharp-eyed readers will notice that yesterday this was (808, 3, 3): the sequences now keep all four columns instead of dropping the last one...

Split the dataset into a training set and a test set, 80/20:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1024)

Convert the image data into the shape the model needs (very different from yesterday).

(A pile of sacrificed, dead comments below, like comrades shot through on the battlefield; keep them around and they might still stop a bullet.)

# Build the image data
width = data_matrix.shape[1]
height = data_matrix.shape[0]

# Expand the time-series data to the same shape as the image data
def expand_data(X, width, height):
    X_expanded = np.repeat(X[:, :, np.newaxis, np.newaxis, :], width, axis=2)
    X_expanded = np.repeat(X_expanded, height, axis=3)
    return X_expanded
#X_expanded = np.repeat(X[:, :, np.newaxis, np.newaxis, :], width, axis=2)
#X_expanded = np.repeat(X_expanded, height, axis=3)

X_train_expanded = expand_data(X_train, width, height)
X_test_expanded = expand_data(X_test, width, height)

# Tile the single image across every sample and time step
image_data_train = np.tile(data_matrix[np.newaxis, np.newaxis, :, :, np.newaxis], (X_train.shape[0], X_train.shape[1], 1, 1, 1))
image_data_test = np.tile(data_matrix[np.newaxis, np.newaxis, :, :, np.newaxis], (X_test.shape[0], X_test.shape[1], 1, 1, 1))

#image_data = np.tile(data_matrix[np.newaxis, np.newaxis, :, :, np.newaxis], (samples, time_steps, 1, 1, 1))

# Concatenate the time-series data with the image data along the channel axis
combined_data_train = np.concatenate((X_train_expanded, image_data_train), axis=-1)
combined_data_test = np.concatenate((X_test_expanded, image_data_test), axis=-1)
#combined_data = np.concatenate((X_expanded, image_data), axis=-1)

# Check the shapes after concatenation
print("Combined train data shape:", combined_data_train.shape)
print("Combined test data shape:", combined_data_test.shape)
#print("Combined data shape:", combined_data.shape)    # (808, 3, 100, 100, 5)
#print("X_expanded shape:", X_expanded.shape)          # (808, 3, 100, 100, 4)
#print("image_data shape:", image_data.shape)          # (808, 3, 100, 100, 1)

#X = X.reshape((samples, time_steps, features, width, height, channels))

Combined train data shape: (646, 3, 100, 100, 5)
Combined test data shape: (162, 3, 100, 100, 5)

Data shaping done. The last dimension is 5 because the four chip features are stacked with the single grayscale image channel. Now the model has to be built.

model = Sequential([
    ConvLSTM2D(filters=64, kernel_size=(1, 1), input_shape=combined_data_train.shape[1:], return_sequences=True),
    ConvLSTM2D(filters=32, kernel_size=(1, 1), return_sequences=False),
    Flatten(),
    Dense(50, activation='relu'),
    Dense(1)
])


model.compile(optimizer=Adam(learning_rate=0.001), loss='mse')
model.summary()

# Model built, now fit it
history = model.fit(combined_data_train, y_train, epochs=50, batch_size=32, validation_split=0.1)
#history = model.fit(combined_data, y, epochs=50, batch_size=32, validation_split=0.1)

The model summary reveals that someone has already built five different models by this point ><

Model: "sequential_5"
┃ Layer (type)                ┃ Output Shape            ┃    Param # ┃
│ conv_lstm2d_10 (ConvLSTM2D) │ (None, 3, 100, 100, 64) │     17,920 │
│ conv_lstm2d_11 (ConvLSTM2D) │ (None, 100, 100, 32)    │     12,416 │
│ flatten_5 (Flatten)         │ (None, 320000)          │          0 │
│ dense_8 (Dense)             │ (None, 50)              │ 16,000,050 │
│ dense_9 (Dense)             │ (None, 1)               │         51 │

Total params: 16,030,437 (61.15 MB)
Trainable params: 16,030,437 (61.15 MB)
Non-trainable params: 0 (0.00 B)

Epoch 1/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 59s 2s/step - loss: nan - val_loss: nan
Epoch 2/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 47s 2s/step - loss: nan - val_loss: nan
. . .
Epoch 50/50
23/23 ━━━━━━━━━━━━━━━━━━━━ 41s 2s/step - loss: nan - val_loss: nan
26/26 ━━━━━━━━━━━━━━━━━━━━ 19s 690ms/step

Great, great. And what on earth is that parameter count, 16 million????
No wonder training took so long...
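
The parameter count is easy to sanity-check by hand. Almost all of it sits in the Dense(50) layer right after Flatten, which sees 100 x 100 x 32 = 320,000 inputs; the rough check below uses the standard ConvLSTM2D parameter formula 4 * filters * (k^2 * (in_channels + filters) + 1).

# Rough parameter check against the summary above
convlstm_params = lambda in_ch, f, k: 4 * f * (k * k * (in_ch + f) + 1)
print(convlstm_params(5, 64, 1))    # 17,920  (first ConvLSTM2D, 5 input channels)
print(convlstm_params(64, 32, 1))   # 12,416  (second ConvLSTM2D)
flattened = 100 * 100 * 32          # Flatten output: 320,000 values
print(flattened * 50 + 50)          # 16,000,050 -> the Dense(50) layer dominates the total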

Just print the chart and call it a day.

predictions = model.predict(combined_data_test)

plt.plot(predictions, label='Predicted')
plt.plot(y_test, label='Actual')
plt.legend()
plt.show()

https://ithelp.ithome.com.tw/upload/images/20240821/20168322wNNPwhWCSf.png
And done for indeed: this chart looks like an ECG, and any second now my heart is going to flatline. (not really

Hmm... if you think the model is worth keeping, the code below saves this hard-won model.

Saving the model here uses joblib.

### Save the model

import joblib
# Save the model to the given path
joblib.dump(model, 'trainModel_0821.joblib')
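
As a side note, Keras also has its own save format; a minimal sketch of that alternative is below (the .keras filename is just an example, and depending on the TensorFlow version an .h5 path works too).

# Alternative: save in Keras' native format instead of joblib
model.save('trainModel_0821.keras')

# and reload it later with
from tensorflow.keras.models import load_model
model = load_model('trainModel_0821.keras')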

Three days of wrestling with the model, and three days of completely different charts @@
Am I some kind of grand wizard (? Throw the data into the big cauldron and whatever comes out is down to fate (kidding
Let me think carefully about where this went wrong QQ
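
One hedged guess at where to start looking (a diagnostic sketch, not necessarily the eventual fix): the loss was nan from the very first epoch, errors='coerce' earlier can quietly leave NaN in the chip data, and the raw pixel values (0 to 255) sit on a very different scale from the buy/sell amounts. A quick check could look like this:

# Quick sanity checks before the next training run
print("NaN in chip data:", np.isnan(data).sum())                    # NaN left over from errors='coerce'
print("NaN in combined train:", np.isnan(combined_data_train).sum())
print("NaN in y_train:", np.isnan(y_train).sum())

# Compare the value ranges of the two channel groups
print("image range:", data_matrix.min(), data_matrix.max())
print("chip range:", np.nanmin(data), np.nanmax(data))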

References & sources:

  1. Time Series data to fit for ConvLSTM
  2. convLSTM : how to structure input data
  3. tf.keras.layers.ConvLSTM2D
  4. tf.keras.layers.ConvLSTM3D

Daily log:
The TAIEX closed at 22,237.89, down 191.21 points. After rising for almost two straight weeks, it finally closed lower.
As for the US, the nonfarm payrolls benchmark revision looks risky; the headcount could be revised down by as much as a million (that is a lot).
"US nonfarm payrolls may be revised down by as much as a million; the Fed faces a labor-market dilemma"


These last few days haven't been as fun as the earlier ones. The modeling part in particular is long and tedious, with endless trial and error.
But I won't give up! Just like a cockroach that refuses to die... (which sounds a bit gross (?
This model is no Kongming, yet here I am paying it a third visit...


Previous post
Re-transposing the data columns and retraining the ConvLSTM model on the spatiotemporal data!
Next post
The spring brought by a fourth visit to the model: the spatiotemporal ConvLSTM model finally trains successfully!
Series
Spatiotemporal Sequence Analysis: Key Chip Analysis (31 posts)