Day 18: Dino train (Part 1)

11th iThome Ironman Contest

DAY 18

AI & Data

Artificial Intelligence (RL Series): Crushing Games in 30 Days, part 18 of the series

皮卡喵

2019-10-03 22:07:39

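All the earlier preparation was for today. What follows may feel a bit long, but the effort pays off in the end: once you understand the whole training process, you can say you have a grasp of the most basic RL. Let's get started!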

Training parameters

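ACTIONS = 2 # actions: do nothing and jump
GAMMA = 0.99 # discount factor applied to future rewards
OBSERVATION = 100. # number of samples collected before training; during this phase the agent only interacts with the environment and does not train
EXPLORE = 100000 # used to reduce the exploration rate step by step
FINAL_EPSILON = 0.0001 # final exploration rate
INITIAL_EPSILON = 0.1 # initial exploration rate
REPLAY_MEMORY = 50000 # number of samples kept in the replay memory
BATCH = 16 # batch size
FRAME_PER_ACTION = 1 # number of frames between two actions
LEARNING_RATE = 1e-4 # learning rate
img_rows, img_cols = 80, 80 # image size after preprocessing
img_channels = 4 # number of frames stacked together as the input for one action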
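
To make the roles of OBSERVATION, EXPLORE, INITIAL_EPSILON and FINAL_EPSILON more concrete, here is a minimal sketch of a linear annealing schedule. The function name `anneal_epsilon` and the linear form itself are assumptions for illustration (a common DQN pattern), not necessarily the exact schedule used later in this series.

```python
# Hypothetical sketch: linear epsilon annealing driven by the parameters above.
OBSERVATION = 100        # steps of pure data collection before training
EXPLORE = 100000         # steps over which epsilon is annealed
FINAL_EPSILON = 0.0001   # final exploration rate
INITIAL_EPSILON = 0.1    # initial exploration rate


def anneal_epsilon(epsilon, t):
    """Return the exploration rate after step t (assumed linear schedule)."""
    # Only start lowering epsilon once the observation phase is over.
    if t > OBSERVATION and epsilon > FINAL_EPSILON:
        epsilon -= (INITIAL_EPSILON - FINAL_EPSILON) / EXPLORE
    # Never drop below the final exploration rate.
    return max(epsilon, FINAL_EPSILON)


epsilon = INITIAL_EPSILON
for t in range(OBSERVATION + EXPLORE + 10):
    epsilon = anneal_epsilon(epsilon, t)
```

With these values, epsilon stays at 0.1 during the first 100 steps and then shrinks by a tiny constant each step until it settles at FINAL_EPSILON.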

Data backup

Some data that we need to log, so it can be visualized later.

# Each log is reloaded from its own CSV file if it exists; otherwise an empty DataFrame is created.
loss_df = pd.read_csv(loss_file_path) if os.path.isfile(loss_file_path) else pd.DataFrame(columns=['loss'])
scores_df = pd.read_csv(scores_file_path) if os.path.isfile(scores_file_path) else pd.DataFrame(columns=['scores'])
actions_df = pd.read_csv(actions_file_path) if os.path.isfile(actions_file_path) else pd.DataFrame(columns=['actions'])
q_values_df = pd.read_csv(q_value_file_path) if os.path.isfile(q_value_file_path) else pd.DataFrame(columns=['qvalues'])
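
Since the same "load if the file exists, otherwise start empty" pattern is repeated four times, it can be wrapped in a small helper, which also makes it harder to mix up the paths. The sketch below is only an illustration: `load_or_create` is a hypothetical name, and the temporary file path stands in for the real log paths.

```python
import os
import tempfile

import pandas as pd


def load_or_create(path, column):
    """Load a CSV log if it exists, otherwise return an empty one-column DataFrame."""
    if os.path.isfile(path):
        return pd.read_csv(path)
    return pd.DataFrame(columns=[column])


# Demonstration in a temporary directory (the path is illustrative only).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "loss_df.csv")
    fresh = load_or_create(demo_path, "loss")        # file missing: empty frame
    pd.DataFrame({"loss": [0.5, 0.25]}).to_csv(demo_path, index=False)
    restored = load_or_create(demo_path, "loss")     # file present: loaded from disk
```

With such a helper, each log would become a single call, for example `loss_df = load_or_create(loss_file_path, 'loss')`.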

Variable declarations

last_time = time.time() # record the time
D = load_obj("D") # load the data saved locally
do_nothing = np.zeros(ACTIONS) # the first action does nothing
do_nothing[0] = 1 # 0 => do nothing, 1 => jump
x_t, r_0, terminal = game_state.get_state(do_nothing) # feed in the first action
s_t = np.stack((x_t, x_t, x_t, x_t), axis=2) # the first input has no earlier frames, so we simply stack the current frame four times
s_t = s_t.reshape(1, s_t.shape[0], s_t.shape[1], s_t.shape[2]) # 1*80*80*4
initial_state = s_t # declare the initial state
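
The shape bookkeeping above can be checked without the game itself. The following sketch uses a blank NumPy array as a stand-in for the preprocessed frame `x_t` (an assumption, since the real frame comes from `game_state.get_state`), and also shows one common way to slide a newer frame into the stack; that update step is an illustration, not code taken from this post.

```python
import numpy as np

img_rows, img_cols, img_channels = 80, 80, 4

# Stand-in for the preprocessed first frame returned by the game (assumption).
x_t = np.zeros((img_rows, img_cols), dtype=np.float32)

# Repeat the first frame four times along a new last axis: 80*80*4.
s_t = np.stack((x_t,) * img_channels, axis=2)
# Add a leading batch dimension: 1*80*80*4.
s_t = s_t.reshape(1, *s_t.shape)

# A later frame, filled with ones here so it is easy to spot in the stack.
x_t1 = np.ones((1, img_rows, img_cols, 1), dtype=np.float32)
# Put the newest frame first and keep the three frames that were at the front.
s_t1 = np.append(x_t1, s_t[:, :, :, :img_channels - 1], axis=3)

print(s_t.shape, s_t1.shape)
```

Both states keep the same 1*80*80*4 shape, so the network input size never changes as new frames arrive.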