Day 26 - DQN Implementation - MountainCar (2)

12th Ironman Contest

毛毛

Team QQBEE

2020-10-02 01:58:58

Hi everyone, I'm 毛毛.
Today is Day 26.
Today we're going to look at the MountainCar training results~ ヽ(✿ﾟ▽ﾟ)ノ

Main

Gym and Load_weights

   import os

   import gym

   # DQN is the agent class defined elsewhere (not shown in this section).
   env = gym.make("MountainCar-v0")
   dqn_agent = DQN(env=env, gamma=0.9, epsilon=0.95)

   print("=================================================================")
   print("Old weights: \n", dqn_agent.get_weights())
   print("=================================================================")

   filepath = "./dqn.MountainCar_weights.h5"

   # Resume from previously saved weights if the file exists.
   if os.path.isfile(filepath):
       print("Exist\n")
       dqn_agent.load_weights(filepath)
       print("=================================================================")
       print("Revised weights: \n", dqn_agent.get_weights())
       print("=================================================================")
   else:
       print("Not exist\n")

This part is the same as in CartPole:

Create the gym environment and the DQN agent
Load the previously saved weights

Loop

   times = 3000
   max_step = 500
   for time in range(times):
       total_reward = 0
       # Older gym API: reset() returns the observation [position, velocity];
       # reshape it into a batch of one.
       obs = env.reset().reshape(1, 2)

       for step in range(max_step):
           env.render()
           action = dqn_agent.choose_action(obs)
           obs_, reward, terminal, _ = env.step(action)
           total_reward += reward

           obs_ = obs_.reshape(1, 2)
           dqn_agent.store_transition(obs, action, reward, obs_, terminal)

           # Learn from replayed transitions, then let the agent handle
           # target-network replacement.
           dqn_agent.replay_transition()
           dqn_agent.target_replacement()
           obs = obs_

           if terminal:
               env.render()
               break

       # A total of -200 means the episode hit the step limit without reaching the goal.
       if total_reward == -200:
           print("Failed to complete in time {} with {} reward".format(time, total_reward))

       else:
           print('Nice!! ｡:.ﾟヽ(*´∀`)ﾉﾟ.:｡')
           # step is zero-based, so the number of steps taken is step + 1.
           print("Completed in {} times with {} steps and {} reward".format(time, step + 1, total_reward))
           dqn_agent.save_weights("./dqn.MountainCar_weights.h5")
           #break

   env.close()

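I'll skip the parts that repeat. In MountainCar, an episode ends only when the car reaches the goal or after 200 steps have been taken. Every step that does not reach the goal gives a reward of -1, so running out of steps means a total reward of -200.
That is why I only save the weights when the car reaches the goal.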
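The reward bookkeeping above can be checked without gym. The following is only a minimal sketch, not the code used in this post: `summarize_episode` and `STEP_LIMIT` are hypothetical names, and it assumes the -1 reward per step and the 200-step limit described above.

```python
STEP_LIMIT = 200  # assumed per-episode step limit, as described in the text


def summarize_episode(rewards):
    """Return (total_reward, steps, reached_goal) for one episode's rewards."""
    total_reward = sum(rewards)
    steps = len(rewards)
    # Same check as the training loop: a total of -200 is treated as a
    # timeout; anything higher means the episode ended early at the goal.
    reached_goal = total_reward > -STEP_LIMIT
    return total_reward, steps, reached_goal


# An episode cut off by the step limit: 200 steps of -1 each.
print(summarize_episode([-1.0] * 200))  # (-200.0, 200, False)
# An episode that reached the goal after 150 steps.
print(summarize_episode([-1.0] * 150))  # (-150.0, 150, True)
```

One thing to keep in mind with this check: if every step really gives -1, an episode that reaches the goal exactly on the last allowed step would also total -200 and be counted as a failure.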
The one new thing here is env.close(), which closes the gym simulation window.
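If training crashes or is interrupted partway through, a close() call placed at the end of the script is never reached. One possible way to guard against that, shown here only as a sketch, is the standard library's `contextlib.closing`, which calls `close()` on exit even when an exception is raised. `FakeEnv` is a hypothetical stand-in so the example runs without gym; in the real script the object would be the `env` returned by `gym.make`.

```python
from contextlib import closing


class FakeEnv:
    """Hypothetical stand-in for a gym environment (only close() matters here)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


env = FakeEnv()
try:
    with closing(env):
        # Simulate the training loop failing partway through.
        raise RuntimeError("training interrupted")
except RuntimeError:
    pass

print(env.closed)  # True
```

A plain try/finally around the training loop with env.close() in the finally branch would do the same job.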