我在嘗試用程式解井字遊戲的盤面,
首先,我寫了一個基本架構:
# 寫井字遊戲的基本邏輯,棋子共'X','O'兩種
class TicTacToe():
def __init__(self):
self.board = [[' ']*3 for i in range(3)]
# 初始化棋盤
def iniBoard(self):
for i in range(3):
for j in range(3):
self.board[i][j]=' '
# 視覺化的把井字棋棋盤畫出來
def drawBoard(self) -> None:
HLINE = ' ' * 3 + '+---' * 3 + '+'
VLINE = (' ' * 3 +'|') * (3 +1)
title = ' 0'
for i in range(1,3):
title += ' ' * 3 +str(i)
print(title)
print(HLINE)
for y in range(3):
print(VLINE)
print(y, end=' ')
for x in range(3):
print(f'| {self.board[x][y]}', end=' ')
print('|')
print(VLINE)
print(HLINE)
def isOnBoard(self, x, y):
return 0 <= x < 3 and 0 <= y < 3
#檢查tile放在某個座標是否為合法棋步
def isValidMove(self, tile, x, y):
return self.isOnBoard(x, y) and self.board[x][y]==' '
# 把棋子下在座標x, y的地方
def makeMove(self, tile, x, y):
if self.isValidMove(tile, x, y):
self.board[x][y]=tile
# 回傳現在盤面輪到tile走的所有合法棋步
def getValidMoves(self, tile):
return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]
# 判斷一個盤面是否有人贏了
def check_TicTacToe(self):
rows = list(map(''.join,self.board))
cols = list(map(''.join, zip(*rows)))
diags = list(map(''.join, zip(*[(r[i], r[2 - i]) for i, r in enumerate(rows)])))
lines = rows + cols + diags
if 'XXX' in lines:
return 'X'
if 'OOO' in lines:
return 'O'
return 'D' # draw(和棋)
我陣列裡面的棋子共有'O'
和X
兩種,
用字串'O'
, X
, ' '
代表格子內的棋子是'O'
, X
還是空白,
棋盤的座標是先看x座標再看y座標,
圖示(比如說00是左上角,20是右上角):
0 1 2
+---+---+---+
| | | |
0 | | | |
| | | |
+---+---+---+
| | | |
1 | | | |
| | | |
+---+---+---+
| | | |
2 | | | |
| | | |
+---+---+---+
這部分我是想要實作min-max演算法或alpha-beta演算法來解井字遊戲的盤面。
參考維基百科: 極小化極大演算法,Alpha-beta剪枝
這部分我已經debug一整天,實在看不出哪裡有誤,
要把對局樹印出來又有點複雜,
只好來此求助。
我的電腦下棋邏輯如下,
其中class Node是用來記錄一個井字棋的盤面,
TicTacToeAI實作alpha-beta演算法,
我已經儘量跟維基上的程式碼一致了,
不一樣的地方在於維基上的只有算最佳分數,
我的多回傳最佳棋步(畢竟讓電腦下棋總要知道下在哪裡,不能只知道這個盤面會贏還是會輸吧?)
import random
from copy import deepcopy
class Node(TicTacToe):
def __init__(self, board):
super().__init__()
self.board = deepcopy(board)
def ending(self):
return self.check_TicTacToe()!='D'
# 'X'贏的話得分
def valuef(self):
winTile = self.check_TicTacToe()
if winTile == 'X':
return 100
elif winTile == 'O':
return -100
return 0
# 電腦ai下棋的邏輯
class TicTacToeAI(TicTacToe):
def __init__(self, board):
super().__init__()
self.board = board
"""
參數: node代表盤面,depth是窮舉幾步棋,isMaxPlayer為True時換'X'走
alpha, beta是alpha-beta演算法的參數,如果看不懂,
理論上你把我函數的
if beta <= alpha:
break
註解掉即是min-max演算法,
代表說窮舉所有棋步不剪枝
"""
def alphabeta(self, node, depth, alpha, beta, isMaxPlayer):
if depth==0 or node.ending():
return [-1,-1], node.valuef()
if isMaxPlayer: # 'X'的回合
bestop, bestScore = [-1,-1], -99999999
moves = node.getValidMoves('X')
random.shuffle(moves)
for move in moves:
newNode = Node(self.board)
newNode.makeMove('X', move[0], move[1])
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)
if score> bestScore:
bestop = move[:]
bestScore = score
alpha = max(alpha, bestScore)
if beta <= alpha:
break
return bestop, bestScore
else: # 'O'的回合
bestop, bestScore = [-1,-1], 99999999
moves = node.getValidMoves('O')
random.shuffle(moves)
for move in moves:
newNode = Node(self.board)
newNode.makeMove('O', move[0], move[1])
M, score = self.alphabeta(newNode, depth-1, alpha,beta,True)
if score< bestScore:
bestop = move[:]
bestScore = score
beta = min(beta, bestScore)
if beta <= alpha:
break
return bestop, bestScore
# 給定盤面board,回傳電腦的選擇
def getComputerMove(self, computerTile):
root = Node(self.board)
isMaxPlayer = (computerTile == 'X')
move, score = self.alphabeta(root, 3, -999,999, isMaxPlayer)
return move
這是我用來測試程式正確性的三個盤面(假設都換X下):
第一個盤面
0 1 2
+---+---+---+
| | | |
0 | O | | |
| | | |
+---+---+---+
| | | |
1 | O | X | |
| | | |
+---+---+---+
| | | |
2 | X | | |
| | | |
+---+---+---+
這邊很明顯,X只要再一步下在座標2,0就贏了,
電腦應回傳[2,0]過關
第二個盤面也很簡單,就是第一個盤面O,X互換
0 1 2
+---+---+---+
| | | |
0 | X | | |
| | | |
+---+---+---+
| | | |
1 | X | O | |
| | | |
+---+---+---+
| | | |
2 | O | | |
| | | |
+---+---+---+
很明顯,X必須下在座標2,0阻止O連線了,
電腦應回傳[2,0]過關
第三個盤面需要計算三步,
但對人來說也很簡單,
X下在座標1,2或2,2皆可必勝
0 1 2
+---+---+---+
| | | |
0 | | | O |
| | | |
+---+---+---+
| | | |
1 | O | X | |
| | | |
+---+---+---+
| | | |
2 | X | | |
| | | |
+---+---+---+
測試盤面的程式我用python的unitest模組來做
import unittest
class Test(unittest.TestCase):
def test_basic(self):
game = TicTacToe()
game.makeMove('O',0,0)
game.makeMove('O',0,1)
game.makeMove('X',1,1)
game.makeMove('X',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertEqual(move, [2,0])
def test_basic2(self):
game = TicTacToe()
game.makeMove('X',0,0)
game.makeMove('X',0,1)
game.makeMove('O',1,1)
game.makeMove('O',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertEqual(move, [2,0])
def test_fixDatas(self):
game = TicTacToe()
game.makeMove('O',2,0)
game.makeMove('O',0,1)
game.makeMove('X',1,1)
game.makeMove('X',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertIn(move, [[1,2],[2,2]])
unittest.main()
我的測試環境為線上程式編譯器repl.it,
unitest的class當然不能改,
如果執行程式沒有錯誤,就算過關
註: 目前是有時會過關,有時會錯,
因為我有用random打亂電腦算棋的順序,
但理論上不論算棋的順序,
最好的一步應該是一樣的
import random
import unittest
from copy import deepcopy
# 寫井字遊戲的基本邏輯,棋子共'X','O'兩種
class TicTacToe():
def __init__(self):
self.board = [[' ']*3 for i in range(3)]
# 初始化棋盤
def iniBoard(self):
for i in range(3):
for j in range(3):
self.board[i][j]=' '
# 視覺化的把井字棋棋盤畫出來
def drawBoard(self) -> None:
HLINE = ' ' * 3 + '+---' * 3 + '+'
VLINE = (' ' * 3 +'|') * (3 +1)
title = ' 0'
for i in range(1,3):
title += ' ' * 3 +str(i)
print(title)
print(HLINE)
for y in range(3):
print(VLINE)
print(y, end=' ')
for x in range(3):
print(f'| {self.board[x][y]}', end=' ')
print('|')
print(VLINE)
print(HLINE)
def isOnBoard(self, x, y):
return 0 <= x < 3 and 0 <= y < 3
#檢查tile放在某個座標是否為合法棋步
def isValidMove(self, tile, x, y):
return self.isOnBoard(x, y) and self.board[x][y]==' '
# 把棋子下在座標x, y的地方
def makeMove(self, tile, x, y):
if self.isValidMove(tile, x, y):
self.board[x][y]=tile
# 回傳現在盤面輪到tile走的所有合法棋步
def getValidMoves(self, tile):
return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]
# 判斷一個盤面是否有人贏了
def check_TicTacToe(self):
rows = list(map(''.join,self.board))
cols = list(map(''.join, zip(*rows)))
diags = list(map(''.join, zip(*[(r[i], r[2 - i]) for i, r in enumerate(rows)])))
lines = rows + cols + diags
if 'XXX' in lines:
return 'X'
if 'OOO' in lines:
return 'O'
return 'D' # draw(和棋)
class Node(TicTacToe):
def __init__(self, board):
super().__init__()
self.board = deepcopy(board)
def ending(self):
return self.check_TicTacToe()!='D'
# 'X'贏的話得分
def valuef(self):
winTile = self.check_TicTacToe()
if winTile == 'X':
return 100
elif winTile == 'O':
return -100
return 0
# 電腦ai下棋的邏輯
class TicTacToeAI(TicTacToe):
def __init__(self, board):
super().__init__()
self.board = board
def alphabeta(self, node, depth, alpha, beta, isMaxPlayer):
if depth==0 or node.ending():
return [-1,-1], node.valuef()
if isMaxPlayer: # 'X'的回合
bestop, bestScore = [-1,-1], -99999999
moves = node.getValidMoves('X')
random.shuffle(moves)
for move in moves:
newNode = Node(self.board)
newNode.makeMove('X', move[0], move[1])
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)
if score> bestScore:
bestop = move[:]
bestScore = score
alpha = max(alpha, bestScore)
if beta <= alpha:
break
return bestop, bestScore
else: # 'O'的回合
bestop, bestScore = [-1,-1], 99999999
moves = node.getValidMoves('O')
random.shuffle(moves)
for move in moves:
newNode = Node(self.board)
newNode.makeMove('O', move[0], move[1])
M, score = self.alphabeta(newNode, depth-1, alpha,beta,True)
if score< bestScore:
bestop = move[:]
bestScore = score
beta = min(beta, bestScore)
if beta <= alpha:
break
return bestop, bestScore
# 給定盤面board,回傳電腦的選擇
def getComputerMove(self, computerTile):
root = Node(self.board)
isMaxPlayer = (computerTile == 'X')
move, score = self.alphabeta(root, 3, -999,999, isMaxPlayer)
return move
class Test(unittest.TestCase):
def test_basic(self):
game = TicTacToe()
game.makeMove('O',0,0)
game.makeMove('O',0,1)
game.makeMove('X',1,1)
game.makeMove('X',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertEqual(move, [2,0])
def test_basic2(self):
game = TicTacToe()
game.makeMove('X',0,0)
game.makeMove('X',0,1)
game.makeMove('O',1,1)
game.makeMove('O',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertEqual(move, [2,0])
def test_fixDatas(self):
game = TicTacToe()
game.makeMove('O',2,0)
game.makeMove('O',0,1)
game.makeMove('X',1,1)
game.makeMove('X',0,2)
ai = TicTacToeAI(game.board)
move = ai.getComputerMove('X')
ai.drawBoard()
self.assertIn(move, [[1,2],[2,2]])
unittest.main()
想提幾點我看到的部分進行討論
為尊重作者的程式碼內容,因此僅已作者有使用到的 package 來討論想法
(我知道可以用 numpy 或 dataframe 來解,但那不是目前的場域XD)
以下分為兩個部分 : 思考策略與模擬落子的獎勵計分
第一部分 :
思考策略看起來是偏向某一方連走,而非模擬交互下棋的規則
#檢查tile放在某個座標是否為合法棋步
def isValidMove(self, tile, x, y):
return self.isOnBoard(x, y) and self.board[x][y]==' '
isValidMove 用不到 tile 變數,依當前程式片段來看是要找出所有空格
# 回傳現在盤面輪到tile走的所有合法棋步
def getValidMoves(self, tile):
return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]
同上, getValidMoves 的 input parameter tile 是為了提供給 isValidMove,但 isValidMove 實際上不需要知道 tile
moves = node.getValidMoves('X')
random.shuffle(moves)
for move in moves:
newNode = Node(self.board)
newNode.makeMove('X', move[0], move[1])
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)
if score> bestScore:
bestop = move[:]
bestScore = score
alpha = max(alpha, bestScore)
if beta <= alpha:
break
return bestop, bestScore
拜訪所有合法的空格,直到走完隨機3步或中途有人勝利
然而,目前的程式邏輯中只能看到
# 這一行的 recursive 只考慮相同一方連走3步
# 但考量實際的規則情況下
# 應該是雙方交互總共下 3步 = 往後看 1 局結果
# 或者是雙方交互下棋各 3 步 = 往後看 3 局結果
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)
而計算分數的 valuef 看起來是某一方贏才有對應獎勵,不然算平手
# 'X'贏的話得分
def valuef(self):
if winTile == 'X':
return 100
elif winTile == 'O':
return -100
return 0
因此可以考慮將
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)
調整成
M, score = self.alphabeta(newNode, depth-1, alpha,beta, not isMaxPlayer)
模擬雙方交互下棋的結果
第二部分 :
TicTacToe 應該是零和賽局,最佳棋步應該要採納 greedy strategy 較為合適
實際上可以先定義出 8 條連線的 map
line_map = [
[(0,0), (0,1), (0,2)],
[(1,0), (1,1), (1,2)],
[(2,0), (2,1), (2,2)],
[(0,0), (1,0), (2,0)],
[(1,0), (1,1), (1,2)],
[(2,0), (2,1), (2,2)],
[(0,0), (1,1), (2,2)],
[(0,2), (1,1), (2,0)],
]
在模擬每一步棋落子在合法位置後,計算該次具備合法位置的所有連線得分,算分方式採用 abs(sum(LINE)),計算合法位置的所有連線位置總分,並取得分最高的連線中,落子的合法位置
套用前面提到的零和賽局,定義 X 落子位置得1分、O落子位置得-1分、空格得0分
因此當某方在任意一條連線上的加總分取絕對值 等於3分時,則表示該方獲勝
由於計分時可得知目前落子方是誰,因此計算得出的絕對值3分應屬於落子方,如果是另一方的絕對值3分,則應該在另一方模擬落子進行計分時就已取得絕對值3分
因此,board 的設計便能同時儲存 "X" "O" 與 "空格" 的分數,提供給模擬落子的計分使用
如果要計算交互多步落子,那應該可以透過連續的乘積來決定該回合的最佳棋步
非常感謝分享想法,我會針對您說的部分去做檢查,謝謝~
你好:
目前我已經找到程式碼最核心的錯誤了,
錯在alphabeta函數內的
newNode = Node(self.board)
newNode.makeMove('X', move[0], move[1])
應改成
newNode = Node(node.board)
newNode.makeMove('X', move[0], move[1])
以及
newNode = Node(self.board)
newNode.makeMove('O', move[0], move[1])
應改成
newNode = Node(node.board)
newNode.makeMove('O', move[0], move[1])
如果寫newNode = Node(self.board)
等於不管計算幾層還是用原來的盤面沒複製到
其實你給的建議也很好,但不是最主要的bug,
我的問題大致上解決了。
非常感謝您認真有心的回答,
就選你作最佳解答吧