用python程式解井字遊戲的盤面，先能窮舉三步棋就好

python game

不明 2020-07-14 22:07:18 ‧ 5677 瀏覽

分享至

我在嘗試用程式解井字遊戲的盤面，
首先，我寫了一個基本架構:

# 寫井字遊戲的基本邏輯，棋子共'X','O'兩種
class TicTacToe():
    def __init__(self):
        self.board = [[' ']*3 for i in range(3)]
    
    # 初始化棋盤
    def iniBoard(self):
        for i in range(3):
            for j in range(3):
                self.board[i][j]=' '
        
    # 視覺化的把井字棋棋盤畫出來    
    def drawBoard(self) -> None:
        HLINE =  ' ' * 3 + '+---' * 3  + '+'
        VLINE = (' ' * 3 +'|') *  (3 +1)
        title = '     0'
        for i in range(1,3):
            title += ' ' * 3 +str(i)
        print(title)
        print(HLINE)
        for y in range(3):
            print(VLINE)
            print(y, end='  ')
            for x in range(3):
                print(f'| {self.board[x][y]}', end=' ')
            print('|')
            print(VLINE)
            print(HLINE)
    
    def isOnBoard(self, x, y):
        return 0 <= x < 3 and 0 <= y < 3

    #檢查tile放在某個座標是否為合法棋步
    def isValidMove(self, tile, x, y):
        return self.isOnBoard(x, y) and self.board[x][y]==' '

    # 把棋子下在座標x, y的地方
    def makeMove(self, tile, x, y):
        if self.isValidMove(tile, x, y):
            self.board[x][y]=tile
    
    # 回傳現在盤面輪到tile走的所有合法棋步
    def getValidMoves(self, tile):
        return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]

    # 判斷一個盤面是否有人贏了
    def check_TicTacToe(self):
        rows = list(map(''.join,self.board))
        cols = list(map(''.join, zip(*rows)))
        diags = list(map(''.join, zip(*[(r[i], r[2 - i]) for i, r in enumerate(rows)])))
        lines = rows + cols + diags

        if 'XXX' in lines:
            return 'X'  
        if 'OOO' in lines:
            return 'O' 
        return 'D' # draw(和棋)

我陣列裡面的棋子共有'O'和X兩種，
用字串'O', X, ' '代表格子內的棋子是'O', X還是空白，
棋盤的座標是先看x座標再看y座標，
圖示(比如說00是左上角，20是右上角):

     0   1   2
   +---+---+---+
   |   |   |   |
0  |   |   |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
1  |   |   |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
2  |   |   |   |
   |   |   |   |
   +---+---+---+

電腦下棋的邏輯

這部分我是想要實作min-max演算法或alpha-beta演算法來解井字遊戲的盤面。
參考維基百科: 極小化極大演算法，Alpha-beta剪枝

這部分我已經debug一整天，實在看不出哪裡有誤，
要把對局樹印出來又有點複雜，
只好來此求助。

我的電腦下棋邏輯如下，
其中class Node是用來記錄一個井字棋的盤面，
TicTacToeAI實作alpha-beta演算法，
我已經儘量跟維基上的程式碼一致了，
不一樣的地方在於維基上的只有算最佳分數，
我的多回傳最佳棋步(畢竟讓電腦下棋總要知道下在哪裡，不能只知道這個盤面會贏還是會輸吧?)


import random
from copy import deepcopy

class Node(TicTacToe):
    def __init__(self, board):
        super().__init__()
        self.board = deepcopy(board)

    def ending(self):
        return self.check_TicTacToe()!='D'
 
    # 'X'贏的話得分
    def valuef(self):
        winTile = self.check_TicTacToe()
        if winTile == 'X':
            return 100
        elif winTile == 'O':
            return -100
        return 0

# 電腦ai下棋的邏輯
class TicTacToeAI(TicTacToe):
    def __init__(self, board):
        super().__init__()
        self.board = board
        
    """
    參數: node代表盤面，depth是窮舉幾步棋，isMaxPlayer為True時換'X'走
    alpha, beta是alpha-beta演算法的參數，如果看不懂，
    理論上你把我函數的
    if beta <= alpha:
        break
    註解掉即是min-max演算法，
    代表說窮舉所有棋步不剪枝
    """
    def alphabeta(self, node, depth, alpha, beta, isMaxPlayer):
        if depth==0 or node.ending():
            return [-1,-1], node.valuef()

        if isMaxPlayer: # 'X'的回合
            bestop, bestScore = [-1,-1], -99999999
            moves = node.getValidMoves('X')
            random.shuffle(moves)
            for move in moves:

                newNode = Node(self.board)
                newNode.makeMove('X', move[0], move[1])

                M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)

                if score> bestScore:
                    bestop = move[:]
                    bestScore = score
                
                alpha = max(alpha, bestScore)
                if beta <= alpha:
                    break
            return bestop, bestScore
        
        else: # 'O'的回合
            bestop, bestScore = [-1,-1], 99999999
            moves = node.getValidMoves('O')
            random.shuffle(moves)
            for move in moves:

                newNode = Node(self.board)
                newNode.makeMove('O', move[0], move[1])

                M, score = self.alphabeta(newNode, depth-1, alpha,beta,True)

                if score< bestScore:
                    bestop = move[:]
                    bestScore = score
                
                beta = min(beta, bestScore)
                if beta <= alpha:
                    break
            return bestop, bestScore

    # 給定盤面board，回傳電腦的選擇
    def getComputerMove(self, computerTile):
        root = Node(self.board)
        isMaxPlayer = (computerTile == 'X')
        move, score = self.alphabeta(root, 3, -999,999, isMaxPlayer)
        return move

三個測試的盤面

這是我用來測試程式正確性的三個盤面(假設都換X下):
第一個盤面

     0   1   2
   +---+---+---+
   |   |   |   |
0  | O |   |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
1  | O | X |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
2  | X |   |   |
   |   |   |   |
   +---+---+---+

這邊很明顯，X只要再一步下在座標2,0就贏了，
電腦應回傳[2,0]過關

第二個盤面也很簡單，就是第一個盤面O,X互換

     0   1   2
   +---+---+---+
   |   |   |   |
0  | X |   |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
1  | X | O |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
2  | O |   |   |
   |   |   |   |
   +---+---+---+

很明顯，X必須下在座標2,0阻止O連線了，
電腦應回傳[2,0]過關

第三個盤面需要計算三步，
但對人來說也很簡單，
X下在座標1,2或2,2皆可必勝

     0   1   2
   +---+---+---+
   |   |   |   |
0  |   |   | O |
   |   |   |   |
   +---+---+---+
   |   |   |   |
1  | O | X |   |
   |   |   |   |
   +---+---+---+
   |   |   |   |
2  | X |   |   |
   |   |   |   |
   +---+---+---+

測試盤面的程式我用python的unitest模組來做

import unittest

class Test(unittest.TestCase):

    def test_basic(self):
        game = TicTacToe()
        game.makeMove('O',0,0)
        game.makeMove('O',0,1)
        game.makeMove('X',1,1)
        game.makeMove('X',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertEqual(move, [2,0])

    def test_basic2(self):
        game = TicTacToe()
        game.makeMove('X',0,0)
        game.makeMove('X',0,1)
        game.makeMove('O',1,1)
        game.makeMove('O',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertEqual(move, [2,0]) 

    def test_fixDatas(self):
        game = TicTacToe()
        game.makeMove('O',2,0)
        game.makeMove('O',0,1)
        game.makeMove('X',1,1)
        game.makeMove('X',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertIn(move, [[1,2],[2,2]]) 

unittest.main()

完整測試的程式碼

我的測試環境為線上程式編譯器repl.it，
unitest的class當然不能改，
如果執行程式沒有錯誤，就算過關

註: 目前是有時會過關，有時會錯，
因為我有用random打亂電腦算棋的順序，
但理論上不論算棋的順序，
最好的一步應該是一樣的

import random
import unittest
from copy import deepcopy

# 寫井字遊戲的基本邏輯，棋子共'X','O'兩種
class TicTacToe():
    def __init__(self):
        self.board = [[' ']*3 for i in range(3)]
    
    # 初始化棋盤
    def iniBoard(self):
        for i in range(3):
            for j in range(3):
                self.board[i][j]=' '
        
    # 視覺化的把井字棋棋盤畫出來    
    def drawBoard(self) -> None:
        HLINE =  ' ' * 3 + '+---' * 3  + '+'
        VLINE = (' ' * 3 +'|') *  (3 +1)
        title = '     0'
        for i in range(1,3):
            title += ' ' * 3 +str(i)
        print(title)
        print(HLINE)
        for y in range(3):
            print(VLINE)
            print(y, end='  ')
            for x in range(3):
                print(f'| {self.board[x][y]}', end=' ')
            print('|')
            print(VLINE)
            print(HLINE)
    
    def isOnBoard(self, x, y):
        return 0 <= x < 3 and 0 <= y < 3

    #檢查tile放在某個座標是否為合法棋步
    def isValidMove(self, tile, x, y):
        return self.isOnBoard(x, y) and self.board[x][y]==' '

    # 把棋子下在座標x, y的地方
    def makeMove(self, tile, x, y):
        if self.isValidMove(tile, x, y):
            self.board[x][y]=tile
    
    # 回傳現在盤面輪到tile走的所有合法棋步
    def getValidMoves(self, tile):
        return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]

    # 判斷一個盤面是否有人贏了
    def check_TicTacToe(self):
        rows = list(map(''.join,self.board))
        cols = list(map(''.join, zip(*rows)))
        diags = list(map(''.join, zip(*[(r[i], r[2 - i]) for i, r in enumerate(rows)])))
        lines = rows + cols + diags

        if 'XXX' in lines:
            return 'X'  
        if 'OOO' in lines:
            return 'O' 
        return 'D' # draw(和棋)
        

class Node(TicTacToe):
    def __init__(self, board):
        super().__init__()
        self.board = deepcopy(board)

    def ending(self):
        return self.check_TicTacToe()!='D'
 
    # 'X'贏的話得分
    def valuef(self):
        winTile = self.check_TicTacToe()
        if winTile == 'X':
            return 100
        elif winTile == 'O':
            return -100
        return 0

# 電腦ai下棋的邏輯
class TicTacToeAI(TicTacToe):
    def __init__(self, board):
        super().__init__()
        self.board = board

    def alphabeta(self, node, depth, alpha, beta, isMaxPlayer):
        if depth==0 or node.ending():
            return [-1,-1], node.valuef()

        if isMaxPlayer: # 'X'的回合
            bestop, bestScore = [-1,-1], -99999999
            moves = node.getValidMoves('X')
            random.shuffle(moves)
            for move in moves:

                newNode = Node(self.board)
                newNode.makeMove('X', move[0], move[1])

                M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)

                if score> bestScore:
                    bestop = move[:]
                    bestScore = score
                
                alpha = max(alpha, bestScore)
                if beta <= alpha:
                    break
            return bestop, bestScore
        
        else: # 'O'的回合
            bestop, bestScore = [-1,-1], 99999999
            moves = node.getValidMoves('O')
            random.shuffle(moves)
            for move in moves:

                newNode = Node(self.board)
                newNode.makeMove('O', move[0], move[1])

                M, score = self.alphabeta(newNode, depth-1, alpha,beta,True)

                if score< bestScore:
                    bestop = move[:]
                    bestScore = score
                
                beta = min(beta, bestScore)
                if beta <= alpha:
                    break
            return bestop, bestScore

    # 給定盤面board，回傳電腦的選擇
    def getComputerMove(self, computerTile):
        root = Node(self.board)
        isMaxPlayer = (computerTile == 'X')
        move, score = self.alphabeta(root, 3, -999,999, isMaxPlayer)
        return move


class Test(unittest.TestCase):

    def test_basic(self):
        game = TicTacToe()
        game.makeMove('O',0,0)
        game.makeMove('O',0,1)
        game.makeMove('X',1,1)
        game.makeMove('X',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertEqual(move, [2,0])

    def test_basic2(self):
        game = TicTacToe()
        game.makeMove('X',0,0)
        game.makeMove('X',0,1)
        game.makeMove('O',1,1)
        game.makeMove('O',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertEqual(move, [2,0]) 

    def test_fixDatas(self):
        game = TicTacToe()
        game.makeMove('O',2,0)
        game.makeMove('O',0,1)
        game.makeMove('X',1,1)
        game.makeMove('X',0,2)
        ai = TicTacToeAI(game.board)
        move = ai.getComputerMove('X')
        ai.drawBoard()
        self.assertIn(move, [[1,2],[2,2]]) 

unittest.main()

登入發表討論

直播研討會

{{ item.channelVendor }} {{ item.webinarstarted }} |

直播中

1 個回答

zhweiliu

iT邦新手 5 級 ‧ 2020-07-15 00:25:00

最佳解答

想提幾點我看到的部分進行討論
為尊重作者的程式碼內容，因此僅已作者有使用到的 package 來討論想法

(我知道可以用 numpy 或 dataframe 來解，但那不是目前的場域XD)

以下分為兩個部分 : 思考策略與模擬落子的獎勵計分

第一部分 :

思考策略看起來是偏向某一方連走，而非模擬交互下棋的規則

#檢查tile放在某個座標是否為合法棋步
def isValidMove(self, tile, x, y):
    return self.isOnBoard(x, y) and self.board[x][y]==' '

isValidMove 用不到 tile 變數，依當前程式片段來看是要找出所有空格

# 回傳現在盤面輪到tile走的所有合法棋步
def getValidMoves(self, tile):
    return [[x, y] for x in range(3) for y in range(3) if self.isValidMove(tile, x, y)]

同上， getValidMoves 的 input parameter tile 是為了提供給 isValidMove，但 isValidMove 實際上不需要知道 tile

moves = node.getValidMoves('X')
random.shuffle(moves)
for move in moves:

    newNode = Node(self.board)
    newNode.makeMove('X', move[0], move[1])

    M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)

    if score> bestScore:
        bestop = move[:]
        bestScore = score

    alpha = max(alpha, bestScore)
    if beta <= alpha:
        break
return bestop, bestScore

拜訪所有合法的空格，直到走完隨機3步或中途有人勝利

然而，目前的程式邏輯中只能看到

# 這一行的 recursive 只考慮相同一方連走3步
# 但考量實際的規則情況下
# 應該是雙方交互總共下 3步 = 往後看 1 局結果
# 或者是雙方交互下棋各 3 步 = 往後看 3 局結果
M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)

而計算分數的 valuef 看起來是某一方贏才有對應獎勵，不然算平手

# 'X'贏的話得分
def valuef(self):
if winTile == 'X':
    return 100
elif winTile == 'O':
    return -100
return 0

因此可以考慮將

M, score = self.alphabeta(newNode, depth-1, alpha,beta,False)

調整成

M, score = self.alphabeta(newNode, depth-1, alpha,beta, not isMaxPlayer)

模擬雙方交互下棋的結果

第二部分 :
TicTacToe 應該是零和賽局，最佳棋步應該要採納 greedy strategy 較為合適
實際上可以先定義出 8 條連線的 map

line_map = [
    [(0,0), (0,1), (0,2)],
    [(1,0), (1,1), (1,2)],
    [(2,0), (2,1), (2,2)],
    [(0,0), (1,0), (2,0)],
    [(1,0), (1,1), (1,2)],
    [(2,0), (2,1), (2,2)],
    [(0,0), (1,1), (2,2)],
    [(0,2), (1,1), (2,0)],
]

在模擬每一步棋落子在合法位置後，計算該次具備合法位置的所有連線得分，算分方式採用 abs(sum(LINE))，計算合法位置的所有連線位置總分，並取得分最高的連線中，落子的合法位置

套用前面提到的零和賽局，定義 X 落子位置得1分、O落子位置得-1分、空格得0分
因此當某方在任意一條連線上的加總分取絕對值等於3分時，則表示該方獲勝
由於計分時可得知目前落子方是誰，因此計算得出的絕對值3分應屬於落子方，如果是另一方的絕對值3分，則應該在另一方模擬落子進行計分時就已取得絕對值3分

因此，board 的設計便能同時儲存 "X" "O" 與 "空格" 的分數，提供給模擬落子的計分使用

如果要計算交互多步落子，那應該可以透過連續的乘積來決定該回合的最佳棋步

回應 2
分享
檢舉

不明檢舉

非常感謝分享想法，我會針對您說的部分去做檢查，謝謝~

不明檢舉

你好:
目前我已經找到程式碼最核心的錯誤了，
錯在alphabeta函數內的

newNode = Node(self.board)
newNode.makeMove('X', move[0], move[1])

應改成

newNode = Node(node.board)
newNode.makeMove('X', move[0], move[1])

以及

newNode = Node(self.board)
newNode.makeMove('O', move[0], move[1])

應改成

newNode = Node(node.board)
newNode.makeMove('O', move[0], move[1])

如果寫newNode = Node(self.board)等於不管計算幾層還是用原來的盤面沒複製到

其實你給的建議也很好，但不是最主要的bug，

思考策略看起來是偏向某一方連走，而非模擬交互下棋的規則。其實我的是有交互下棋的，因為我用if-else控制，當isMaxPlayer為True時改成False，isMaxPlayer為False時改成True
你說最佳棋步改用計算「連線總分最高」，但由於井字盤面的對局樹夠小，而且這幾個測試盤面三步內可解，因此此例我想直接計算到底是沒問題的。

我的問題大致上解決了。
非常感謝您認真有心的回答，
就選你作最佳解答吧

登入發表回應