As many of my classmates have posted, the Monte Carlo method isn’t actually a single method, but rather an entire class of methods that use random sampling to estimate a result. An interesting application my partner and I found for it was in one of the GO AIs we wrote for another of our projects. (GO is an ancient Chinese board game that is still very popular in East Asia today; the rules and details can be found here.)
One of the reasons we chose the Monte Carlo method is that the immense number of possible moves in GO makes the Minimax Algorithm (one of the more common methods for finding the next “best” move in many game AIs, such as chess engines, by alternately maximizing and minimizing a player’s score down to a certain depth; more details here) far too computationally intensive when looking more than 2 or 3 moves ahead: looking only 4 moves ahead on a mere 9×9 board takes about 81^4 ≈ 43 million board evaluations. An interesting quote illustrating the computational intensity of GO on a full 19×19 board is that “the number of possible GO games far exceeds the number of atoms in the universe” (more details and derivation here). Interesting facts: the lower bound on the number of possible GO games on a 19×19 board is about 10^(10^48); the upper bound is 10^(10^171).
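The depth-limited search described above can be sketched generically. This is a minimal illustration of plain minimax, not our project’s actual code; the `moves`, `apply_move`, and `evaluate` callbacks are hypothetical game-specific hooks:

```python
def minimax(board, depth, maximizing, moves, apply_move, evaluate):
    """Plain depth-limited minimax: alternately maximize and minimize
    the evaluation down to a fixed depth (no pruning).
    moves, apply_move, evaluate are hypothetical game-specific callbacks."""
    legal = moves(board)
    if depth == 0 or not legal:
        return evaluate(board)
    pick = max if maximizing else min
    return pick(
        minimax(apply_move(board, m, maximizing), depth - 1,
                not maximizing, moves, apply_move, evaluate)
        for m in legal
    )
```

With a branching factor of b and depth d, this visits on the order of b^d leaf positions, which is exactly where the 81^4 figure for a 9×9 board comes from.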
So instead we used the Monte Carlo method to evaluate how “good” a move is. To estimate how good or bad a certain move is for a given board position, the method plays “virtual games” showing what would happen if two Random AIs (AIs playing completely randomly) played out those moves. Starting from the current board position, it plays each viable move in a fixed number of games in which all subsequent moves are completely random. After all of the “virtual games” are finished, we average the final scores of the games spawned by each candidate move and let that average represent the “goodness” of the move. Finally, the Monte Carlo AI picks the move with the highest average score and plays it in the actual game, on the assumption that moves which score better over a large number of random games tend to be “better” moves in general.
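The procedure above can be sketched in a few lines. This is a toy model, not our project’s code: the helpers are hypothetical, and the playout simply fills the board (real GO also needs captures, ko, and pass rules):

```python
import random

def legal_moves(board):
    """Hypothetical helper: the empty points of a flat board list."""
    return [i for i, v in enumerate(board) if v == 0]

def play(board, move, player):
    """Hypothetical helper: return a new board with the stone placed."""
    nb = list(board)
    nb[move] = player
    return nb

def random_playout(board, player, score):
    """Play completely random moves until the board fills up, then score it."""
    while True:
        moves = legal_moves(board)
        if not moves:
            return score(board)
        board = play(board, random.choice(moves), player)
        player = -player

def monte_carlo_move(board, player, score, n_games=500):
    """Average the final scores of n_games random playouts per candidate
    move and return the move with the best average."""
    best_move, best_avg = None, float("-inf")
    for move in legal_moves(board):
        after = play(board, move, player)
        total = sum(random_playout(after, -player, score)
                    for _ in range(n_games))
        avg = total / n_games
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move
```

Note that all the intelligence lives in the averaging: each individual playout is pure noise, but over many playouts the noise cancels and genuinely better moves float to the top.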
For our project, we let our AI play about 500 virtual games for each candidate move, which can take a while on slower computers, but it is still far faster than using the Minimax Algorithm to look just 4 moves ahead (on the order of 1 million evaluations versus 40 million plus). The results of the Monte Carlo AI are also pretty good: it generally defeats most of our other AIs (the Minimax AI looking 2 or 3 moves ahead, and the Random AIs), and it even put up a decent fight against some beginner human players.
This article was first published (in translation) on the blog of 賴勇浩 (戀花蝶), http://blog.csdn.net/lanphaday. If reposting, please keep the full text intact; commercial use is not permitted without permission.
Worth noting is that one very important factor in how well the Monte Carlo method works here is the scoring function used to compute a player’s score for a given board position. The one we used is straightforward and relatively simple: it assigns each empty spot to whoever has the closest stones to it, with ties broken by the number of nearby stones. This isn’t the most accurate or effective scoring method, but it worked well enough for our purposes.
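The nearest-stone idea can be sketched as follows. This is an illustrative toy scorer in the spirit of what the paragraph describes, not the project’s actual function; it uses Manhattan distance and a stone-count tally for ties, both of which are assumptions:

```python
def score_board(board, size):
    """Toy area scorer: each empty point is credited to the player owning
    the nearest stone (Manhattan distance); distance ties go to whichever
    player has more stones at that distance, and an exact tie is neutral.
    board: dict mapping (row, col) -> +1 (black) or -1 (white).
    Returns black's territory minus white's."""
    score = 0
    for r in range(size):
        for c in range(size):
            if (r, c) in board:
                score += board[(r, c)]  # a stone counts for its owner
                continue
            best_d, tally = None, 0
            for (qr, qc), owner in board.items():
                d = abs(r - qr) + abs(c - qc)
                if best_d is None or d < best_d:
                    best_d, tally = d, owner  # strictly nearer stone wins
                elif d == best_d:
                    tally += owner  # tie: count stones at this distance
            if tally > 0:
                score += 1
            elif tally < 0:
                score -= 1
    return score
```

For example, a lone black stone on an otherwise empty 3×3 board would claim all 9 points, while one black and one white stone in opposite corners would split the board evenly.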
The AI we developed using Monte Carlo methods was one of the better AIs we made, but it is still nowhere near the level of a moderately experienced amateur human player. In particular, the AI starts losing ground near the endgame, when tactics matter much more than the overall strategy that Monte Carlo and Minimax seem to do well at. And because we use random moves to play each “virtual game”, we can get very different results each time we run it, especially near the endgame, where the outcome of a move depends heavily on the quality of the subsequent moves, which in this case are completely random.
GO is considered by many to be the most complicated game we know of to date, and it is very unlikely that we will come even marginally close to solving it anytime soon (want to even try writing out 10^(10^48)?). But it seems equally unlikely that people will give up trying anytime soon, as human tenacity in the face of other “insurmountable” odds has proven in the past (landing on the Moon…).
NOTE: when I say “random” in this post, I naturally mean the pseudorandom number generators computers use, which aren’t truly random, but were more than close enough for our project.