java - TicTacToe minimax algorithm returns unexpected results in a 4x4 game

coder 2024-04-01

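In my method newminimax499 I have a minimax algorithm that uses memoization and alpha-beta pruning. The method works fine for 3x3 games, but when I play 4x4 games the computer makes strange, unexpected position choices. It still never loses, but it doesn't seem to be playing to win. To illustrate the problem, here are two scenarios, one 3x3 game and one 4x4 game. First, a 3x3 game where the player is X and makes the first move: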

That's fine; in fact it's exactly what you would want the computer to do. Now a 4x4 game. Again, O is the computer and X starts:

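As you can see, the computer simply places its Os one after another in sequential order, and only breaks that order to block X when X is about to win. This is very defensive play, unlike what we see in the 3x3 game. So why does the method behave differently for 3x3 and 4x4?

Here is the code: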

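//This method returns a 2-element int array containing the score of the best possible
//next move and its position (answer[0] = score, answer[1] = position). Utilizes
//memoization and alpha-beta pruning to achieve better performance.
//Caveat: a value computed under a narrowed (alpha, beta) window may be a bound rather
//than an exact score, so caching it by board state alone can return an inexact result
//when the same state is reached again with a different window.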
public int[] newminimax499(int a, int b){
    //int bestScore = (turn == 'O') ? +9 : -9;  //X is minimizer, O is maximizer
    int bestPos=-1;
    int alpha= a;
    int beta= b;
    int currentScore;
    //boardShow();
    String stateString = new String(state);  //Board contents used as the memoization key
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver2()!='N'){
        int[] answer = {score(), bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='X'){  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
            else {  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
        }
    }
    if(turn=='X'){
        int[] answer = {beta, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else { 
        int[] answer = {alpha, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
}
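Because alpha-beta cuts off search, the value a node returns is only exact when it falls strictly inside the window the node was called with; otherwise it is an upper bound (fail-low) or a lower bound (fail-high). A common way to memoize safely is to store that classification next to the value and only reuse an entry when it settles the current window. The class BoundedCache and its methods below are hypothetical names for a minimal sketch of that idea, not part of the original State2 class; in newminimax499 the stored value would be classified against the window the call received (a, b), not the updated alpha and beta.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not from the original post): a memo table for alpha-beta
// that records whether each cached value is exact or only a bound.
class BoundedCache {
    enum Bound { EXACT, LOWER, UPPER }

    static final class Entry {
        final int value;
        final Bound bound;

        Entry(int value, Bound bound) {
            this.value = value;
            this.bound = bound;
        }
    }

    private final Map<String, Entry> table = new HashMap<>();

    // Classifies a result returned by a search that was called with window (alpha, beta).
    static Bound classify(int value, int alpha, int beta) {
        if (value <= alpha) return Bound.UPPER;  // fail-low: true score is at most value
        if (value >= beta) return Bound.LOWER;   // fail-high: true score is at least value
        return Bound.EXACT;                      // strictly inside the window
    }

    void store(String key, int value, int alpha, int beta) {
        table.put(key, new Entry(value, classify(value, alpha, beta)));
    }

    // Returns the cached value if it settles a search with window (alpha, beta),
    // or null if the node has to be searched again.
    Integer probe(String key, int alpha, int beta) {
        Entry e = table.get(key);
        if (e == null) return null;
        if (e.bound == Bound.EXACT) return e.value;
        if (e.bound == Bound.LOWER && e.value >= beta) return e.value;
        if (e.bound == Bound.UPPER && e.value <= alpha) return e.value;
        return null;  // the stored bound does not answer this query
    }
}
```

A lower bound of 10 answers a query whose beta is 5 (the search would fail high anyway), but not one whose beta is 20, which is exactly the case a state-only cache gets wrong.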

Below are the other components and helper methods you need to run the code. The fields and constructor used in my class State2:

private char [] state;  //Actual content of the board
private char turn;  //Whose turn it is
private Map<String,int[]> oldAnswers; //Used for memoization. It saves every state along with the score it yielded, which allows us to stop exploring the children of a node if an identical node's score has been previously calculated. The key is the board state (e.g. OX------X), and the int array is a 2-element array containing the score and the position of the best next move from that state.
private Map<Integer, int []> RowCol; //A mapping of positions from a board represented as a normal array to a board represented as a 2d array. For example: The position 0 maps to 0,0 on a 2d array board, 1 maps to 0,1 and so on.
private static int n;   //Size of the board
private static int stateID; //A simple counter used to show the number of recursive calls in the newminimax499 method.
private static int countX, countO; //Number of placed Xs and Os
private static int lastAdded; //Position of last placed seed
private char [][] DDState; //A 2d array representing the board. Contains the same values as state[]. Used for simplicity in functions that check the state of the board.

public State2(int n){
    int a=0;
    State2.n=n;
    state=new char[n*n];
    RowCol=new HashMap<Integer, int []>();
    countX=0;
    countO=0;
    //Initializing the board with empty slots
    for(int i = 0; i<state.length; i++){
        state[i]='-';
    }
    //Mapping
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            RowCol.put(a, new int[]{i, j});
            a++;
        }
    }
    a=0;
    DDState=new char[n][n];
    //Initializing the 2d array with the values from state[](empty slots)
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            DDState[i][j]=state[a];
            a++;
        }
    }
    oldAnswers = new HashMap<String,int[]>();
}
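The RowCol map stores, for each flat index, its row and column. Because the constructor fills it in row-major order (a = i*n + j), the same mapping can be computed arithmetically without a lookup table. The helper class BoardIndex below is a hypothetical sketch of that, not part of the original State2 class.

```java
// Hypothetical helper (not in the original code): row-major index arithmetic
// equivalent to what the RowCol map stores for an n-by-n board.
final class BoardIndex {
    private BoardIndex() {}

    // Row of flat index i, e.g. on a 4x4 board index 5 is row 1.
    static int row(int i, int n) { return i / n; }

    // Column of flat index i, e.g. on a 4x4 board index 5 is column 1.
    static int col(int i, int n) { return i % n; }

    // Inverse mapping: (row, col) back to the flat index.
    static int index(int row, int col, int n) { return row * n + col; }
}
```

With this, an expression such as DDState[RowCol.get(i)[0]][RowCol.get(i)[1]] could be written as DDState[BoardIndex.row(i, n)][BoardIndex.col(i, n)].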

Helper methods:

getAvailableMoves returns an array containing the empty slots on the board (i.e. the possible next moves).

public int[] getAvailableMoves(){
    int count=0;
    int i=0;
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            count++;
    }
    int [] availableSlots = new int[count];
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            availableSlots[i++]=j;      
    }
    return availableSlots;
}

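isGameOver2() checks the current state of the board to determine whether the game is over. It returns one of the chars 'X', 'O', 'D' and 'N', meaning X wins, O wins, draw, and game not over, respectively. Only the player who just moved (the opponent of turn) is checked for a win.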

public char isGameOver2(){
    char turnOpp;
    int count;
    if(turn=='X'){
        count=countO;
        turnOpp='O';
    }
    else {
        count=countX;
        turnOpp='X';
    }
    if(count>=n){
        int row = RowCol.get(lastAdded)[0];  //Row of the last placed seed
        int col = RowCol.get(lastAdded)[1];  //Column of the last placed seed

        //Check column for win
        for(int i=0; i<n; i++){
            if(DDState[i][col]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check row for win
        for(int i=0; i<n; i++){
            if(DDState[row][i]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check diagonal for win
        if(row == col){
            //we're on the main diagonal
            for(int i = 0; i < n; i++){
                if(DDState[i][i] != turnOpp)
                    break;
                if(i == n-1){
                    return turnOpp;
                }
            }
        }

        //Check anti-diagonal for win
        for(int i = 0; i<n; i++){
            if(DDState[i][(n-1)-i] != turnOpp)
                break;
            if(i == n-1){
                return turnOpp;
            }
        }

        //Check for draw
        if((countX+countO)==(n*n))
            return 'D';
    }
    return 'N';
}
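isGameOver2 is incremental: it relies on lastAdded, turn and the piece counters being kept in sync by setX, setO and revert. For testing the search in isolation, a stateless check that scans every line of a flat board can be handy. WinCheck and its methods below are hypothetical names for such a sketch; it uses the same '-' empty marker and the same 'X'/'O'/'D'/'N' return codes, but it does more work per call than the original.

```java
// Hypothetical stateless win check (not in the original code) for an n-by-n board
// stored row-major in a char array, with '-' marking an empty cell.
final class WinCheck {
    private WinCheck() {}

    // Returns 'X' or 'O' if that player owns a full row, column or diagonal,
    // 'D' if the board is full with no winner, and 'N' if the game goes on.
    static char status(char[] board, int n) {
        for (int k = 0; k < n; k++) {
            char r = lineOwner(board, n, k * n, 1);   // row k
            if (r != '-') return r;
            char c = lineOwner(board, n, k, n);       // column k
            if (c != '-') return c;
        }
        char d = lineOwner(board, n, 0, n + 1);       // main diagonal
        if (d != '-') return d;
        char a = lineOwner(board, n, n - 1, n - 1);   // anti-diagonal
        if (a != '-') return a;
        for (char cell : board) {
            if (cell == '-') return 'N';              // an empty cell is left
        }
        return 'D';
    }

    // Owner of the n cells starting at 'start' and stepping by 'step',
    // or '-' if the line is mixed or contains an empty cell.
    private static char lineOwner(char[] board, int n, int start, int step) {
        char first = board[start];
        if (first == '-') return '-';
        for (int i = 1; i < n; i++) {
            if (board[start + i * step] != first) return '-';
        }
        return first;
    }
}
```

For example, WinCheck.status("XOXXOOOXX".toCharArray(), 3) reports a draw, which can be compared against isGameOver2 after replaying the same moves with setX and setO.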

boardShow prints a grid view of the current state of the board:

public void boardShow(){
    System.out.println(stateID);
    //Print one line per row, e.g. "[X] [-] [O]" on a 3x3 board; works for any n
    for(int i=0; i<n; i++){
        StringBuilder row = new StringBuilder();
        for(int j=0; j<n; j++){
            if(j>0)
                row.append(' ');
            row.append('[').append(state[i*n+j]).append(']');
        }
        System.out.println(row);
    }
    System.out.println("***********");
}

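score is a simple evaluation function that returns +10 when O wins, -10 when X wins, and 0 for a draw:

public int score(){
    char result = isGameOver2();  //Evaluate the board only once
    if(result=='X')
        return -10;
    else if(result=='O')
        return +10;
    else
        return 0;
}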
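With score() as written, every O win is worth exactly +10, so a win in three moves and a win in nine moves look identical to the search, and nothing pushes the computer toward the quicker one. A common refinement, not used in the original code, is to scale the terminal score by how many moves have been played. DepthScore below is a hypothetical sketch of that; movesPlayed corresponds to countX + countO. It will not change the choice in a position that is a forced draw, but it makes the engine prefer the shortest available win and the longest possible loss. Because the move count is determined by the board, results keyed by board state stay consistent.

```java
// Hypothetical depth-aware terminal score (not the original score() method).
// Positive favours O (the maximizer), negative favours X (the minimizer).
final class DepthScore {
    private DepthScore() {}

    // winner is 'X', 'O' or anything else for a draw; movesPlayed is the number
    // of pieces on the board; n is the board size.
    static int score(char winner, int movesPlayed, int n) {
        int max = n * n + 1;  // strictly larger than any possible move count
        switch (winner) {
            case 'O': return max - movesPlayed;  // earlier O win gives a higher score
            case 'X': return movesPlayed - max;  // earlier X win gives a lower score
            default:  return 0;                  // draw
        }
    }
}
```

On a 3x3 board an O win after 5 moves scores 5, while an O win after 7 moves scores 3, so the maximizer prefers the faster win.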

Seeders:

//Sets an X at a certain location and updates the turn, countX and lastAdded variables
public void setX(int i){
    state[i]='X';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='X';
    turn='O';
    countX++;
    lastAdded=i;
}

//Sets an O at a certain location and updates the turn, countO and lastAdded variables
public void setO(int i){
    state[i]='O';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='O';
    turn='X';
    countO++;
    lastAdded=i;
}

revert simply undoes a move. For example, if an X was placed at position 0, revert(0) puts a '-' back in its place and restores the variables changed by setX:

public void revert(int i){
    state[i]='-';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='-';
    if(turn=='X'){
        turn = 'O';
        countO--;
    }
    else {
        turn = 'X';
        countX--;
    }
}

The main method could look like this:

public static void main(String[] args) {
    State2 s=new State2(4);
    int [] results=new int[2];
    s.setX(0);
    long startTime = System.currentTimeMillis();
    results=s.newminimax499(Integer.MIN_VALUE,Integer.MAX_VALUE);
    long endTime = System.currentTimeMillis();
    System.out.println("Score: "+results[0]+" Position: "+ results[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();

}

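Best answer

I don't believe there is a bug here. If O plays in one of the earlier spots, it gets forked, while if it plays in the center, it forces a tie. Presumably the 4x4 game is even harder to win or lose, so the computer indifferently chooses the first open square.

Below, 1 denotes O's forced response, 2 denotes X's forking move, and ? denotes a possible winning position.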

X|O|
-+-+-
2|X|?
-+-+-
?| |1

X| |O
-+-+-
X|2|?
-+-+-
1| |?

X|2|?
-+-+-
O|X|
-+-+-
 |?|1
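The "first open square" behaviour follows from how newminimax499 updates its best move: bestPos only changes when currentScore is strictly better than the current bound, and getAvailableMoves lists empty squares in ascending order. When every move leads to the same result (for instance all draws), the first empty square is kept. TieBreak.firstBest below is a hypothetical, simplified sketch of that selection rule on a plain array of already-computed scores, ignoring pruning.

```java
// Hypothetical illustration (not from the original code) of the strict "> alpha"
// update in newminimax499: among equally scored moves, the first one wins.
final class TieBreak {
    private TieBreak() {}

    // scores[i] is the score of the i-th available move, in ascending board order.
    // Returns the index of the first maximum, or -1 for an empty array.
    static int firstBest(int[] scores) {
        int best = -1;
        int bestScore = Integer.MIN_VALUE;
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] > bestScore) {  // strict: later ties never replace the first
                bestScore = scores[i];
                best = i;
            }
        }
        return best;
    }
}
```

If all four remaining moves are draws (scores 0, 0, 0, 0), index 0, the lowest-numbered empty square, is chosen, which matches the sequential placement seen in the 4x4 game.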

Original question on Stack Overflow: https://stackoverflow.com/questions/32119093/
