Adversarial Search CMPT 420 / CMPG 720 Outline Game playing Game trees o Minimax o Alpha-beta pruning Games vs.

search problems competitive environments: agents goals are in conflict adversarial search problems (games) Types of Games deterministic perfect information

chess, checkers, go, othello chance backgammon, monopoly bridge, poker imperfect information

Games deterministic, fully-observable, turntaking, twoplayer, zero-sum games o Utility values at the end are equal and opposite o Tic-tac-toe Formulation Two players MAX and MIN take turns (with MAX playing first)

S0: Player(s): Action(s): Result(s,a): Terminal-test(s):

Utility(s,p): Formulation S0: initial state

Player(s): Action(s): Result(s,a): Terminal-test(s): Utility(s,p): Formulation

S0: initial state Player(s): which player has the move in a state Action(s): Result(s,a): Terminal-test(s): Utility(s,p): Formulation

S0: initial state Player(s): which player has the move in a state Action(s): set of legal moves in a state Result(s,a):

Terminal-test(s): Utility(s,p): Formulation

S0: initial state Player(s): which player has the move in a state Action(s): set of legal moves in a state Result(s,a): transition model Terminal-test(s): Utility(s,p): Formulation

S0: initial state Player(s): which player has the move in a state Action(s): set of legal moves in a state Result(s,a): transition model Terminal-test(s): true/false (terminal states) Utility(s,p):

Formulation S0: initial state Player(s): which player has the move in a state Actions(s): set of legal moves in a state

Result(s,a): transition model Terminal-test(s): true/false (terminal states) Utility(s,p): utility function defines the final value of a game that ends in terminal state s for a player p o zero-sum games: same total payoff Game tree (1-player) Partial Game Tree for Tic-Tac-Toe

Optimal strategies MAX uses search tree to determine next move. Assumption: Both players play optimally!! Given a game tree, the optimal strategy can be determined by using the minimax value of each node Minimax The minimax value of a node is the utility

(for Max) of being in the corresponding state, assuming that both players play optimally. Minimax(s) = o o o if Terminal-test(s) if Player(s) = Max if Player(s) = Min

Minimax The minimax value of a node is the utility (for Max) of being in the corresponding state, assuming that both players play optimally. Minimax(s) = o Utility (s) if Terminal-test(s) o max of Minimax(Result(s,a)) if Player(s) = Max o min of Minimax(Result(s,a)) if Player(s) = Min

Optimal Play 2 2 7 1 8

2 2 7 1 8

1 2 7 1 8 2

This is the optimal play 2 1 MAX 2 7 1

8 MIN 2 7 1 8

Two-Ply Game Tree Two-Ply Game Tree Two-Ply Game Tree Two-Ply Game Tree The minimax decision Minimax maximizes the worst-case outcome for max.

optimally? Definition of optimal play for MAX assumes MIN plays optimally: maximizes worst-case outcome for MAX. But if MIN does not play optimally, MAX can do even better. Minimax Algorithm function MINIMAX-DECISION(state) returns an action inputs: state, current state in game

vMAX-VALUE(state) return the action in SUCCESSORS(state) with value v function MAX-VALUE(state) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v - for a,s in SUCCESSORS(state) do v MAX(v,MIN-VALUE(s)) return v function MIN-VALUE(state) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v

for a,s in SUCCESSORS(state) do v MIN(v,MAX-VALUE(s)) return v Properties of minimax Complete? o Yes (if tree is finite) Optimal? o Yes (against an optimal opponent)

Time complexity? o O(bm) Space complexity? o O(bm) (depth-first exploration) o For chess, b 35, m 100 for "reasonable" games exact solution is infeasible Alpha-Beta Pruning Problem with minimax search: exponential in the depth of the tree

Can we cut it in half? It is possible to compute the minimax decision without looking at every node. o pruning: eliminate some parts of the tree Alpha-beta pruning We can improve on the performance of the minimax algorithm through alpha-beta pruning MAX MIN

MAX 2 7 1 ? Alpha-beta pruning We can improve on the performance of the

minimax algorithm through alpha-beta pruning MAX We dont need to compute the value at this node. MIN No matter what it is, it cant affect the value of the root node.

MAX 2 7 1 ? Alpha-Beta Example

Do DFS until the first leaf Range of possible values [-,+] [-, +] Alpha-Beta Example Do DFS until first leaf Range of possible values

[-,+] [-, +] (continued) [-,+] [-,3] (continued)

[-,+] [-,3] (continued) [-,+] [3,3] (continued) [3,+]

[3,3] (continued) [3,+] [3,3] [-, ] (continued)

[3,+] [3,3] [-,2] (continued) [3,+] [3,3]

[-,2] This node is worse for MAX (continued) [3,14] [3,3] [-,2]

, [-, ] (continued) [3,14] [3,3] [-,2]

, [-,14] (continued) [3,5] [3,3] [,2]

, [-,5] (continued) [3,3] [,2]

[2,2] (continued) [3,3] [3,3] [-,2] [2,2]

- pruning example Minimax(root) = max(min(3,12,8),min(2,x,y),min(14,5,2)) = max(3,min(2,x,y),2) =3

- pruning We made the same minimax decision without ever evaluating two of the leaf nodes! o They are independent. It is possible to prune entire subtrees. Why is it called -? = value of the best choice found so far at

any choice point along the path for max If v is worse than , max will avoid it prune that branch Define similarly for

min Alpha-Beta Algorithm function ALPHA-BETA-SEARCH(state) returns an action inputs: state, current state in game vMAX-VALUE(state, - , +) return the action in SUCCESSORS(state) with value v function MAX-VALUE(state, , ) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v- for a,s in SUCCESSORS(state) do

v MAX(v,MIN-VALUE(s, , )) if v then return v MAX( ,v) return v Alpha-Beta Algorithm function MIN-VALUE(state, , ) returns a utility value if TERMINAL-TEST(state) then return UTILITY(state) v+ for a,s in SUCCESSORS(state) do v MIN(v,MAX-VALUE(s, , ))

if v then return v MIN( ,v) return v Comments: Alpha-Beta Pruning Pruning does not affect the final results. Entire subtrees can be pruned. Good move ordering improves effectiveness of pruning. With perfect ordering, time

complexity is O(bm/2) o Alpha-beta pruning can look twice as far as minimax in the same amount of time Deterministic games in practice Checkers: Chinook ended 40-year-reign of human world champion Marion Tinsley in 1994. Used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion

positions. Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue searches 200 million positions per second, uses very sophisticated evaluation, and undisclosed methods for extending some lines of search up to 40 ply. Othello: Logistello defeated the human world champion. It is generally acknowledged that human are no match for computers at Othello.