CSIT 5900
Lecture 3: Agents that Search
Fangzhen Lin
Department of Computer Science and Engineering
Hong Kong University of Science and Technology
Fangzhen Lin (HKUST) Lecture 3: Search 1 / 71
Overview
Search problems.
Uninformed search.
Heuristic search.
Game tree search.
Local search and constraint satisfaction problems.
Missionaries and cannibals problem
From wikipedia (https://en.wikipedia.org/wiki/Missionaries_and_cannibals_problem):
In the missionaries and cannibals problem, three missionaries and
three cannibals must cross a river using a boat which can carry
at most two people, under the constraint that, for both banks, if
there are missionaries present on the bank, they cannot be out-
numbered by cannibals (if they were, the cannibals would eat the
missionaries). The boat cannot cross the river by itself with no
people on board...
In the jealous husbands problem, the missionaries and cannibals
become three married couples, with the constraint that no woman
can be in the presence of another man unless her husband is also
present. Under this constraint, there cannot be both women and
men present on a bank with women outnumbering men...
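Both puzzles can be cast as the search problems defined in the coming slides. As an illustration (a minimal sketch, not from the slides; all names are my own), the missionaries-and-cannibals state space is small enough to solve by brute-force breadth-first search, with a state written as (missionaries on the left bank, cannibals on the left bank, boat on the left?):

```python
from collections import deque

def mc_solve():
    """Breadth-first search over states (m, c, b): missionaries and
    cannibals on the left bank, and b = 1 iff the boat is on the left."""
    def safe(m, c):
        # A bank is safe if it has no missionaries or at least as many
        # missionaries as cannibals; check both banks.
        return (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c)

    start, goal = (3, 3, 1), (0, 0, 0)
    loads = [(1, 0), (2, 0), (0, 1), (0, 2), (1, 1)]  # possible boat loads
    frontier, parent = deque([start]), {start: None}
    while frontier:
        state = frontier.popleft()
        if state == goal:                 # reconstruct the path of states
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        m, c, b = state
        sign = -1 if b == 1 else 1        # boat moves away from its bank
        for dm, dc in loads:
            nm, nc = m + sign * dm, c + sign * dc
            nxt = (nm, nc, 1 - b)
            if 0 <= nm <= 3 and 0 <= nc <= 3 and safe(nm, nc) and nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None
```

Because breadth-first search finds a shortest solution, the returned path uses the well-known minimum of 11 crossings.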
Search in State Spaces - The 8-Puzzle
8-puzzle:
[Figure: a start state and the goal state of the 8-puzzle board.]
8-puzzle solving agent:
Accessible environment: the agent knows exactly which state she is in.
Four possible actions: move blank left, right, up, or down.
Goal: find a sequence of actions that change the environment from
the initial to the goal state.
Search Problem Definition
In general, a search problem consists of
A set of states
An initial state
A set of deterministic actions (or operators): if an action is
executable in a state, then it will produce a unique next state
A goal test
A path cost function
A solution is a sequence of actions that leads to a goal state, i.e. a state
in which the goal is satisfied.
8-Puzzle - a formulation
States: any arrangements of the blank and the numbers 1 to 8 on the
board.
Initial state: any given one.
Goal test: the blank in the middle, and the numbers are in order
clockwise starting from the left top corner.
Actions: move the blank left, right, up, or down.
Path cost: the length of the path.
Problem-Solving Agents
Problem-solving agents are often goal-directed.
Key ideas:
I To determine what the best action is, a problem-solving agent
systematically considers the expected outcomes of different possible
sequences of actions that lead to some goal state.
I A systematic search process is conducted in some representation space.
Steps:
1 Goal and problem formulation
2 Search process
3 Action execution
Searching For Solutions
Searching for a solution to a problem can be thought of as a process of
building up a search tree:
The root of the search tree is the node corresponding to the initial
state.
At each step, the search algorithm chooses one leaf node to expand by
applying all possible actions to the state corresponding to the node.
function GENERAL-SEARCH( problem, strategy) returns a solution, or failure
initialize the search tree using the initial state of problem
loop do
if there are no candidates for expansion then return failure
choose a leaf node for expansion according to strategy
if the node contains a goal state then return the corresponding solution
else expand the node and add the resulting nodes to the search tree
end
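The pseudocode above can be turned into a short Python sketch in which the strategy is a function that picks which leaf of the frontier to expand next (an illustrative sketch; the function and parameter names are my own, not from the slides):

```python
def general_search(initial, actions, result, goal_test, strategy):
    """GENERAL-SEARCH: `strategy(frontier)` returns the index of the
    leaf to expand next. Leaves are (state, path) pairs; returns the
    path of states to a goal, or None for failure."""
    frontier = [(initial, [initial])]
    while frontier:
        i = strategy(frontier)            # choose a leaf node for expansion
        state, path = frontier.pop(i)
        if goal_test(state):              # node contains a goal state
            return path
        for a in actions(state):          # expand: apply every action
            s2 = result(state, a)
            frontier.append((s2, path + [s2]))
    return None                           # no candidates for expansion
```

Choosing `strategy = lambda f: 0` (always the oldest leaf) gives breadth-first search; `lambda f: len(f) - 1` gives depth-first search.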
An Example Search Strategy:
Breadth-First
Expand first the root node.
At each step, expand all leaf nodes.
Some properties of the breadth-first search:
If there is a solution, then the algorithm will eventually find it.
If there is more than one solution, then the first one returned will be
one with the shortest length.
The complexity (both time and space) of breadth-first search is very
high, making it impractical for most real world problems.
Breadth-First Search of the Eight-Puzzle
[Figure: the breadth-first search tree for the eight-puzzle, with nodes numbered 1-27 in the order they are generated, from the start node to the goal node. © 1998 Morgan Kaufman Publishers]
Search Strategies
The search strategy determines the order in which nodes in the search
tree are expanded.
Different search strategies lead to different search trees.
Four different criteria that determine what the right search strategy is
for a problem:
I Completeness:
Is it guaranteed to find a solution if one exists?
I Time complexity:
How long does it take to find a solution?
I Space complexity:
How much memory is needed?
I Optimality:
Does it find the “best” solution if several different solutions exist?
Types:
I Uninformed (or blind) search strategies
I Informed (or heuristic) search strategies
Depth-First (Backtracking) Search
Depth-first search generates the successors of a node one at a time;
as soon as a successor is generated, one of its successors is generated;
Normally, a depth bound is used: no successor is generated whose
depth is greater than the depth bound.
The following illustrates depth-first search on the 8-puzzle with a
depth bound of 5.
[Figure: depth-first search of the 8-puzzle with depth bound 5, shown in three stages (a), (b), (c); when the bound is reached the search backtracks, and subtrees are discarded before new successors (e.g. node 7) are generated. © 1998 Morgan Kaufman Publishers]
Comparing Breadth-First and Depth-First Search
In the following, b is the branching factor; d is the depth of solution; m is
the depth bound of the depth first search:
              Time   Space   Optimal?   Complete?
Breadth First b^d    b^d     Yes        Yes
Depth First   b^m    bm      No         Yes, if m ≥ d
Iterative Deepening Search
A technique called iterative deepening enjoys the linear memory
requirements of depth-first search while guaranteeing optimality if a
goal can be found at all;
it conducts successive depth-first search, by increasing the depth
bound by 1 each time, until a goal is found:
[Figure: iterative deepening performs successive depth-first searches with depth bounds 1, 2, 3, 4, ... © 1998 Morgan Kaufman Publishers]
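The idea can be sketched in a few lines of Python (a minimal sketch; the helper names are my own): run depth-limited depth-first search with bounds 1, 2, 3, ... until a goal turns up.

```python
def depth_limited(state, successors, goal_test, bound, path=None):
    """Depth-first search that refuses to go deeper than `bound` edges."""
    path = path or [state]
    if goal_test(state):
        return path
    if bound == 0:                       # depth bound reached: backtrack
        return None
    for s in successors(state):
        found = depth_limited(s, successors, goal_test, bound - 1, path + [s])
        if found:
            return found
    return None

def iterative_deepening(state, successors, goal_test, max_bound=50):
    """Successive depth-first searches, raising the bound by 1 each time."""
    for bound in range(1, max_bound + 1):
        found = depth_limited(state, successors, goal_test, bound)
        if found:
            return found
    return None
```

Memory stays linear in the depth, and the shallowest goal is found first, just as with breadth-first search.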
Avoiding Repeated States
Newly expanded nodes may contain states that have already been
encountered before.
There are three ways to deal with repeated states, in increasing order
of effectiveness and computational overhead:
I Do not return to the state you just came from.
I Do not create a path with cycles, that is, do not expand a node whose
state has already appeared on the path.
I Do not generate any state that was generated before.
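The third (most thorough) policy can be sketched as a breadth-first graph search that keeps a set of every state generated so far (an illustrative sketch; names are my own):

```python
from collections import deque

def graph_search(initial, successors, goal_test):
    """Breadth-first search that never generates a repeated state."""
    frontier, seen = deque([[initial]]), {initial}
    while frontier:
        path = frontier.popleft()
        if goal_test(path[-1]):
            return path
        for s in successors(path[-1]):
            if s not in seen:          # third policy: skip anything seen before
                seen.add(s)
                frontier.append(path + [s])
    return None
```

The `seen` set trades memory for the exponential blow-up illustrated below.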
A state space that generates an exponentially large search tree:
[Figure: a four-state space A, B, C, D in which each state is reachable along two paths; the corresponding search tree contains one A, two B's, four C's, and so on.]
Heuristic Search
A heuristic function is a mapping from states to numbers.
It normally measures how far the current state is from a goal state, so
the smaller the value, the closer the state is to the goal.
Heuristic or best-first search starts with a heuristic function and
chooses nodes that have smallest value to expand.
Eight Puzzle
Consider the following heuristic function for the eight-puzzle:
f(n) = number of tiles out of place compared with goal
Here is a search using this function (the number next to each node is its
value under this function):
[Figure: a best-first search tree for the eight-puzzle, each node labelled with its heuristic value; one branch leads to the goal, the others to more fruitless wandering. © 1998 Morgan Kaufman Publishers]
Eight Puzzle
The following is the search using the same heuristic function but with path
cost added:
[Figure: the search tree for the same instance with path cost added, each node labelled g + h, from 0 + 4 at the start node down to 5 + 0 at the goal. © 1998 Morgan Kaufman Publishers]
A∗ Search
Evaluation function:
I Sum of:
F the actual path cost g(n) from the start node to node n
F the estimated cost h(n) from n to a goal
I f (n) = g(n) + h(n): estimated cost of the cheapest path through node n
Idea: Try to find the cheapest solution.
A* Search by Tree
A∗ search by trees - check only ancestors for repeated states:
1 Create a search tree T , consisting solely of the start node, n0. Put n0
on a list called OPEN.
2 If OPEN is empty, then exit with failure.
3 Select the first node on OPEN, and remove it from OPEN. Call this
node n.
4 If n is a goal node, exit successfully with the solution corresponding
to the path from the root n0 to this node.
5 Expand node n, generating the set M of its successors that are not
already ancestors of n in T . Install these members of M as children of
n in T , and add them to OPEN.
6 Reorder the list OPEN in order of increasing g(n) + h(n) values. (Ties
are resolved in favor of the deepest node in the search tree.)
7 Go to step 2.
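The steps above can be sketched in Python with a priority queue ordered by g + h; as in step 5, repeated states are checked only against the ancestors on the current path (a hedged sketch: names are mine, and ties here fall back to insertion order rather than depth):

```python
import heapq

def astar(start, successors, h, goal_test):
    """A* tree search. `successors(s)` yields (next_state, step_cost)
    pairs; expands the OPEN node with the smallest f = g + h."""
    counter = 0                              # tie-breaker for the heap
    open_list = [(h(start), 0, counter, start, [start])]
    while open_list:
        f, g, _, state, path = heapq.heappop(open_list)
        if goal_test(state):
            return path, g
        for s, cost in successors(state):
            if s not in path:                # skip ancestors only (step 5)
                counter += 1
                g2 = g + cost
                heapq.heappush(open_list, (g2 + h(s), g2, counter, s, path + [s]))
    return None
```

With the admissible heuristic h = 0 this degenerates to uniform-cost search and still returns a cheapest path.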
Route Finding
[Figure: the road map of Romania with distances between adjacent cities (e.g. Arad-Sibiu 140, Sibiu-Fagaras 99, Sibiu-Rimnicu Vilcea 80, Rimnicu Vilcea-Pitesti 97, Fagaras-Bucharest 211, Pitesti-Bucharest 101), together with a table of straight-line distances to Bucharest (e.g. Arad 366, Sibiu 253, Fagaras 178, Rimnicu Vilcea 193, Pitesti 98, Timisoara 329, Zerind 374, Oradea 380, Craiova 160, Bucharest 0).]
A∗ Search for Route Finding
[Figure: stages of A∗ search from Arad to Bucharest, each node labelled f = g + h.
Arad f = 0 + 366 = 366; expanding Arad gives Sibiu f = 140 + 253 = 393,
Timisoara f = 118 + 329 = 447, Zerind f = 75 + 374 = 449; expanding Sibiu gives
Rimnicu f = 220 + 193 = 413, Fagaras f = 239 + 178 = 417, Oradea f = 146 + 380 = 526,
Arad f = 280 + 366 = 646; expanding Rimnicu gives Pitesti f = 317 + 98 = 415,
Craiova f = 366 + 160 = 526, Sibiu f = 300 + 253 = 553.]
Behavior of A∗
We shall assume:
h is admissible, that is, it is never larger than the actual cost.
g is the sum of the cost of the operators along the path.
The cost of each operator is greater than some positive amount ε.
The number of operators is finite (thus finite branching factor of
search tree).
Under these conditions, we can always revise h into another admissible
heuristic function so that the f = h + g values along any path in the search
tree are never decreasing. (Monotonicity.) If f ∗ is the cost of an optimal
solution, then
A∗ expands all nodes with f (n) < f ∗.
A∗ may expand some nodes with f (n) = f ∗ before finding a goal.
[Figure: the Romania state space drawn with A∗ search contours at f = 380, 400, and 420 around the start node; all nodes inside a contour have f values below its label.]
Completeness and Optimality of A∗
Under these assumptions,
A∗ is complete: A∗ expands nodes in order of increasing f , and there
are only finitely many nodes with f (n) ≤ f ∗, so it must eventually
expand a goal state.
A∗ is optimal: it will always return an optimal goal. Proof sketch: let
G be an optimal goal and G2 a suboptimal one.
It is impossible for A∗ to find G2 before G :
[Figure: a search tree with the start node at the top, the optimal goal G and a node n on its path in one subtree, and the suboptimal goal G2 in another.]
It is optimally efficient: no other optimal algorithm is guaranteed to
expand fewer nodes than A∗ (Dechter and Pearl 1985).
Complexity of A∗
Complexity:
I Number of nodes expanded is exponential in the length of the solution.
I All generated nodes are kept in memory. (A∗ usually runs out of space
long before it runs out of time.)
I With a good heuristic, significant savings are still possible compared to
uninformed search methods.
I Admissible heuristic functions that give higher values tend to make the
search more efficient.
Memory-bounded extensions to A∗:
I Iterative deepening A∗ (IDA∗)
I Simplified memory-bounded A∗ (SMA∗)
Heuristic Functions for 8-Puzzle
Some possible candidates (admissible heuristic functions):
I Number of tiles that are in the wrong position (h1)
I Sum of the city block distances of the tiles from their goal positions
(h2)
Comparison of iterative deepening search, A∗ with h1, and A∗ with h2
(averaged over 100 instances of the 8-puzzle, for various solution
lengths d):
Search Cost Effective Branching Factor
d IDS A*(h1) A*(h2) IDS A*(h1) A*(h2)
2 10 6 6 2.45 1.79 1.79
4 112 13 12 2.87 1.48 1.45
6 680 20 18 2.73 1.34 1.30
8 6384 39 25 2.80 1.33 1.24
10 47127 93 39 2.79 1.38 1.22
12 364404 227 73 2.78 1.42 1.24
14 3473941 539 113 2.83 1.44 1.23
16 – 1301 211 – 1.45 1.25
18 – 3056 363 – 1.46 1.26
20 – 7276 676 – 1.47 1.27
22 – 18094 1219 – 1.48 1.28
24 – 39135 1641 – 1.48 1.26
We see that h2 is better than h1.
In general, it is always better to use a heuristic function with higher
values, as long as it is admissible, i.e. does not overestimate: h2
dominates (is better than) h1 because for all nodes n, h2(n) ≥ h1(n).
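For concreteness, here is one way to code h1 and h2 for this course's goal state (blank in the middle, tiles 1-8 clockwise from the top-left corner); the encoding of a board as a 9-tuple with 0 for the blank is my own choice, not from the slides:

```python
GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # blank in the middle, 1..8 clockwise

def h1(board):
    """Number of tiles (not counting the blank) out of place."""
    return sum(1 for i, t in enumerate(board) if t != 0 and t != GOAL[i])

def h2(board):
    """Sum of city-block (Manhattan) distances of the tiles from their
    goal positions; dominates h1 since each misplaced tile contributes
    at least 1."""
    goal_pos = {t: (i // 3, i % 3) for i, t in enumerate(GOAL)}
    total = 0
    for i, t in enumerate(board):
        if t != 0:
            r, c = i // 3, i % 3
            gr, gc = goal_pos[t]
            total += abs(r - gr) + abs(c - gc)
    return total
```

On the start state used in the earlier figures (2 8 3 / 1 6 4 / 7 _ 5), h1 counts the four misplaced tiles 1, 2, 6, 8.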
Inventing Heuristic Functions
Questions:
How might one have come up with good heuristics such as h2 for the
8-puzzle?
Is it possible for a computer to mechanically invent such heuristics?
Relaxed Problem. Given a problem, a relaxed problem is one with less
restrictions on the operators.
Strategy. Use the path cost of a relaxed problem as the heuristic function
for the original problem.
Inventing Heuristic Function - 8-Puzzle
Example. Suppose the 8-puzzle operators are described as:
A tile can move from square A to B if A is adjacent to B and B
is blank.
We can then generate three relaxed problems by removing some of the
conditions:
(a) A tile can move from square A to B if A is adjacent to B.
(b) A tile can move from square A to B if B is blank.
(c) A tile can move from square A to B.
Using (a), we get h2. Using (c), we get h1.
ABSOLVER (Prieditis 1993) is a computer program that automatically
generates heuristics based on this and other techniques.
Games - An Example
Tic-Tac-Toe:
A board with nine squares.
Two players: “X” and “O”; “X” moves first, and then alternate.
At each step, the player chooses an unoccupied square and marks it
with his/her symbol. Whoever gets three in a line wins.
Games as Search Problems
Mainly two-player games are considered.
Two-player game-playing problems require an agent to plan ahead in
an environment in which another agent acts as its opponent.
Two sources of uncertainty:
I Opponent: The agent does not know in advance what action its
opponent will take.
I Time-bounded search: Time limit has to be imposed because searching
for optimal decisions in large state spaces may not be practical. The
“best” decisions found may be suboptimal.
Search problem:
I Initial state: initial board configuration and indication of who makes
the first move.
I Operators: legal moves.
I Terminal test: determines when a terminal state is reached.
I Utility function (payoff function): returns a numeric score to quantify
the outcome of a game.
Game Tree for Tic-Tac-Toe
[Figure: a partial game tree for Tic-Tac-Toe. MAX (X) moves at the root, then MIN (O), alternating level by level down to terminal states with utilities −1, 0, +1.]
Minimax Algorithm With Perfect Decisions
Minimax Algorithm (With Perfect Decisions) Assume the two players are:
MAX (self) and MIN (opponent). To evaluate a node n in a game tree:
1 Expand the entire tree below n.
2 Evaluate the terminal nodes using the given utility function.
3 Select a node that has not been evaluated yet but all of whose
children have been evaluated. If there is no such node, then return.
4 If the selected node is one at which MIN moves, assign it the
minimum of the values of its children. If the selected node is one at
which MAX moves, assign it the maximum of the values of its
children. Return to step 3.
MAX tries to maximize the utility, assuming that MIN will act to minimize
it.
[Figure: a two-ply game tree. MAX moves at the root to A1, A2, or A3; the MIN nodes below have leaf values 3, 12, 8 / 2, 4, 6 / 14, 5, 2, receive minimax values 3, 2, 2, and the root receives value 3.]
Imperfect Decisions
Minimax algorithm with perfect decisions:
I No time limit is imposed.
I The complete search tree is generated.
It is often impractical to make perfect decisions, as the time and/or
space requirements for complete game tree search (to terminal states)
are intractable.
Two modifications to minimax algorithm with perfect decisions:
I Partial tree search:
Terminal test is replaced by a cutoff test.
I Evaluation function:
Utility function is replaced by a heuristic evaluation function.
Minimax Algorithm (With Imperfect Decisions) Assume the two players
are: MAX (self) and MIN (opponent). To evaluate a node n in a game
tree:
1 Expand the tree below n according to the partial tree search.
2 Evaluate the leaf nodes using the given evaluation function.
3 Select a node that has not been evaluated yet but all of whose
children have been evaluated. If there is no such node, then return.
4 If the selected node is one at which MIN moves, assign it the
minimum of the values of its children. If the selected node is one at
which MAX moves, assign it the maximum of the values of its
children. Return to step 3.
Evaluation Functions
An evaluation function returns an estimate of the expected utility of
the game from a given position.
Requirements:
I Computation of evaluation function values is efficient.
I Evaluation function agrees with utility function on terminal states.
I Evaluation function accurately reflects the chances of winning.
Most game-playing programs use a weighted linear function:
w1f1 + w2f2 + · · · + wnfn
where the f ’s are the features (e.g. number of queens in chess) of the
game position, and w ’s are the weights that measure the importance
of the corresponding features.
Learning good evaluation functions automatically from past
experience is a promising new direction.
Tic-Tac-Toe
Evaluation function, e(p), for Tic-Tac-Toe:
If p is not a winning position for either player, then e(p) = (the
number of complete rows, columns, or diagonals that are still open for
MAX) - (the number of complete rows, columns, or diagonals that
are still open for MIN).
If p is a win for MAX, then e(p) =∞; if p is a win for MIN, then
e(p) = −∞.
The next three slides illustrate Minimax search with imperfect decisions.
Notice that we have eliminated symmetric positions.
The First Stage of Search
[Figure: the first stage of search. X's three distinct first moves receive backed-up values −2, −1, and 1 from leaf evaluations such as e(p) = 5 − 6 = −1; placing X in the center has the best value, so that becomes MAX's move. © 1998 Morgan Kaufman Publishers]
The Second Stage of Search
[Figure: the second stage of search, grown from the position chosen in the first stage; MAX's candidate moves receive backed-up values 1, 0, 1, 0, and MAX plays a move of value 1. © 1998 Morgan Kaufman Publishers]
The Last Stage of Search
[Figure: the last stage of search. Positions in which MIN can force a win receive value −∞; backing the values up through the nodes labelled A-D, MAX's best move has value 1, and MAX plays it. © 1998 Morgan Kaufman Publishers]
Partial Tree Search
Different approaches:
I Depth-limited search
I Iterative deepening search
I Quiescent search
Quiescent positions are positions that are not likely to have large
variations in evaluation in the near future.
Quiescent search:
I Expansion of nonquiescent positions until quiescent positions are
reached.
Pruning
Pruning: The process of eliminating a branch of the search tree from
consideration without actually examining it.
General idea: If m is better than n for Player, then n will never be
reached in actual play and hence can be pruned away.
[Figure: alternating Player/Opponent levels of a game tree, with nodes m and n on different branches; if m is already known to be better for Player than n, the subtree below n can be pruned.]
Alpha-Beta Pruning
Minimax search without pruning:
[Figure: the two-ply tree from before: MIN nodes A1, A2, A3 evaluate leaves 3, 12, 8 / 2, 4, 6 / 14, 5, 2, receive values 3, 2, 2, and the MAX root receives 3.]
Minimax search with alpha-beta pruning:
[Figure: the same tree searched with alpha-beta pruning: after A2's first leaf (value 2) shows A2 ≤ 2 < 3, A2's remaining leaves are pruned; the root value is still 3.]
Effectiveness of alpha-beta pruning depends on the order in which
successor nodes are examined.
Alpha-Beta Search Algorithm - Informal Description
To evaluate a MAX node in a game tree (we shall call the value assigned
to a MAX node its alpha value, and that to a MIN node its beta value):
1 Expand the node depth-first until a node that satisfies the cutoff test
is reached.
2 Evaluate the cutoff node.
3 Update the values of all the nodes that have so far been expanded
according to Minimax algorithm, and using the following pruning
strategy:
I prune all children of any MIN node whose beta value is ≤ the alpha
value of any of its MAX ancestors.
I prune all children of any MAX node whose alpha value is ≥ the beta
value of any of its MIN ancestors.
4 Backtrack to a node that has not been pruned, and go back to step
1. If there is no such node to backtrack to, then return with the
value assigned to the original node.
Example
Part of the first stage of search in Tic-Tac-Toe using Alpha-Beta search.
[Figure: part of the first stage of Tic-Tac-Toe search with alpha-beta pruning; after node A is fully evaluated, the start node's alpha value is −1 and A's beta value is −1, with successor nodes B and C still to be examined. © 1998 Morgan Kaufman Publishers]
Alpha-Beta Search Algorithm
The above informal description corresponds to calling the following
function
MAX-VALUE(node, game, −∞, +∞):
function MAX-VALUE(state, game, α, β) returns the minimax value of state
inputs: state, current state in game
game, game description
α, the best score for MAX along the path to state
β, the best score for MIN along the path to state
if CUTOFF-TEST(state) then return EVAL(state)
for each s in SUCCESSORS(state) do
α ← MAX(α, MIN-VALUE(s, game, α, β))
if α ≥ β then return β
end
return α
function MIN-VALUE(state, game, α, β) returns the minimax value of state
if CUTOFF-TEST(state) then return EVAL(state)
for each s in SUCCESSORS(state) do
β ← MIN(β, MAX-VALUE(s, game, α, β))
if β ≤ α then return α
end
return β
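The same algorithm can be sketched in Python (an illustrative transcription; `children` and `utility` play the roles of SUCCESSORS and EVAL):

```python
def alphabeta(node, is_max, children, utility,
              alpha=float('-inf'), beta=float('inf')):
    """Minimax with alpha-beta pruning: stop scanning a node's children
    once its value can no longer affect any ancestor's choice."""
    kids = children(node)
    if not kids:                      # cutoff / terminal node
        return utility(node)
    if is_max:
        v = float('-inf')
        for k in kids:
            v = max(v, alphabeta(k, False, children, utility, alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:         # a MIN ancestor will avoid this node
                break
        return v
    else:
        v = float('inf')
        for k in kids:
            v = min(v, alphabeta(k, True, children, utility, alpha, beta))
            beta = min(beta, v)
            if beta <= alpha:         # a MAX ancestor will avoid this node
                break
        return v
```

On the three-branch example tree, only 7 of the 9 leaves are evaluated: A2's leaves 4 and 6 are pruned.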
Analyses of Game-Playing Algorithm
The effectiveness of alpha-beta pruning depends on the ordering in
which successors are generated.
Assuming optimal ordering, alpha-beta pruning can reduce the
effective branching factor from b to about √b (Knuth and Moore 1975).
The result also assumes that all nodes have the same branching
factor, that all paths reach the same fixed depth limit, and that the
leaf (cutoff node) evaluations are randomly distributed.
All the game-playing algorithms assume that the opponent plays
optimally.
Monte-Carlo Search
Pure Monte-Carlo Search:
Expand the current node;
For each child, play a large number of random games to the finish
and compute the average payoffs (values).
Play the child move that has the largest average value.
Monte-Carlo Tree Search
Making use of MCS in game playing:
Use formulas other than average to summarize the sampling results.
Combine it with alpha-beta pruning search: start with alpha-beta
pruning and go to Monte-Carlo when the level is too deep to continue
or when there is no good evaluation function on the states.
Make use of domain knowledge, and combine it with learning
techniques.
An Example
Averages may be misleading: minimax would choose the left branch while
averaging would lead to the right branch.
Monte-Carlo Search with UCB
MCS with UCB (Upper Confidence Bound):
Expand the current node v ;
For each child Mi , play a large number of random games to the finish
and compute its UCB. A common formula is:
UCB(Mi ) = µi + c √( log N / Ni )
where
I µi is the expected value of the games (tryouts) for the child node Mi .
For example, in a zero-sum game it is Wi /Ni , where Wi is the number of
wins for Mi , and Ni is the total number of plays for Mi ;
I N is the total number of plays for v ;
I c is a constant called the exploration parameter used to balance
branches.
Play the child move that has the largest average value.
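The formula is easy to code directly (a sketch, assuming the natural log; the default c = √2 is a conventional choice, not fixed by the slides):

```python
import math

def ucb(wins, plays, total_plays, c=math.sqrt(2)):
    """UCB score of one child: mean value plus an exploration bonus.
    Unvisited children get +infinity so each is tried at least once."""
    if plays == 0:
        return float('inf')
    mu = wins / plays                          # the mean value µ_i
    return mu + c * math.sqrt(math.log(total_plays) / plays)
```

The bonus term shrinks as a child is played more, so the score balances exploiting good averages against exploring rarely-tried moves.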
Monte-Carlo Tree Search (MCTS)
MCTS:
Start building the tree using alpha-beta pruning.
Continue on with Monte-Carlo search with UCB scoring rule.
Propagate the values up the alpha-beta tree.
State of the Art
Chess: Deep Blue (Benjamin, Tan, et al.) defeated world chess
champion Garry Kasparov on May 11, 1997.
Checkers: Chinook (Schaeffer et al.) became the world champion in
1994.
Go: AlphaGo (Silver et al.) defeated a top-level player in 2016.
Alternative Search Formulations
Assignment problem or constraint satisfaction.
Function optimization.
Assignment Problems
Problem definition:
A finite set of variables and their domains.
A finite set of conditions on these variables.
A solution is an assignment to these variables that satisfies all the
conditions.
Example: 8-queens problem:
Variables: q1, ..., q8, the position of a queen in column 1,...,8.
Domain: the same for all variables, {1, .., 8}.
Constraints: q1 − q2 ≠ 0, |q1 − q2| ≠ 1, ...
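The full set of constraints (no two queens on the same row or diagonal) can be checked with a few lines of Python (an illustrative sketch; the list encoding, with `assignment[i]` the row of the queen in column i, is my own):

```python
def consistent(assignment):
    """Check the n-queens constraints on a (possibly partial) assignment:
    assignment[i] is the row of the queen in column i."""
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            if assignment[i] == assignment[j]:               # same row
                return False
            if abs(assignment[i] - assignment[j]) == j - i:  # same diagonal
                return False
    return True
```

A solution to the 8-queens problem is then any complete assignment for which `consistent` returns true.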
Constructive Methods
Constructive methods use search strategies, especially depth-first search,
to try to find a solution:
States: any partial assignment (assign values to some of the
variables).
Initial state: the empty assignment.
Operator: pick a new variable, and assign a value in its domain to it.
Path cost: 0.
Goal condition: when all constraints are satisfied.
Constraint Propagation
A technique called constraint propagation can be used to prune the search
space:
If there is a constraint involving both variables i and j , then knowing
the values of i may eliminate possible values for j .
A constraint graph is often used to do constraint propagation: a node is
labelled by a variable and the domain of values that the variable can take,
and there is an edge between nodes i and j if variables i and j are
constrained.
Running Example: 4-Queens Problem
Constraint graph:
[Figure: the constraint graph of the 4-queens problem: nodes q1, q2, q3, q4, each with domain {1, 2, 3, 4}, and an edge between every pair of variables. © 1998 Morgan Kaufman Publishers]
Running Example: 4-Queens Problem
Constraint graph with q1 = 1:
[Figure: after q1 = 1 the domains shrink to q2 = {3, 4}, q3 = {2, 4}, q4 = {2, 3}; values are eliminated by first making arcs (q2, q3) and (q3, q2) consistent, and one more by next making arc (q3, q4) consistent. © 1998 Morgan Kaufman Publishers]
Running Example: 4-Queens Problem
Constraint graph with q1 = 2:
[Figure: after q1 = 2 the domains shrink to q2 = {4}, q3 ⊆ {1, 3}, q4 ⊆ {1, 3, 4}; further values are eliminated by making arcs (q3, q2), (q4, q3), and (q4, q2) consistent in turn. © 1998 Morgan Kaufman Publishers]
Heuristic Repair
Constructive methods start with the empty assignment, and build up a
solution gradually by assigning values to variables one by one.
Heuristic repair starts with a proposed solution, which most probably
does not satisfy all constraints, and repairs it until it does.
The often-used, so-called min-conflicts repair method by Gu selects a
variable to adjust, and finds for it a value that minimizes conflicts.
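A minimal min-conflicts sketch for n-queens (the randomized details, e.g. picking a random conflicted column, are one common choice, not prescribed by the slide):

```python
import random

def min_conflicts(n=8, max_steps=10000, seed=0):
    """Min-conflicts repair: start from a random complete assignment and
    repeatedly move a conflicted queen to the row in its column that
    minimizes conflicts. Returns rows[col] per column, or None."""
    rng = random.Random(seed)

    def conflicts(rows, col, row):
        # number of other queens attacking square (col, row)
        return sum(1 for c in range(n) if c != col and
                   (rows[c] == row or abs(rows[c] - row) == abs(c - col)))

    rows = [rng.randrange(n) for _ in range(n)]   # proposed "solution"
    for _ in range(max_steps):
        bad = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not bad:
            return rows                           # all constraints satisfied
        col = rng.choice(bad)                     # variable to adjust
        rows[col] = min(range(n), key=lambda r: conflicts(rows, col, r))
    return None
```

Like all local repair methods it can stall, so a step limit (and, in practice, restarts) is kept.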
8-Queens By Min-Conflicts
[Figure: min-conflicts repair on the 8-queens board; each square in a queen's column is annotated with the number of conflicts a queen there would have. The repairs shown: move the queen at (1,3) to (1,2), move the queen at (2,6) to (2,7), then no change. © 1998 Morgan Kaufman Publishers]
Satisfiability
A clause is a disjunction of literals - a literal is a variable or the
negation of a variable.
Satisfiability (SAT) refers to the problem of deciding if a set of
clauses is satisfiable. It was the first NP-complete problem, discovered
by Cook, and it is one of the most intensely studied NP-complete
problems.
There is a huge literature on SAT. SAT algorithms have many
applications.
3SAT refers to the problem of deciding if a set of clauses that have
no more than 3 literals is satisfiable.
In terms of computational complexity, SAT is equivalent to 3SAT.
A procedure for SAT is sound if whenever it returns yes, the input is
indeed satisfiable; it is complete if it will return yes whenever the
input is satisfiable. Obviously, we want only sound procedures. But
incomplete ones can sometimes be very useful.
DPLL Procedure
The most popular and widely studied complete method for SAT is the
so-called Davis-Putnam-Logemann-Loveland (DPLL) procedure.
Let α be a set of clauses. If l is a literal occurring in α, then by α(l) we
mean the result of setting l to true and simplifying: if a clause C mentions
neither l nor ¬l , then do nothing; if C mentions l , then eliminate C ; if C
mentions ¬l , then delete ¬l from C .
Procedure DPLL(CNF: α)
if α is empty, then return yes.
else if there is an empty clause in α, return no.
else if there is a pure literal l in α, then return DPLL(α(l)).
else if there is a unit clause {l} in α, then return DPLL(α(l)).
else select a variable p in α, and do the following
if DPLL(α(p)) return yes, then return yes.
else return DPLL(α(¬p)).
where a literal in α is pure if it occurs only positively or only negatively,
and a unit clause is one whose length is 1.
Modern DPLL
The state-of-the-art SAT solvers use DPLL with conflict-driven clause
learning and dependency-based backtracking:
1 Start with the empty assignment.
2 If all variables are assigned then return with the assignment.
3 Choose an assignment for an un-assigned variable (backtrack point).
4 Propagate (deduce) the extended partial assignment.
5 If propagation yields a contradiction:
I analyze the conflict to learn some new clauses
(conflict driven clause learning)
I backtrack to the nearest backtrack point that is relevant to the conflict.
I exit with failure (unsatisfiable) if backtrack returns no new alternatives.
6 Else go back to step 2.
See https://en.wikipedia.org/wiki/Conflict-Driven_Clause_Learning for
more details.
GSAT
Incomplete methods cannot be used to check unsatisfiable CNFs. But on
satisfiable CNFs, incomplete methods can sometimes be much faster than
the currently known best complete methods.
The following randomized local search algorithm, often called GSAT, is
typical of current best known incomplete methods:
Procedure GSAT(CNF: α, max-restart: mr, max-climb: mc)
for i = 1 to mr do
get a randomly generated truth assignment v .
for j = 1 to mc do
if v satisfies α, return yes
else let v be a random choice of one of the best successors of v
return failure
where a successor of v is an assignment with the truth value of one of the
variables flipped, and a best successor is one that satisfies the maximum
number of clauses in α.
Search as Function Maximization
Function maximization problems: find an x such that Value(x) is
maximal.
Examples: 8-queens problem
I find a state in which none of the 8 queens are attacked.
I design a function on the states such that its value is maximal when
none of the queens are attacked.
I one such function can be: Value(n) = 1/e^k(n) , where k(n) is the
number of queens that are attacked.
Another example: VLSI design - find a layout that satisfies the
constraints.
Hill-Climbing
Basic ideas:
Start at the initial state.
At each step, move to a next state with highest value.
It does not maintain the search tree and does not backtrack - it keeps
only the current node and its evaluation.
[Figure: a one-dimensional landscape plotting the evaluation against states, with the current state climbing toward a peak.]
Hill-Climbing Search Algorithm
function HILL-CLIMBING( problem) returns a solution state
inputs: problem, a problem
static: current, a node
next, a node
current ← MAKE-NODE(INITIAL-STATE[problem])
loop do
next ← a highest-valued successor of current
if VALUE[next] < VALUE[current] then return current
current ← next
end
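A direct transcription into Python (a sketch with illustrative names; like the pseudocode, it stops only when the best successor is strictly worse than the current state):

```python
def hill_climbing(state, successors, value):
    """Keep only the current state; repeatedly move to the
    highest-valued successor, stopping when that successor is worse."""
    while True:
        nxt = max(successors(state), key=value)
        if value(nxt) < value(state):
            return state               # local maximum reached
        state = nxt
```

Note that, as in the pseudocode, a successor of equal value is still taken, so on a plateau the loop can wander indefinitely; practical versions add a step limit.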
Simulated Annealing
Hill-climbing can easily get stuck in a local maximum.
One way to avoid getting stuck in a local maximum is to use
simulated annealing.
The main idea:
I at each step, instead of picking the best move, it picks a random move.
I if the move leads to a state with higher value, then execute the move.
I otherwise, execute the move with certain probability that becomes
smaller as the algorithm progresses.
I the probability is computed according to a function (schedule) that
maps time to probabilities.
The idea comes from the process of annealing - cooling metal liquid
gradually until it freezes.
It has been found to be extremely effective in real applications such
as factory scheduling.
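The main idea above can be sketched generically (an illustrative sketch: the names and the exponential cooling schedule used in the test are my own choices; the acceptance probability exp(Δ/T) is the standard Boltzmann form for a maximization problem):

```python
import math
import random

def simulated_annealing(state, successors, value, schedule, steps=10000, seed=0):
    """At each step pick a random move; always accept improvements, and
    accept a worsening move with probability exp(delta / T), where the
    temperature T comes from the cooling `schedule`."""
    rng = random.Random(seed)
    best = state
    for t in range(steps):
        T = schedule(t)
        if T <= 0:
            break                          # frozen: stop
        nxt = rng.choice(successors(state))
        delta = value(nxt) - value(state)  # > 0 means an improvement
        if delta > 0 or rng.random() < math.exp(delta / T):
            state = nxt
        if value(state) > value(best):
            best = state                   # remember the best state seen
    return best
```

Early on (large T) the walk is nearly random; as T decays it behaves more and more like hill-climbing, which is what lets it escape local maxima found early.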
