Keywords: BFS, DFS, UCS, A* search, tree search, graph search, heuristic function
Search and dynamic programming are two fundamental methodologies with a strong connection. (Recall the Feynman-Kac formula and the Fokker-Planck equation.)
1. Definition
1.1 State space and search problem
A formal formulation of a search problem requires four elements:
a) A state space: the set of all possible states
b) A successor function: $\text{state}_t \times \text{action} \xrightarrow{} \text{state}_{t + 1} \times \text{cost}$
c) A start state
d) A goal state
A path from the start state to a goal state is called a plan, while the order in which states are considered is determined by a strategy.
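As a minimal sketch (the class and method names below, e.g. `SearchProblem` and `get_successors`, are hypothetical and not taken from any particular codebase), the four elements above can be collected into one interface that the search algorithms in the following sections consume:

```python
from abc import ABC, abstractmethod

class SearchProblem(ABC):
    """Abstract container for the four elements of a search problem."""

    @abstractmethod
    def start_state(self):
        """Return the start state."""

    @abstractmethod
    def is_goal(self, state) -> bool:
        """Return True if `state` is a goal state."""

    @abstractmethod
    def get_successors(self, state):
        """Yield (next_state, action, cost) triples: the successor function."""
```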
1.2 World state and search state
These are two important concepts. The world state contains information such as the number of rows and columns in a maze, while the search state contains only the information essential to planning and the successor function, i.e. the information that may change given an action.
Remark: The discussion of the pinball game in reinforcement learning actually touches on this point. The state we usually describe is the search state, whereas the world state is the part that we need to model in reinforcement learning, or in other words it determines the properties of the value function or the Q function. In stochastic control problems, the world state appears as the boundary conditions of the HJB equation.
1.3 The elements of search (finding a plan)
1) The fringe of partial plans derived from the tree. (Fringe: literally the ear of a cereal plant, the cluster of flowers or grains at the top of the stalk. Here it can be understood as the result of one sweep, or as a sub-search problem, or as something like the set of child nodes: the search problem starting from the initial node can be turned into the search problems whose initial nodes are all the nodes connected to the initial node.)
2) Expansion of the fringe.
3) Strategy of expansion
1.4 The evaluation of the search strategy
Completeness: The strategy will find a solution if one exists.
Optimality: Is the strategy guaranteed to find the lowest-cost path to a goal state? (A path is the sequence of actions from the start state to a goal state.)
Remark: Optimality concerns whether we find the path with the lowest cost, not the number of expansions or the time spent on the search process.
2. A review of uninformed search
Branching factor $b$: each time a fringe node is dequeued and replaced with its children, the number of nodes on the fringe increases by $O(b)$.
2.1 DFS
On a graph, DFS is not complete, due to the properties of the fringe representation. (Because we always pop the most recently pushed element, the stack cannot keep every node we have already traversed. If there is a cycle, i.e. a node that leads back to itself, that node does not stay on the stack, and we may not even detect the cycle, because the preceding node in the cycle has already been popped.) DFS is not optimal either, since we never evaluate the cost: we just expand according to the 'leftmost' strategy. The time complexity is $O(b^m)$ in the worst case if the tree has $m$ layers (each node has $b$ children, each of those has $b$ children again, for $m$ layers, and all of them may need to be traversed). The space complexity is $O(bm)$ (by the fringe representation, we keep at most one branch, i.e. at most $b$ siblings per layer over $m$ layers).
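As a minimal sketch, assuming the hypothetical `SearchProblem` interface from Section 1.1, DFS keeps the fringe as a LIFO stack of partial plans:

```python
def depth_first_search(problem):
    """DFS tree search: the fringe is a LIFO stack of (state, plan) pairs."""
    fringe = [(problem.start_state(), [])]          # stack of partial plans
    while fringe:
        state, plan = fringe.pop()                  # pop the newest node: 'leftmost' expansion
        if problem.is_goal(state):
            return plan
        for next_state, action, _cost in problem.get_successors(state):
            fringe.append((next_state, plan + [action]))   # cost is ignored: DFS is not optimal
    return None   # only reached if the fringe empties; on cyclic graphs the loop may never end
```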
2.2 BFS
We want to traverse the graph or tree layer by layer. Therefore, the fringe must return the oldest enqueued node first, i.e. it is a FIFO queue. BFS is complete due to this fringe representation (BFS can detect cycles, because when a node is enqueued, the earlier elements of the current fringe are still kept). Let $s$ denote the depth of the shallowest solution. The time complexity is $O(b^s)$ ($1 + b + b^2 + \dots + b^s$), and the space complexity is also $O(b^s)$. BFS is optimal if all actions have the same constant cost, e.g. when the cost of a path is defined as the length of its action sequence.
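The only change from the DFS sketch is the fringe data structure: a FIFO queue (`collections.deque` here), so the oldest enqueued node is expanded first:

```python
from collections import deque

def breadth_first_search(problem):
    """BFS tree search: the fringe is a FIFO queue, so nodes are expanded layer by layer."""
    fringe = deque([(problem.start_state(), [])])
    while fringe:
        state, plan = fringe.popleft()              # dequeue the oldest node
        if problem.is_goal(state):
            return plan                             # shallowest solution: optimal for unit costs
        for next_state, action, _cost in problem.get_successors(state):
            fringe.append((next_state, plan + [action]))
    return None
```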
2.3 UCS (Uniform Cost Search)
The strategy of expansion is to choose the fringe node with the lowest cost. Therefore, we always use a heap-based priority queue to represent the fringe, where the weight is the path cost from the start node to the current state, i.e. the backward cost. The key property of this fringe representation is that nodes with lower backward cost will never be visited later than nodes with higher backward cost, provided every edge cost is nonnegative (easily shown by contradiction; this intermediate result will be used again later). UCS is complete and optimal if the cost of each edge is nonnegative (recall Dijkstra's algorithm). Let $C$ denote the optimal cost and $\epsilon$ the minimal edge cost. We must explore every node at depths from 1 to $\frac{C}{\epsilon}$, so the time complexity is $O(b^{\frac{C}{\epsilon}})$, and the space complexity is the same. BFS is a special case of UCS.
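A UCS sketch under the same assumed interface, with Python's `heapq` as the heap-based priority queue keyed on the backward cost $g(n)$ (the tie counter only prevents the heap from ever comparing states):

```python
import heapq
from itertools import count

def uniform_cost_search(problem):
    """UCS: the fringe is a priority queue keyed on the backward cost g(n)."""
    tie = count()                                   # tie-breaker so heapq never compares states
    fringe = [(0, next(tie), problem.start_state(), [])]
    while fringe:
        g, _, state, plan = heapq.heappop(fringe)   # cheapest partial plan first
        if problem.is_goal(state):
            return plan                             # optimal if all edge costs are nonnegative
        for next_state, action, cost in problem.get_successors(state):
            heapq.heappush(fringe, (g + cost, next(tie), next_state, plan + [action]))
    return None
```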
3. Informed search
Although we can find the optimal path with UCS, it can be fairly slow, since it expands in every direction. We can speed up the search if we have some knowledge of the effect of our actions: we want to visit the nodes with a probably lower total cost first. This choice of search direction is the focus of informed search.
3.1 Heuristic function
$\text{state} \times \text{problem} \xrightarrow{} \text{cost}$. This function only depends on the world state, i.e. the problem here.
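For instance, in a maze problem the Manhattan distance to the goal is a typical heuristic; it reads only the world state (here just the goal position) and never the search history. A hypothetical sketch:

```python
def manhattan_heuristic(state, goal):
    """Estimated forward cost h(n) for a grid problem: |dx| + |dy|."""
    (x, y), (gx, gy) = state, goal
    return abs(x - gx) + abs(y - gy)
```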
3.2 A* search
Identical to UCS (a priority queue with the estimated total cost as the weight). Given a good enough heuristic function, A* search is both complete and optimal. Let $n$ denote the current state, $g(n)$ the backward cost, i.e. from the start state to the current state, $h(n)$ the value of the heuristic function, and $f(n)$ the estimated total cost, i.e. $g(n) + h(n)$.
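A sketch under the same assumptions as the UCS code above: the only change is that the priority becomes $f(n) = g(n) + h(n)$, where the heuristic `h` is passed in as a one-argument function of the state (e.g. the Manhattan sketch above with the goal bound):

```python
import heapq
from itertools import count

def a_star_tree_search(problem, h):
    """A* tree search: priority queue keyed on f(n) = g(n) + h(n)."""
    tie = count()
    start = problem.start_state()
    fringe = [(h(start), next(tie), 0, start, [])]  # entries are (f, tie, g, state, plan)
    while fringe:
        _f, _, g, state, plan = heapq.heappop(fringe)
        if problem.is_goal(state):
            return plan                             # optimal if h is admissible
        for next_state, action, cost in problem.get_successors(state):
            g2 = g + cost
            heapq.heappush(fringe, (g2 + h(next_state), next(tie), g2, next_state, plan + [action]))
    return None
```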
3.2.1 A* tree search
The condition required for optimality when using A* tree search (the construction of a tree requires that each child has at most one parent node, i.e. no cycles) is known as admissibility. Define $h^*(n)$ as the true optimal forward cost to reach a goal state from a given node $n$. The admissibility constraint on a heuristic function is defined as:
$\forall n, 0 \le h(n) \le h^*(n)$.
Theorem. If the admissibility constraint holds, A* tree search is complete and optimal.
Proof: Assume two reachable goal states A, B are located in the search tree for a given search problem, where A is the optimal goal and B is suboptimal. Some ancestor $n$ of A (on the optimal path) must be on the fringe. We claim $n$ will be selected for expansion before B. (One may also think of A and B as two different nodes through which the goal is reached, with identical paths beyond A and B, so that A and B distinguish two candidate paths.) This follows from the facts that
1) $g(A) < g(B)$ (The definition of optimal)
2) $h(A) = h(B) = 0$ (by the definition of a goal state together with admissibility, since $h^*$ of a goal state is 0)
3) $f(n) \le f(A)$. Due to the admissibility of the heuristic function, $f(n) = g(n) + h(n) \le g(n) + h^*(n) \le g(A) = f(A)$. (Note that by the Bellman principle, the minimality of the backward cost of the goal state implies that for any $n$ on the optimal path, $g(n) + h^*(n) = g(A)$.)
Therefore, $f(n) \le f(A) < f(B)$.
By the property of the priority-queue fringe (as in UCS), $n$ will be expanded before B.
3.2.2 A* graph search
The restriction on the heuristic function that makes A* graph search optimal is called consistency: for any states A, C, $h(A) - h(C) \le \text{cost}(A, C)$. For any successor $n^\prime$ of the current state $n$, $f(n^\prime) = g(n^\prime) + h(n^\prime) = g(n) + \text{cost}(n, n^\prime) + h(n^\prime) \ge g(n) + h(n) = f(n)$. Therefore, the estimated cost along any path is nondecreasing.
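A sketch of the graph-search variant, under the same assumptions as the tree-search code: the only addition is a closed set of already expanded states, which consistency makes safe to use, since the first time a state is popped it already carries its lowest $g$:

```python
import heapq
from itertools import count

def a_star_graph_search(problem, h):
    """A* graph search: like the tree version, plus a closed set of expanded states."""
    tie = count()
    start = problem.start_state()
    fringe = [(h(start), next(tie), 0, start, [])]
    closed = set()                                  # states already expanded
    while fringe:
        _f, _, g, state, plan = heapq.heappop(fringe)
        if problem.is_goal(state):
            return plan                             # optimal if h is consistent
        if state in closed:
            continue                                # already expanded with a lower (or equal) g
        closed.add(state)
        for next_state, action, cost in problem.get_successors(state):
            if next_state not in closed:
                g2 = g + cost
                heapq.heappush(fringe, (g2 + h(next_state), next(tie), g2, next_state, plan + [action]))
    return None
```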
3.3 The dominance of heuristic function
For any state $n$, if $h_a(n) \ge h_b(n)$ (with both heuristics admissible), then $h_a$ dominates $h_b$ and is the better heuristic, since its estimate of the total cost is closer to the true cost.