Bits on Search Theory. 
M.V. Donskoy
Institute for System Studies, Moscow, USSR

Abstract.
This paper concerns the problem of search, which is central to artificial intelligence. Different methods of search are discussed, and various ways of reducing search are considered. Some unusual views on search are presented.

1. What is search? 
The main problem solved on computers is the problem of looking for solutions. One frequent formulation is: there is a space X consisting of elements x, and a goal function F(x) defined on X. One needs to find an element x0 such that F(x0) = max F(x) over all of X.
In a number of cases an analytical solution of this problem is possible, in which x0 is found without examining the various elements of X.
For physical problems where F(x) is well behaved on X, the method of iterative search is frequently used. By this method a sequence of elements x1, x2, ... is built such that the goal function is monotonically increasing on this sequence. At some N the sequence is stopped and xN is declared to be the solution. Usually it is an approximate solution, but sometimes it is quite non-optimal.
However, for complex problems, in particular AI problems, the method of hierarchical search, or backtracking, is usually applied. When this method is used, the space X is divided into several subspaces X1, X2, ..., Xn, and so on, and the problem is solved for each of them in turn. After the partial solutions are obtained they are compared, and by this means the global solution is found.

		[Figure: the space X drawn as a rectangle partitioned into
		subspaces X1, X2, X3 and X4, with X1 further divided into
		X11, X12, X13 and X14.]

		  Fig. 1. A search space.

The problem in each subspace is solved by the same procedure - subspace Xi is divided into subspaces Xi1, Xi2, ..., Xik, and the partial solutions are found and compared. This procedure is used recursively at all levels except those whose subspaces contain only a single element, where the problem is trivial.
The traditional representation of hierarchical search is tree search. The root of the tree represents the whole space X. Sons of the root node represent the subspaces X1, X2, ..., Xn. Arcs of the tree represent divisions into subspaces; internal nodes, the subspaces at the corresponding level of division; leaf nodes, individual elements.
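The recursive procedure just described can be sketched as follows. This is a minimal illustration: the `subdivide` function and the goal function F are assumptions standing in for the problem-specific parts.

```python
def hierarchical_search(space, subdivide, F):
    """Return the element of `space` maximizing F, by recursive subdivision."""
    parts = subdivide(space)
    if parts is None:            # `space` is a single element: trivial case
        return space
    # Solve the problem in each subspace, then compare the partial solutions.
    partial = [hierarchical_search(p, subdivide, F) for p in parts]
    return max(partial, key=F)

# Usage: search the integers 0..7 for the maximum of F(x) = -(x - 5)**2,
# representing spaces as intervals and subdividing each interval into halves.
def subdivide(space):
    lo, hi = space
    if lo == hi:
        return None
    mid = (lo + hi) // 2
    return [(lo, mid), (mid + 1, hi)]

F = lambda s: -(s[0] - 5) ** 2   # evaluated on singleton intervals (x, x)
print(hierarchical_search((0, 7), subdivide, F))  # → (5, 5)
```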
			        X
		     ________|________
		    |     |      |    |
		    X1    X2     X3   X4
		 ___|___
		| |   | |
	       X11 X12 X13 X14

			Fig. 2. A Search Tree.

It is worth noting that these two views of the backtracking algorithm are equivalent; some properties of the algorithm are better seen in one representation, others in the other.
The backtracking procedure has two degrees of freedom: the way a subspace is subdivided, and the order in which partial solutions are sought in its subspaces.
What is the advantage of hierarchical search compared with simply examining each element in turn? Sometimes it is possible to cut off whole subspaces, or subtrees, because by using some information it can be shown that a particular subspace cannot contain the global solution. A cutoff is exact if its correctness can be proved, and heuristic if there is no proof but there is some justification that it is correct.

2. Cutoffs. 
In papers on hierarchical search, mainly one kind of cutoff is considered - uniform cutoffs. When using a cutoff of this kind, one shows for all elements x of a subspace Xij that F(x) <= a <= F(x0).
This method of cutoff is called Branch-and-Bound; for minimax search it bears the name of alpha-beta pruning. The effectiveness of this method depends heavily on the order in which subspaces are searched, but nevertheless it has been shown that a significant amount of search must be carried out anyway.
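For minimax trees, the alpha-beta form of this uniform cutoff can be sketched as follows. This is a minimal version, assuming the tree is given explicitly as nested lists with numeric leaves; the bounds alpha and beta play the role of the constant a above.

```python
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    """Minimax value of `node`, pruning subtrees proved to be irrelevant."""
    if not isinstance(node, list):      # leaf: return its value
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:           # cutoff: MIN will never allow this line
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:           # cutoff: MAX already has a better line
                break
        return value

# Usage: a depth-2 tree; MAX chooses between two MIN nodes.
print(alphabeta([[3, 5], [6, 9]]))  # → 6
```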
This leads one to think of looking for essentially different kinds of cutoffs. One such method is the domination method. When using it, one tries to divide the subspace Xij into parts Xijk and, for each part, to exhibit a subspace Yk such that for each element x in Xijk there is an element y in Yk with F(x) <= F(y). In this way the goal function on Xijk is dominated by the goal function on Yk, and searching Xij is redundant.
		[Figure: the partition of Fig. 1 with three dominating
		subspaces Y1, Y2 and Y3 drawn across it.]

			Fig. 3. Domination.
		X14 is dominated by Y1, Y2 and Y3.

In the case when the problem is already solved for all the Yk, this method can be formally reduced to Branch-and-Bound uniform cutoffs, but in some cases one succeeds in dominating by subspaces which have not yet been searched. In that case the reduction to uniform cutoffs is impossible. For minimax tree search (and hence for and-or tree search too), the domination method is essentially different from alpha-beta: alpha-beta works on the "good" nodes, while, for example, the method of analogies, built on the domination principle, works on the "bad" nodes.
Using domination methods assumes a quite different problem model from the one usually stated. For uniform cutoffs it is sufficient to have a tree with values at the leaf nodes. It has been proved that for such a model the uniform cutoffs are the only possible cutoffs. To use a different kind of cutoff one needs to introduce some supplementary structure into the search tree.

3. The structuring of the search tree. 
There are two basic ways to introduce supplementary structure into the search tree - node classification and arc classification. Under node classification some nodes are declared to belong to the same class and to be in that sense equivalent. The most frequent case is the same state of an object reached in different ways, usually because of interchanged actions. I would like to stress that from the point of view of the traditional classical model this equivalence is absolute nonsense, because in a tree all nodes are different; but for every particular problem this equivalence is intuitively evident.
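This kind of node classification is what game-playing programs realize with a transposition table: nodes reached by interchanged actions hash to the same state and are searched only once. A minimal sketch, assuming hashable states and hypothetical problem-specific `moves` and `apply_move` functions:

```python
def search(state, moves, apply_move, value, cache=None):
    """Best leaf value reachable from `state`, sharing equivalent nodes."""
    if cache is None:
        cache = {}
    if state in cache:                  # an equivalent node was already solved
        return cache[state]
    ms = moves(state)
    if not ms:                          # leaf: evaluate directly
        result = value(state)
    else:
        result = max(search(apply_move(state, m), moves, apply_move,
                            value, cache)
                     for m in ms)
    cache[state] = result
    return result

# Usage: collect the items {1, 2, 3} in any order; all orderings of the
# same collected set are classified as one node and solved once.
moves = lambda s: [x for x in (1, 2, 3) if x not in s]
apply_move = lambda s, m: s | {m}
print(search(frozenset(), moves, apply_move, sum))  # → 6
```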
When using arc classification, different arcs in the tree are declared to be equivalent. After that, equality of paths and equality and inclusion of subtrees are defined in a natural way. Then the method of dominating subspaces can be applied to subtree inclusion. It is worth noting that dominating subtrees are defined by the problem's structure and can exist in parts of the tree that have not yet been searched.
As an example, consider the classic travelling salesman problem. If the cities are divided into two islands (two groups such that the distance between any two cities on the same island is much smaller than the distance between any two cities on different islands), then a route returning to the initial island before visiting all the cities of the other island is dominated by any route which completes visiting all the cities of the other island before returning to the initial one; and this fact does not depend on whether either route has already been examined in the search.
In arc classification a very important notion arises - the new arc. Frequently the sets of arcs from two nodes are almost equal, with small exceptions. Very important for cutoffs in that case are those arcs that exist in the current node and do not exist in the other. We will return to this notion later.

4. Heuristic function.
One of the main means of ordering the search is the so-called heuristic function, whose value defines the order in which nodes are searched. In essence it is a function on arcs, because it evaluates the most promising way to quickly find the solution. In early works on search the partial goal function was frequently used as the heuristic function: in travelling salesman problems it was the partial route cost; in game-playing programs, the static evaluation of the position corresponding to the node.
After some time it became clear that it is better to use as a heuristic function one which is essentially different from the partial goal function. I would like to offer an explanation of this fact.
As a matter of fact, a heuristic function contains knowledge about the problem being solved by the search. If this knowledge were precise (which would in fact mean that it ceased to be heuristic), then the solution would be found at the first visited leaf node. Why, in that case, does one need to search other nodes? Not to find the solution, but to prove that the first leaf node is the solution. This means that the search is a kind of insurance against poor quality of the heuristic function.
Seen this way, the following behaviour seems inconsistent: if the heuristic function is ideal, the search is redundant, yet if it has even a small flaw, one needs exhaustive search. This consideration can be used for (heuristic) search reduction by introducing a degree of confidence in the heuristic function.
Let us define the "likelihood to be a solution" function for all nodes in the tree. It will be the sum of the numbers of all arcs on the path from the tree root to the node, where the number of an arc is its position in the search ordering of arcs defined by the heuristic function. If this sum is too large, then it is very improbable that the node is on the way to a solution.
For minimax trees one needs to take the absolute value of the difference between the sums of arc numbers for the MAX side and for the MIN side.
Let us formulate a new stopping rule - the "confidence stop rule": if the "likelihood to be a solution" function exceeds some threshold which reflects our confidence in the heuristic function, then the current node may be cut off. To be on the safe side, new arcs can nevertheless be included in the search.
In contrast to the two kinds of cutoffs discussed earlier, this cutoff can sometimes lead to a non-optimal solution, and sometimes to a wrong one. But that is precisely a question of confidence.
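The confidence stop rule can be sketched as follows. This is a minimal illustration; it assumes a hypothetical `children` function that returns the sons already ordered by the heuristic function, so that the number of the i-th arc is simply i.

```python
def confident_dfs(node, children, is_leaf, threshold, rank_sum=0):
    """Yield the leaves whose path's arc-number sum stays within `threshold`."""
    if rank_sum > threshold:        # low confidence this path leads anywhere
        return
    if is_leaf(node):
        yield node
        return
    for rank, child in enumerate(children(node)):
        # Each deviation from the heuristic's first choice adds its rank
        # to the "likelihood to be a solution" sum of the whole path.
        yield from confident_dfs(child, children, is_leaf,
                                 threshold, rank_sum + rank)

# Usage: a binary tree of depth 3, nodes named by their path strings;
# the heuristic prefers the "0" arc at every node.
children = lambda n: [n + "0", n + "1"]
is_leaf = lambda n: len(n) == 3
print(list(confident_dfs("", children, is_leaf, threshold=1)))
# → ['000', '001', '010', '100']
```

With threshold 1, only leaves that deviate at most once from the heuristic's first choice survive; the rest of the tree is cut off.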
What, then, does one need in a heuristic function that is not part of the goal function?

5. The model game.
Let us consider the following game. It is essentially a heads-and-tails game, but with some new rules. If the coin falls tails, the player loses, let us say, a dollar. If the coin falls heads, he may choose whether to win a dollar or to lose one. After this, three different positions arise - different in the sense that how the coin will fall in the future depends on past falls and on the choices made.
The player is obliged to play N rounds of this game, which means that he cannot stop the game before N falls of the coin. On the other hand, he has true information about all the coin's falls in all positions up to n rounds ahead. What is the best algorithm for playing this game?
Let us assume that in the first round the coin falls heads. According to the standard theory, to choose between taking and giving up the dollar one needs to search which of these two possibilities leads to the maximum gain (or minimum loss) in n rounds, and to move accordingly. However, it is possible to show that this strategy is not the best one. It would be the best without the obligation to play N rounds; under this obligation the matter is more complicated.
Let us assume that after taking the dollar the player has only one way to a maximum gain of k dollars, but by giving up the dollar he has several different ways (due to the possibility of choosing later in the next n rounds) to a gain of k-2 dollars. Prof. A.L. Brudno, who invented the game for research purposes, has shown that for some values of N and n it is preferable to take the path which does not lead to the maximum gain. The heart of the matter is that on the maximum-gain path the player will sometimes have no choice later: when, further along this path, he sees that it is bad, he can do nothing about it. But on suboptimal paths, if he conserves the right to choose, he will have an opportunity to avoid bad paths later, when they come into his view.
Hence, instead of the maximum gain, the player needs to use a different heuristic function, one which takes into account not only the amount of gain but also the number of different ways to obtain it - that is, one which conserves the freedom of choice.
This example is the answer to the question of how the heuristic function should differ from the partial goal function. It needs to take the freedom of action into account. In particular, new arcs need to have a bigger heuristic function value than old arcs with the same advantages.
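As an illustration only, one hypothetical way to fold the freedom of choice into a heuristic function is to add to the gain a bonus growing with the number of distinct ways of reaching it. The logarithmic form and the weight are arbitrary choices for the sketch, not part of Brudno's result.

```python
import math

def freedom_heuristic(gain, n_ways, weight=1.0):
    """Gain plus a bonus for the number of distinct ways to obtain it."""
    return gain + weight * math.log(n_ways)

# With weight 3, a single path to a gain of 10 scores below five
# different paths to a gain of 8, as in the k versus k-2 situation above.
print(freedom_heuristic(10, 1, weight=3) < freedom_heuristic(8, 5, weight=3))
# → True
```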

Acknowledgments.
This paper is the result of long work on the KAISSA chess program and of discussions connected with that work. So I am indebted to every member of the KAISSA team, especially Prof. G.M. Adelson-Velsky and Prof. V.L. Arlazarov. Very fruitful discussions on the matter were held at the University of Alberta with Prof. T.A. Marsland and Dr. J. Schaeffer. Special thanks to Prof. W. Armstrong for his valuable help in preparing this paper, which included lessons in Computer Science English.
References.
1. G.M. Adelson-Velsky, V.L. Arlazarov, M.V. Donskoy. Algorithms for Games. Springer-Verlag, 1988.
2. T.A. Marsland. Computer Chess Methods. In: Encyclopedia of Artificial Intelligence, ed. S. Shapiro. John Wiley & Sons, 1987. (Contains a good bibliography.)
3. A.L. Brudno. Talk at the Computer Science Seminar of Moscow State University, organized by A.S. Kronrod. Moscow, 1975.

1) This paper was written as part of collaborative work on Artificial Intelligence and Data Bases between the Institute for System Studies and the University of Alberta, Edmonton, Canada.
