Documente Academic
Documente Profesional
Documente Cultură
Abstract Because limited time (or other resources) prevent the agent
from performing all potentially useful deliberation (or infor-
In most real-world settings, due to limited time or
mation gathering) actions, it has to select among such actions.
other resources, an agent cannot perform all poten-
Reasoning about which deliberation actions to take is called
tially useful deliberation and information gathering
metareasoning. Decision theory [7, 10] provides a norma-
actions. This leads to the metareasoning problem
tive basis for metareasoning under uncertainty, and decision-
of selecting such actions. Decision-theoretic meth-
theoretic deliberation control has been widely studied in AI
ods for metareasoning have been studied in AI, but
(e.g., [2, 4–6, 8, 9, 12–15, 18–20]).
there are few theoretical results on the complexity
However, the approach of using metareasoning to control
of metareasoning. We derive hardness results for
reasoning is impractical if the metareasoning problem itself
three settings which most real metareasoning sys-
is prohibitively complex. While this issue is widely acknowl-
tems would have to encompass as special cases.
edged (e.g., [8, 12–14]), there are few theoretical results on
In the first, the agent has to decide how to allo-
the complexity of metareasoning.
cate its deliberation time across anytime algorithms
We derive hardness results for three central metareason-
running on different problem instances. We show
ing problems. In the first (Section 2), the agent has to de-
this to be N P-complete. In the second, the agent
cide how to allocate its deliberation time across anytime al-
has to (dynamically) allocate its deliberation or in-
gorithms running on different problem instances. We show
formation gathering resources across multiple ac-
this to be N P-complete. In the second metareasoning prob-
tions that it has to choose among. We show this
lem (Section 3), the agent has to (dynamically) allocate its de-
to be N P-hard even when evaluating each individ-
liberation or information gathering resources across multiple
ual action is extremely simple. In the third, the
actions that it has to choose among. We show this to be N P-
agent has to (dynamically) choose a limited num-
hard even when evaluating each individual action is extremely
ber of deliberation or information gathering actions
simple. In the third metareasoning problem (Section 4), the
to disambiguate the state of the world. We show
agent has to (dynamically) choose a limited number of delib-
that this is N P-hard under a natural restriction, and
eration or information gathering actions to disambiguate the
PSPACE-hard in general.
state of the world. We show that this is N P-hard under a
natural restriction, and PSPACE-hard in general.
1 Introduction These results have general applicability in that most metar-
In most real-world settings, due to limited time, an agent can- easoning systems must somehow deal with one or more of
not perform all potentially useful deliberation actions. As these problems (in addition to dealing with other issues). We
a result it will generally be unable to act rationally in the also believe that these results give a good basic overview of
world. This phenomenon, known as bounded rationality, has the space of high-complexity issues in metareasoning.
been a long-standing research topic (e.g., [3, 17]). Most of
that research has been descriptive: the goal has been to char- 2 Allocating anytime algorithm time across
acterize how agents—in particular, humans—deal with this problems
constraint. Another strand of bounded rationality research In this section we study the setting where an agent has to
has the normative (prescriptive) goal of characterizing how allocate its deliberation time across different problems—each
agents should deal with this constraint. This is particularly of which the agent can solve using an anytime algorithm. We
important when building artificial agents. show that this is hard even if the agent can perfectly predict
Characterizing how an agent should deal with bounded ra- the performance of the anytime algorithms.
tionality entails determining how the agent should deliberate.
∗
The material in this paper is based upon work supported by the 2.1 Motivating example
National Science Foundation under CAREER Award IRI-9703122, Consider a newspaper company that has, by midnight, re-
Grant IIS-9800994, ITR IIS-0081246, and ITR IIS-0121678. ceived the next day’s orders from newspaper stands in the 3
cities where the newpaper is read. The company owns a fleet problem instances to get a total performance of at least K;
of delivery trucks in each of the cities. Each fleet needs its that P a vector (N1 , N2 , . . . , Nm ) with
Pis, whether there exists
vehicle routing solution by 5am. The company has a default Ni ≤ N and fi (Ni ) ≥ K.
routing solution for each fleet, but can save costs by improv- 1≤i≤m 1≤i≤m
ing (tailoring to the day’s particular orders) the routing solu- A reasonable approach to representing the performance
tion of any individual fleet using an anytime algorithm. In this profiles is to use piecewise linear performance profiles. They
setting, the “solution quality” that the anytime algorithm pro- can model any performance profile arbitrarily closely, and
vides on a fleet’s problem instance is the amount of savings have been used in the resource-bounded reasoning litera-
compared to the default routing solution. ture to characterize the performance of anytime algorithms
We assume that the company can perfectly predict the sav- (e.g. [2]). We now show that the metareasoning problem is
ings made on a given fleet’s problem instance as a function of N P-complete even under this restriction. We will reduce
deliberation time spent on it (we will prove hardness of metar- from the KNAPSACK problem.2
easoning even in this deterministic variant). Such functions
are called (deterministic) performance profiles [2, 6, 8, 9, 20]. Definition 2 (KNAPSACK) We are given a set S of m pairs
Each fleet’s problem instance has its own performance pro- of positive integers (ci , vi ), a constraint C > 0 and a target
file.1 Suppose the performance profiles are as shown in Fig. 1. value V >P0. We are askedPwhether there exists a set I ⊆ S
such that Cj ≤ C and vj ≥ V .
4
j∈I j∈I
instance 1
instance 2
3.5 instance 3
Theorem 1 PERFORMANCE-PROFILES is N P-complete
3
even if each performance profile is continuous and piecewise
2.5
linear.3
Savings
1.5
Proof: The problem is in N P because we can nondetermin-
1
istically generate the Ni in polynomial time (since we do not
0.5
need to bother trying numbers greater than N ), and given the
0
Ni , we can verify if the target value is reached in polyno-
0 1 2 3 4
Computation Time (hours)
5 6
mial time. To show N P-hardness, we reduce an arbitrary
KNAPSACK instance to the following PERFORMANCE-
Figure 1: Performance profiles for the routing problems. PROFILES instance. Let there be m performance profiles,
given by fi (n) = 0 for n ≤ ci , fi (n) = n − ci for ci < n ≤
ci + vi , and fi (n) = vi for n > ci + vi . Let N = C + V
Then the maximum savings we can obtain with 5 hours of and let K = V . We claim the two problem instances are
deliberation time is 2.5, for instance by spending 3 hours on equivalent. First suppose there is a solutionP to the KNAP-
instance 1 and 2 on instance 2. On the other hand, if we had SACK instance, that is, a set I ⊆ S such that Cj ≤ C and
until 6am to deliberate (6 hours), we could obtain a savings
P j∈I
of 4 by spending 6 hours on instance 3. vj ≥ V . Then set the Ni as follows: Ni = 0 for i ∈ / I,
j∈I
P P
2.2 Definitions and results and Ni = ci + PV v vi for i ∈ I. Then, Ni = Ni =
j
We now define the metareasoning problem of allocating de- 1≤i≤m i∈I
P j∈I
P P P
liberation across problems according to performance profiles. (ci + PV v vi ) = ci + PV v vi = ci + V ≤
j j
i∈I i∈I i∈I i∈I
Definition 1 (PERFORMANCE-PROFILES) We are j∈I
P j∈I
P
given a list of performance profiles (f1 , f2 , . . . , fm ) (where C + V = N since Cj ≤ C. Also, since vj ≥ V , it
each fi is a nondecreasing function of deliberation time, j∈I j∈I
mapping to nonnegative real numbers), a number of de- P every i ∈ I, P
follows that for we have ci ≤ P
Ni ≤ ci + vi ,
liberation steps N , and a target value K. We are asked and hence fi (Ni ) = fi (Ni ) = ( PV v vi ) =
j
1≤i≤m i∈I i∈I
whether we can distribute the deliberation steps across the j∈I
1 2
Because the anytime algorithm’s performance differs across in- This only demonstrates weak NP-completeness, as KNAP-
stances, each instance has its own performance profile (in the set- SACK is weakly NP-complete; thus, perhaps pseudopolynomial
ting of deterministic performance profiles). In reality, an anytime time algorithms exist.
3
algorithm’s performance on an instance cannot be predicted per- If one additionally assumes that each performance profile is
fectly. Rather, usually statistical performance profiles are kept that concave, then the metareasoning problem is solvable in polynomial
aggregate across instances. In that light one might question the as- time [2]. While returns to deliberation indeed tend to be diminish-
sumption that different instances have different performance pro- ing, usually this is not the case throughout the performance profile.
files. However, sophisticated deliberation control systems can con- Algorithms often have a setup phase in the beginning during which
dition the performance prediction on features of the instance—and there is no improvement. Also, iterative improvement algorithms
this is necessary if the deliberation control is to be fully normative. can switch to using different local search operators once progress has
(Research has already been conducted on conditioning performance ceased using one operator (for example, once 2-swap has reached a
profiles on instance features [8, 9, 15] or results of deliberation on local optimum in TSP, one can switch to 3-swap and obtain gains
the instance so far [4, 8, 9, 15, 18–20].) from deliberation again) [16].
P
PV vi = V = K. So we have found a solution to is no silver, the test will be positive with probability 0. This
vj
j∈I
i∈I test takes 3 units of time. (3) Test for copper at C. If there
the PERFORMANCE-PROFILES instance. On the other is copper, the test will be positive with probability 1;if there
hand, suppose there is a solution to the PERFORMANCE- is no copper, the test will be positive with probability 0. This
PROFILES instance, thatPis, a vector (N1 , N2 , . . . , Nm ) with test takes 2 units of time.
P
Ni ≤ N and fi (Ni ) ≥ K. Since spending Given the probabilities of the tests turning out positive un-
1≤i≤m 1≤i≤m der various circumstances, one can use Bayes’ rule to com-
more than ci + vi deliberation steps on profile i is useless, pute the expected utility of each digging option given any
we may assume that Ni ≤ ci + vi for all i. We now claim (lack of) test result. For instance, letting a be the event that
that I = {i : Ni ≥ ci } is a solution to the KNAPSACK in- there is gold at A, and tA be the event that the test at A is
stance. First,
P usingP (Nj ) = 0 for all j ∈
the fact that fjP / I, positive, we observe that P (tA ) = P (tA |a)P (a) + P (tA | −
we have vi ≥ fi (Ni ) = fi (Ni ) ≥ K = V . a)P (−a) = 14 1 1 7 7
15 8 + 15 8 = 40 . Then, the expected utility of
P i∈I P i∈I P 1≤i≤mP P digging at A given that the test at A was positive is 5 P (a|tA ),
Also, ci = Ni − (Ni − ci ) = Ni − fi (Ni ) ≤ |a)P (a) 14 1
P i∈I P
i∈I i∈I i∈I i∈I where P (a|tA ) = P (tPA (t A)
= 157 8 = 23 , so the expected
Ni − fi (Ni ) ≤ N − K = C + V − V = C. So
40
1≤i≤m 1≤i≤m
utility is 10
3 . Doing a similar analysis everywhere, we can rep-
we have found a solution to the KNAPSACK instance. resent the problem by trees shown in Fig. 2. In these trees, be-