ARTIFICIAL INTELLIGENCE, LOGIC AND FORMALIZING COMMON SENSE

John McCarthy
Computer Science Department
Stanford University
Stanford, CA 94305
jmc@cs.stanford.edu
http://www-formal.stanford.edu/jmc/

1990

1 Introduction

This is a position paper about the relations among artificial intelligence (AI), mathematical logic and the formalization of common-sense knowledge and reasoning. It also treats other problems of concern to both AI and philosophy. I thank the editor for inviting it. The position advocated is that philosophy can contribute to AI if it treats some of its traditional subject matter in more detail and that this will advance the philosophical goals also. Actual formalisms (mostly first order languages) for expressing common-sense facts are described in the references.

Common-sense knowledge includes the basic facts about events (including actions) and their effects, facts about knowledge and how it is obtained, facts about beliefs and desires. It also includes the basic facts about material objects and their properties.

One path to human-level AI uses mathematical logic to formalize common-sense knowledge in such a way that common-sense problems can be solved by logical reasoning. This methodology requires understanding the common-sense world well enough to formalize facts about it and ways of achieving goals in it.
Basing AI on understanding the common-sense world is different from basing it on understanding human psychology or neurophysiology. This approach to AI, based on logic and computer science, is complementary to approaches that start from the fact that humans exhibit intelligence, and that explore human psychology or human neurophysiology.

This article discusses the problems and difficulties, the results so far, and some improvements in logic and logical languages that may be required to formalize common sense. Fundamental conceptual advances are almost certainly required. The object of the paper is to get more help for AI from philosophical logicians. Some of the requested help will be mostly philosophical and some will be logical. Likewise the concrete AI approach may fertilize philosophical logic as physics has repeatedly fertilized mathematics.

There are three reasons for AI to emphasize common-sense knowledge rather than the knowledge contained in scientific theories.

(1) Scientific theories represent compartmentalized knowledge. In presenting a scientific theory, as well as in developing it, there is a common-sense pre-scientific stage. In this stage, it is decided or just taken for granted what phenomena are to be covered and what is the relation between certain formal terms of the theory and the common-sense world. Thus in classical mechanics it is decided what kinds of bodies and forces are to be used before the differential equations are written down. In probabilistic theories, the sample space is determined. In theories expressed in first order logic, the predicate and function symbols are decided upon. The axiomatic reasoning techniques used in mathematical and logical theories depend on this having been done. However, a robot or computer program with human-level intelligence will have to do this for itself. To use science, common sense is required.

Once developed, a scientific theory remains imbedded in common sense.
To apply the theory to a specific problem, common-sense descriptions must be matched to the terms of the theory. For example, $d = \frac{1}{2}gt^2$ does not in itself identify $d$ as the distance a body falls in time $t$ and identify $g$ as the acceleration due to gravity. (McCarthy and Hayes 1969) uses the situation calculus discussed in that paper to imbed the above formula in a formula describing the common-sense situation, for example

$$\mathit{dropped}(x,s) \land \mathit{height}(x,s) = h \land d = \tfrac{1}{2}gt^2 \land d < h \supset \exists s'\,\big(F(s,s') \land \mathit{time}(s') = \mathit{time}(s) + t \land \mathit{height}(x,s') = h - d\big). \tag{1}$$

Here $x$ is the falling body, and we are presuming a language in which the functions height, time, etc. are formalized in a way that corresponds to what the English words suggest. $s$ and $s'$ denote situations as discussed in that paper, and $F(s,s')$ asserts that the situation $s'$ is in the future of the situation $s$.

(2) Common-sense reasoning is required for solving problems in the common-sense world. From the problem solving or goal-achieving point of view, the common-sense world is characterized by a different informatic situation than that within any formal scientific theory. In the typical common-sense informatic situation, the reasoner doesn’t know what facts are relevant to solving his problem. Unanticipated obstacles may arise that involve using parts of his knowledge not previously thought to be relevant.

(3) Finally, the informal metatheory of any scientific theory has a common-sense informatic character. By this I mean the thinking about the structure of the theory in general and the research problems it presents. Mathematicians invented the concept of a group in order to make previously vague parallels between different domains into a precise notion. The thinking about how to do this had a common-sense character.

It might be supposed that the common-sense world would admit a conventional scientific theory, e.g. a probabilistic theory.
But no one has yet developed such a theory, and AI has taken a somewhat different course that involves nonmonotonic extensions to the kind of reasoning used in formal scientific theories. This seems likely to work better.

Aristotle, Leibniz, Boole and Frege all included common-sense knowledge when they discussed formal logic. However, formalizing much of common-sense knowledge and reasoning proved elusive, and the twentieth century emphasis has been on formalizing mathematics. Some important philosophers, e.g. Wittgenstein, have claimed that common-sense knowledge is unformalizable or that mathematical logic is inappropriate for doing it. Though it is possible to give a kind of plausibility to views of this sort, it is much harder to make a case for them that is well supported and carefully worked out. If a common-sense reasoning problem is well presented, one is well on the way to formalizing it. The examples that are presented for this negative view borrow much of their plausibility from the inadequacy of the specific collections of predicates and functions they take into consideration. Some of their force comes from not formalizing nonmonotonic reasoning, and some may be due to lack of logical tools still to be discovered. While I acknowledge this opinion, I haven’t the time or the scholarship to deal with the full range of such arguments. Instead I will present the positive case, the problems that have arisen, what has been done and the problems that can be foreseen. These problems are often more interesting than the ones suggested by philosophers trying to show the futility of formalizing common sense, and they suggest productive research programs for both AI and philosophy.

Insofar as the arguments against the formalizability of common sense attempt to make precise the intuitions of their authors, they can be helpful in identifying problems that have to be solved.
For example, Hubert Dreyfus (1972) said that computers couldn’t have “ambiguity tolerance” but didn’t offer much explanation of the concept. With the development of nonmonotonic reasoning, it became possible to define some forms of ambiguity tolerance and show how they can and must be incorporated in computer systems. For example, it is possible to make a system that doesn’t know about possible de re/de dicto ambiguities and has a default assumption that amounts to saying that a reference holds both de re and de dicto. When this assumption leads to inconsistency, the ambiguity can be discovered and treated, usually by splitting a concept into two or more.

If a computer is to store facts about the world and reason with them, it needs a precise language, and the program has to embody a precise idea of what reasoning is allowed, i.e. of how new formulas may be derived from old. Therefore, it was natural to try to use mathematical logical languages to express what an intelligent computer program knows that is relevant to the problems we want it to solve and to make the program use logical inference in order to decide what to do. (McCarthy 1959) contains the first proposals to use logic in AI for expressing what a program knows and how it should reason. (Proving logical formulas as a domain for AI had already been studied by several authors.) The 1959 paper said:

The advice taker is a proposed program for solving problems by manipulating sentences in formal languages. The main difference between it and other programs or proposed programs for manipulating formal languages (the Logic Theory Machine of Newell, Simon and Shaw and the Geometry Program of Gelernter) is that in the previous programs the formal system was the subject matter but the heuristics were all embodied in the program. In this program the procedures will be described as much as possible in the language itself and, in particular, the heuristics are all so described.
The main advantages we expect the advice taker to have is that its behavior will be improvable merely by making statements to it, telling it about its symbolic environment and what is wanted from it. To make these statements will require little if any knowledge of the program or the previous knowledge of the advice taker. One will be able to assume that the advice taker will have available to it a fairly wide class of immediate logical consequences of anything it is told and its previous knowledge. This property is expected to have much in common with what makes us describe certain humans as having common sense. We shall therefore say that a program has common sense if it automatically deduces for itself a sufficiently wide class of immediate consequences of anything it is told and what it already knows.

The main reasons for using logical sentences extensively in AI are better understood by researchers today than in 1959. Expressing information in declarative sentences is far more modular than expressing it in segments of computer program or in tables. Sentences can be true in much wider contexts than specific programs can be useful. The supplier of a fact does not have to understand much about how the receiver functions, or how or whether the receiver will use it. The same fact can be used for many purposes, because the logical consequences of collections of facts can be available.

The advice taker prospectus was ambitious in 1959, would be considered ambitious today and is still far from being immediately realizable. This is especially true of the goal of expressing the heuristics guiding the search for a way to achieve the goal in the language itself. The rest of this paper is largely concerned with describing what progress has been made, what the obstacles are, and how the prospectus has been modified in the light of what has been discovered.

The formalisms of logic have been used to differing extents in AI.
Most of the uses are much less ambitious than the proposals of (McCarthy 1959). We can distinguish four levels of use of logic.

1. A machine may use no logical sentences, all its “beliefs” being implicit in its state. Nevertheless, it is often appropriate to ascribe beliefs and goals to the program, i.e. to remove the above sanitary quotes, and to use a principle of rationality: it does what it thinks will achieve its goals. Such ascription is discussed from somewhat different points of view in (Dennett 1971), (McCarthy 1979a) and (Newell 1981). The advantage is that the intent of the machine’s designers and the way it can be expected to behave may be more readily described intentionally than by a purely physical description.

The relation between the physical and the intentional descriptions is most readily understood in simple systems that admit readily understood descriptions of both kinds, e.g. thermostats. Some finicky philosophers object to this, contending that unless a system has a full human mind, it shouldn’t be regarded as having any mental qualities at all. This is like omitting the numbers 0 and 1 from the number system on the grounds that numbers aren’t required to count sets with no elements or one element. Indeed if your main interest is the null set or unit sets, numbers are irrelevant. However, if your interest is the number system, you lose clarity and uniformity if you omit 0 and 1. Likewise, when one studies phenomena like belief, e.g. because one wants a machine with beliefs and which reasons about beliefs, it works better not to exclude simple cases from the formalism. One battle has been over whether it should be forbidden to ascribe to a simple thermostat the belief that the room is too cold. (McCarthy 1979a) says much more about ascribing mental qualities to machines, but that’s not where the main action is in AI.

2. The next level of use of logic involves computer programs that use sentences in machine memory to represent their beliefs but use rules other than ordinary logical inference to reach conclusions. New sentences are often obtained from the old ones by ad hoc programs. Moreover, the sentences that appear in memory belong to a program-dependent subset of the logical language being used. Adding certain true sentences in the language may even spoil the functioning of the program. The languages used are often rather unexpressive compared to first order logic; for example, they may not admit quantified sentences, or they may use a different notation from that used for ordinary facts to represent “rules”, i.e. certain universally quantified implication sentences. Most often, conditional rules are used in just one direction, i.e. contrapositive reasoning is not used. Usually the program cannot infer new rules; rules must have all been put in by the “knowledge engineer”. Sometimes programs have this form through mere ignorance, but the usual reason for the restriction is the practical desire to make the program run fast and deduce just the kinds of conclusions its designer anticipates. We believe the need for such specialized inference will turn out to be temporary and will be reduced or eliminated by improved ways of controlling general inference, e.g. by allowing the heuristic rules to be also expressed as sentences as promised in the above extract from the 1959 paper.

3. The third level uses first order logic and also logical deduction. Typically the sentences are represented as clauses, and the deduction methods are based on J. Alan Robinson’s (1965) method of resolution. It is common to use a theorem prover as a problem solver, i.e. to determine an $x$ such that $P(x)$ as a byproduct of a proof of the formula $\exists x\,P(x)$.
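The idea of obtaining an answer as a byproduct of proving ∃x P(x) can be illustrated with a small sketch. The Python below is a hypothetical illustration, not Green’s system or any particular prover: it assumes the clauses are Horn clauses, uses depth-first backward chaining (SLD resolution, the restricted form of resolution underlying Prolog) rather than general clausal resolution, and its family-relations knowledge base and predicate names are invented for the example. Proving the existential goal (∃W) grandparent(tom, W) leaves W bound, and that binding is read off as the answer.

```python
import itertools

# Terms: strings starting with an uppercase letter are variables, other strings
# are constants, and tuples (functor, arg1, ...) are atoms or compound terms.


def is_var(t):
    return isinstance(t, str) and t[:1].isupper()


def walk(t, subst):
    """Follow variable bindings until reaching a non-variable or an unbound variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t


def unify(a, b, subst):
    """Return subst extended so that a and b become equal, or None (no occurs check)."""
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return {**subst, a: b}
    if is_var(b):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None


_fresh = itertools.count()


def rename(t, n):
    """Rename a clause's variables apart so that separate uses do not clash."""
    if is_var(t):
        return f"{t}_{n}"
    if isinstance(t, tuple):
        return tuple(rename(x, n) for x in t)
    return t


def prove(goals, clauses, subst):
    """Depth-first SLD resolution: yield each substitution that proves all goals."""
    if not goals:
        yield subst
        return
    first, rest = goals[0], goals[1:]
    for head, body in clauses:
        n = next(_fresh)
        s = unify(first, rename(head, n), subst)
        if s is not None:
            yield from prove([rename(g, n) for g in body] + rest, clauses, s)


def resolve(t, subst):
    """Apply a substitution fully, to read off the answer term."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(resolve(x, subst) for x in t)
    return t


# Hypothetical Horn-clause knowledge base: (head, [body goals]).
clauses = [
    (("parent", "tom", "bob"), []),
    (("parent", "bob", "ann"), []),
    (("grandparent", "X", "Z"), [("parent", "X", "Y"), ("parent", "Y", "Z")]),
]

# Proving (exists W) grandparent(tom, W) binds W; the binding is the answer.
answers = [resolve("W", s) for s in prove([("grandparent", "tom", "W")], clauses, {})]
print(answers)  # ['ann']
```

Nothing in this search steers it toward useful branches, so on a larger set of clauses it can pursue many useless subgoals before it finds an answer.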
This level is less used for practical purposes than level two, because techniques for controlling the reasoning are still insufficiently developed, and it is common for the program to generate many useless conclusions before reaching the desired solution. Indeed, unsuccessful experience (Green 1969) with this method led to more restricted uses of logic, e.g. the STRIPS system of (Nilsson and Fikes 1971).

The commercial “expert system shells”, e.g. ART, KEE and OPS-5, use logical representation of facts, usually ground facts only, and separate facts from rules. They provide elaborate but not always adequate ways of controlling inference.

In this connection it is important to mention logic programming, first introduced in Microplanner (Sussman et al., 1971) and from different points of view by Robert Kowalski (1979) and Alain Colmerauer in the early 1970s. A recent text is (Sterling and Shapiro 1986). Microplanner was a rather unsystematic collection of tools, whereas Prolog relies almost entirely on one kind of logic programming, but the main idea is the same. If one uses a restricted class of sentences, the so-called Horn clauses, then it is possible to use a restricted form of logical deduction. The control problem is then much eased, and it is possible for the programmer to anticipate the course the deduction will take. The price paid is that only certain kinds of facts are conveniently expressed as Horn clauses, and the depth-first search built into Prolog is not always appropriate for the problem.

Even when the relevant facts can be expressed as Horn clauses supplemented by negation as failure, the reasoning carried out by a Prolog program may not be appropriate. For example, the fact that a sealed container is sterile if all the bacteria in it are dead and the fact that heating a can kills a bacterium in the can are both expressible as Prolog clauses.
However, the resulting program for sterilizing a container will kill each bacterium individually, because it will have to index over the bacteria. It won’t reason that heating the can kills all the bacteria at once, because it doesn’t do universal generalization.

Here’s a Prolog program for testing whether a container is sterile. The predicate symbols have obvious meanings.

    % Negation as failure, defined with cut and fail. Named naf/1 because
    % some Prolog systems (e.g. SWI-Prolog) reserve not/1 as a built-in.
    naf(P) :- call(P), !, fail.
    naf(_).

    % A container is sterile unless some bacterium in it is not dead.
    sterile(X) :- naf(nonsterile(X)).
    nonsterile(X) :- bacterium(Y), in(Y,X), naf(dead(Y)).

    % Whatever is in a hot container is hot; a hot bacterium is dead.
    hot(Y) :- in(Y,X), hot(X).
    hot(c1).
    dead(Y) :- bacterium(Y), hot(Y).

    bacterium(b1). bacterium(b2). bacterium(b3). bacterium(b4).
    in(b1,c1). in(b2,c1). in(b3,c2). in(b4,c2).

Given the goals sterile(c1) and sterile(c2), Prolog answers yes and no respectively. However, Prolog has indexed over the bacteria in the containers.

The following is a Prolog program that can verify whether a sequence of actions, here just heating the container, will sterilize it. It involves introducing situations analogous to those discussed in (McCarthy and Hayes 1969).

    naf(P) :- call(P), !, fail.
    naf(_).

    % The same facts, with an extra situation argument S.
    sterile(X,S) :- naf(nonsterile(X,S)).
    nonsterile(X,S) :- bacterium(Y), in(Y,X), naf(dead(Y,S)).
    hot(Y,S) :- in(Y,X), hot(X,S).
    % Heating container C makes it hot in the resulting situation.
    hot(C,result(heat(C),_)).
    dead(Y,S) :- bacterium(Y), hot(Y,S).

    bacterium(b1). bacterium(b2). bacterium(b3). bacterium(b4).
    in(b1,c1). in(b2,c1). in(b3,c2). in(b4,c2).

When the program is given the goals sterile(c1,result(heat(c1),s0)) and sterile(c2,result(heat(c1),s0)), it answers yes and no respectively. However, if it is given the goal sterile(c1,S) with S a variable, it will not find a situation S in which c1 is sterile, because Prolog lacks what logic programmers call “constructive negation”.

The same facts as are used in the first Prolog program can be expressed in a first order language as follows:

$$(\forall X)\big(\mathit{sterile}(X) \equiv (\forall Y)(\mathit{bacterium}(Y) \land \mathit{in}(Y,X) \supset \mathit{dead}(Y))\big),$$

$$(\forall X\,Y)(\mathit{hot}(X) \land \mathit{in}(Y,X) \supset \mathit{hot}(Y)),$$

$$(\forall Y)(\mathit{bacterium}(Y) \land \mathit{hot}(Y) \supset \mathit{dead}(Y)),$$

and

$$\mathit{hot}(a).$$
However, from them we can prove sterile(a) without having to index over the bacteria.

Expressibility in Horn clauses, whether supplemented by negation as failure or not, is an important property of a set of facts, and logic programming has been successfully used for many applications. However, it seems unlikely to dominate AI programming as some of its advocates hope.

Although third level systems express both facts and rules as logical sentences, they are still rather specialized. The axioms with which the programs begin are not general truths about the world but are sentences whose meaning and truth are limited to the narrow domain in which the program has to act. For this reason, the “facts” of one program usually cannot be used in a database for other programs.

4. The fourth level is still a goal. It involves representing general facts about the world as logical sentences. Once put in a database, the facts can be used by any program. The facts would have the neutrality of purpose characteristic of much human information. The supplier of information would not have to understand the goals of the potential user or how his mind works. The present ways of “teaching” computer programs by modifying them or directly modifying their databases amount to “education by brain surgery”.

A key problem for achieving the fourth level is to develop a language for a general common-sense database. This is difficult, because the common-sense informatic situation is complex. Here is a preliminary list of features and considerations.

1. Entities of interest are known only partially, and the information about entities and their relations that may be relevant to achieving goals cannot be permanently separated from irrelevant information.
(Contrast this with the situation in gravitational astronomy in which it is stated in the informal introduction to a lecture or textbook that the chemical composition and shape of a body are irrelevant to the theory; all that counts is the body’s mass, and its initial position and velocity.)

Even within gravitational astronomy, non-equational theories arise and relevant information may be difficult to determine. For example, it was recently proposed that periodic extinctions discovered in the paleontological record are caused by showers of comets induced by a companion star to the sun that encounters and disrupts the Oort cloud of comets every time it comes to perihelion. This theory is qualitative because neither the orbit of the hypothetical star nor those of the comets are available.

2. The formalism has to be epistemologically adequate, a notion introduced in (McCarthy and Hayes 1969). This means that the formalism must be capable of representing the information that is actually available, not merely capable of representing actual complete states of affairs.

For example, it is insufficient to have a formalism that can represent the positions and velocities of the particles in a gas. We can’t obtain that information, our largest computers don’t have the memory to store it even if it were available, and our fastest computers couldn’t use the information to make predictions even if we could store it.

As a second example, suppose we need to be able to predict someone’s behavior. The simplest example is a clerk in a store. The clerk is a complex individual about whom a customer may know little. However, the clerk can usually be counted on to accept money for articles brought to the counter, wrap them as appropriate and not protest when the customer then takes the articles from the store. The clerk can also be counted on to object if the customer attempts to take the articles without paying the appropriate price.
Describing this requires a formalism capable of representing information about human social institutions. Moreover, the formalism must be capable of representing partial information about the institution, such as a three-year-old’s knowledge of store clerks. For example, a three-year-old doesn’t know the clerk is an employee or even what that means. He doesn’t require detailed information about the clerk’s psychology, and anyway this information is not ordinarily available.

The following sections deal mainly with the advances we see as required to achieve the fourth level of use of logic in AI.

2 Formalized Nonmonotonic Reasoning

It seems that fourth level systems require extensions to mathematical logic. One kind of extension is formalized nonmonotonic reasoning, first proposed in the late 1970s (McCarthy 1977, 1980, 1986), (Reiter 1980), (McDermott and Doyle 1980), (Lifschitz 1989a).

Mathematical logic has been monotonic in the following sense. If we have $A \vdash p$ and $A \subset B$, then we also have $B \vdash p$. If the inference is logical deduction, then exactly the same proof that proves $p$ from $A$ will serve as a proof from $B$. If the inference is model-theoretic, i.e. $p$ is true in all models of $A$, then $p$ will be true in all models of $B$, because the models of $B$ will be a subset of the models of $A$. So we see that the monotonic character of traditional logic doesn’t depend on the details of the logical system but is quite fundamental.

While much human reasoning is monotonic, some important human common-sense reasoning is not. We reach conclusions from certain premisses that we would not reach if certain other sentences were included in our premisses. For example, if I hire you to build me a bird cage, you conclude that it is appropriate to put a top on it, but when you learn the further fact that my bird is a penguin you no longer draw that conclusion.
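The bird-cage example can be made concrete with a small propositional sketch of circumscription, one of the nonmonotonic formalisms cited above. This is an illustration under stated assumptions, not the formalization in the cited papers: the domain is finite, the axioms (a bird that is not abnormal flies; a penguin is abnormal and does not fly) and the individual tweety are hypothetical simplifications without aspects, and circumscription is computed by brute force, keeping only the models whose set of abnormal individuals is minimal under set inclusion while bird and penguin are held fixed and flies varies. A conclusion follows nonmonotonically if it holds in every such minimal model.

```python
from itertools import product

# Hypothetical one-individual domain; a longer list works the same way.
DOMAIN = ["tweety"]


def models(facts):
    """Yield (ab, flies) pairs that satisfy the axioms.

    facts fixes the extensions of bird and penguin; ab and flies range
    over every subset of DOMAIN.
    """
    bird, penguin = facts["bird"], facts["penguin"]
    subsets = [frozenset(x for x, keep in zip(DOMAIN, bits) if keep)
               for bits in product([False, True], repeat=len(DOMAIN))]
    for ab, flies in product(subsets, repeat=2):
        if all((x not in bird or x in ab or x in flies)   # bird(x) & ~ab(x) -> flies(x)
               and (x not in penguin or x in ab)          # penguin(x) -> ab(x)
               and (x not in penguin or x not in flies)   # penguin(x) -> ~flies(x)
               for x in DOMAIN):
            yield ab, flies


def circumscribe_ab(facts):
    """Keep only the models whose ab extension is minimal under set inclusion."""
    ms = list(models(facts))
    return [(ab, fl) for ab, fl in ms if not any(other < ab for other, _ in ms)]


def classically_follows(facts, x):
    """flies(x) holds in every model of the axioms (monotonic consequence)."""
    return all(x in fl for _, fl in models(facts))


def follows(facts, x):
    """flies(x) holds in every ab-minimal model (circumscriptive consequence)."""
    return all(x in fl for _, fl in circumscribe_ab(facts))


tweety_is_bird = {"bird": {"tweety"}, "penguin": set()}
tweety_is_penguin = {"bird": {"tweety"}, "penguin": {"tweety"}}

print(classically_follows(tweety_is_bird, "tweety"))  # False
print(follows(tweety_is_bird, "tweety"))              # True
print(follows(tweety_is_penguin, "tweety"))           # False
```

Adding the premise that tweety is a penguin removes a conclusion that the smaller set of premisses supported, which is the failure of monotonicity just described. Enumerating all models is exponential in the size of the domain, so the sketch only illustrates the semantics, not a practical procedure.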
Some people think it is possible to save monotonicity by saying that what was in your mind was not a general rule about birds flying but a probabilistic rule. So far these people have not worked out any detailed epistemology for this approach, i.e. exactly what probabilistic sentences should be used. Instead AI has moved to directly formalizing nonmonotonic logical reasoning. Indeed it seems to me that when probabilistic reasoning (and not just the axiomatic basis of probability theory) has been fully formalized, it will be formally nonmonotonic.

Nonmonotonic reasoning is an active field of study. Progress is often driven by examples, e.g. the Yale shooting problem (Hanks and McDermott 1986), in which obvious axiomatizations used with the available reasoning formalisms don’t seem to give the answers intuition suggests. One direction being explored (Moore 1985, Gelfond 1987, Lifschitz 1989a) involves putting facts about belief and knowledge explicitly in the axioms, even when the axioms concern nonmental domains. Moore’s classical example (now 4 years old) is “If I had an elder brother I’d know it.”

Kraus and Perlis (1988) have proposed to divide much nonmonotonic reasoning into two steps. The first step uses Perlis’s (1988) autocircumscription to get a second order formula characterizing what is possible. The second step involves default reasoning to choose what is normally to be expected out of the previously established possibilities. This seems to be a promising approach.

(Ginsberg 1987) collects the main papers up to 1986. Lifschitz (1989c) summarizes some example research problems of nonmonotonic reasoning.

3 Some Formalizations and their Problems

(McCarthy 1986) discusses several formalizations, proposing those based on nonmonotonic reasoning as improvements of earlier ones. Here are some.

1. Inheritance with exceptions. Birds normally fly, but there are exceptions, e.g. ostriches and birds whose feet are encased in concrete.
The first exception might be listed in advance, but the second has to be derived or verified when mentioned on the basis of information about the mechanism of flying and the properties of concrete.

There are many ways of nonmonotonically axiomatizing the facts about which birds can fly. The following axioms using a predicate ab standing for “abnormal” seem to me quite straightforward.

$$(\forall x)\big(\lnot \mathit{ab}(\mathit{aspect1}(x)) \supset \lnot \mathit{flies}(x)\big) \tag{1}$$

Unless an object is abnormal in aspect1, it can’t fly. It wouldn’t work to write ab(x) instead of ab(aspect1(x)), because we don’t want a bird that is abnormal with respect to its ability to fly to be automatically abnormal in other respects. Using aspects limits the effects of proofs of abnormality.

$$(\forall x)\big(\mathit{bird}(x) \supset \mathit{ab}(\mathit{aspect1}(x))\big). \tag{2}$$

$$(\forall x)\big(\mathit{bird}(x) \land \lnot \mathit{ab}(\mathit{aspect2}(x)) \supset \mathit{flies}(x)\big). \tag{3}$$

Unless a bird is abnormal in aspect2, it can fly.

When these axioms are combined with other facts about the problem, the predicate ab is then to be circumscribed, i.e. given its minimal extent compatible with the facts being taken into account. This has the effect that a bird will be considered to fly unless other axioms imply that it is abnormal in aspect2. (2) is called a cancellation of inheritance axiom, because it explicitly cancels the general presumption that objects don’t fly. This approach works fine when the inheritance hierarchy is given explicitly. More elaborate approaches, some of which are introduced in (McCarthy 1986) and improved in (Haugh 1988), are required when hierarchies with indefinite numbers of sorts are considered.

2. (McCarthy 1986) contains a similar treatment of the effects of actions like moving and painting blocks using the situation calculus. Moving and painting are axiomatized entirely separately, and there are no axioms saying that moving a block doesn’t affect the positions of other blocks or the colors of blocks.
A general “common-sense law of inertia”

$$(\forall p\,e\,s)\big(\mathit{holds}(p,s) \land \lnot \mathit{ab}(\mathit{aspect1}(p,e,s)) \supset \mathit{holds}(p,\mathit{result}(e,s))\big), \tag{2}$$

asserts that a fact p that holds in a situation s is presumed to hold in the situation result(e,s) that results from an event e unless there is evidence to the contrary. Unfortunately, Lifschitz (1985 personal communication) and Hanks and McDermott (1986) showed that simple treatments of the common-sense law of inertia admit unintended models. Several authors have given more elaborate treatments, but in my opinion, the results are not yet entirely satisfactory. The best treatment so far seems to be that of (Lifschitz 1987).

4 Ability, Practical Reason and Free Will

An AI system capable of achieving goals in the common-sense world will have to reason about what it and other actors can and cannot do. For concreteness, consider a robot that must act in the same world as people and perform tasks that people give it. Its need to reason about its abilities puts the traditional philosophical problem of free will in the following form. What view shall we build into the robot about its own abilities, i.e. how shall we make it reason about what it can and cannot do? (Wishing to avoid begging any questions, by reason we mean compute using axioms, observation sentences, rules of inference and nonmonotonic rules of conjecture.)

Let A be a task we want the robot to perform, and let B and C be alternate intermediate goals either of which would allow the accomplishment of A. We want the robot to be able to choose between attempting B and attempting C. It would be silly to program it to reason: “I’m a robot and a deterministic device. Therefore, I have no choice between B and C. What I will do is determined by my construction.” Instead it must decide in some way which of B and C it can accomplish. It should be able to conclude in some cases that it can accomplish B and not C, and therefore it should take B as a subgoal on the way to achieving A.
In other cases it should conclude that it can accomplish either B or C and should choose whichever is evaluated as better according to the criteria we provide it.

(McCarthy and Hayes 1969) proposes conditions on the semantics of any formalism within which the robot should reason. The essential idea is that what the robot can do is determined by the place the robot occupies in the world, not by its internal structure. For example, if a certain sequence of outputs from the robot will achieve B, then we conclude or it concludes that the robot can achieve B without reasoning about whether the robot will actually produce that sequence of outputs. Our contention is that this is approximately how any system, whether human or robot, must reason about its ability to achieve goals. The basic formalism will be the same, regardless of whether the system is reasoning about its own abilities or about those of other systems including people.

The above-mentioned paper also discusses the complexities that come up when a strategy is required to achieve the goal and when internal inhibitions or lack of knowledge have to be taken into account.

5 Three Approaches to Knowledge and Belief

Our robot will also have to reason about its own knowledge and that of other robots and people.

This section contrasts the approaches to knowledge and belief characteristic of philosophy, philosophical logic and artificial intelligence. Knowledge and belief have long been studied in epistemology, philosophy of mind and in philosophical logic. Since about 1960, knowledge and belief have also been studied in AI. (Halpern 1986) and (Vardi 1988) contain recent work, mostly oriented to computer science including AI.

It seems to me that philosophers have generally treated knowledge and belief as complete natural kinds. According to this view there is a fact to be discovered about what beliefs are. Moreover, once it is decided what the objects of belief are (e.g.
sentences or propositions), the definitions of belief ought to determine for each such object p whether the person believes it or not. This last is the completeness mentioned above. Of course, mainly human and sometimes animal beliefs have been considered. Philosophers have differed about whether machines can ever be said to have beliefs, but even those who admit the possibility of machine belief consider that what beliefs are is to be determined by examining human belief.

The formalization of knowledge and belief has been studied as part of philosophical logic, certainly since Hintikka’s book (1964), but much of the earlier work in modal logic can be seen as applicable. Different logics and axiom systems sometimes correspond to the distinctions that less formal philosophers make, but sometimes the mathematics dictates different distinctions.

AI takes a different course because of its different objectives, but I’m inclined to recommend this course to philosophers also, partly because we want their help but also because I think it has philosophical advantages.

The first question AI asks is: why study knowledge and belief at all? Does a computer program solving problems and achieving goals in the common-sense world require beliefs, and must it use sentences about beliefs? The answer to both questions is approximately yes. At least there have to be data structures whose usage corresponds closely to human usage in some cases. For example, a robot that could use the American air transportation system has to know that travel agents know airline schedules, and that there is a book (and now a computer-accessible database) called the OAG that contains this information. If it is to be able to plan a trip with intermediate stops, it has to have the general information that the departure gate from an intermediate stop is not to be discovered when the trip is first planned but will be available on arrival at the intermediate stop.
If the robot has to keep secrets, it has to know how information can be obtained by inference from other information, i.e. it has to have some kind of information model of the people from whom it is to keep the secrets.

However, none of this tells us that the notions of knowledge and belief to be built into our computer programs must correspond to the goals philosophers have been trying to achieve. For example, the difficulties involved in building a system that knows what travel agents know about airline schedules are not substantially connected with questions about how the travel agents can be absolutely certain. Its notion of knowledge doesn’t have to be complete; i.e. it doesn’t have to determine in all cases whether a person is to be regarded as knowing a given proposition. For many tasks it doesn’t have to have opinions about when true belief doesn’t constitute knowledge. The designers of AI systems can try to evade philosophical puzzles rather than solve them.

Maybe some people would suppose that if the question of certainty is avoided, the problems of formalizing knowledge and belief become straightforward. That has not been our experience. As soon as we try to formalize the simplest puzzles involving knowledge, we encounter difficulties that philosophers have rarely if ever attacked.

Consider the following puzzle of Mr. S and Mr. P. Two numbers m and n are chosen such that 2 ≤ m ≤ n ≤ 99. Mr. S is told their sum and Mr. P is told their product. The following dialogue ensues:

Mr. P: I don’t know the numbers.
Mr. S: I knew you didn’t know them. I don’t know them either.
Mr. P: Now I know the numbers.
Mr. S: Now I know them too.

In view of the above dialogue, what are the numbers?

Formalizing the puzzle is discussed in (McCarthy 1989). For the present we mention only the following aspects.

1. We need to formalize knowing what, i.e. knowing what the numbers are, and not just knowing that.
2.
We need to be able to express and prove non-knowledge as well as knowledge. Specifically, we need to be able to express the fact that as far as Mr. P knows, the numbers might be any pair of factors of the known product.
3. We need to express the joint knowledge of Mr. S and Mr. P of the conditions of the problem.
4. We need to express the change of knowledge with time, e.g. how Mr. P’s knowledge changes when he hears Mr. S say that he knew that Mr. P didn’t know the numbers and doesn’t know them himself. This includes inferring what Mr. S and Mr. P still won’t know.

The first order language used to express the facts of this problem involves an accessibility relation A(w1,w2,p,t), modeled on Kripke’s semantics for modal logic. However, the accessibility relation here is in the language itself rather than in a metalanguage. Here w1 and w2 are possible worlds, p is a person and t is an integer time. The use of possible worlds makes it convenient to express non-knowledge: assertions of non-knowledge are expressed as the existence of accessible worlds satisfying appropriate conditions.

The problem was successfully expressed in the language in the sense that an arithmetic condition determining the values of the two numbers can be deduced from the statement. However, this is not good enough for AI. Namely, we would like to include facts about knowledge in a general-purpose common-sense database. Instead of an ad hoc formalization of Mr. S and Mr. P, the problem should be solvable from the same general facts about knowledge that might be used to reason about the knowledge possessed by travel agents, supplemented only by the facts about the dialogue. Moreover, the language of the general-purpose database should accommodate all the modalities that might be wanted and not just knowledge. This suggests using ordinary logic, e.g. first order logic, rather than modal logic, so that the modalities can be ordinary functions or predicates rather than modal operators.
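The possible-worlds reading of non-knowledge can be made concrete with a short brute-force sketch. It is deliberately the kind of ad hoc formalization that the paragraph above says is not good enough for AI: each pair (m, n) is a world, two worlds are accessible for Mr. P when they share a product and for Mr. S when they share a sum, and each remark removes the worlds in which it would be false. The helper names are hypothetical, and the encoding of each remark is one reasonable reading of the dialogue, not McCarthy's first-order axioms. (4, 13), the pair usually given as the answer, should be among the surviving worlds.

```python
from collections import Counter


def product(world):
    return world[0] * world[1]


def total(world):
    return world[0] + world[1]


def knows_in(worlds, observe):
    """Worlds in which an agent that sees only observe(w) can identify w.

    Worlds that agree on observe() are mutually accessible for that agent, so
    the agent knows the numbers exactly when its accessibility class is a
    singleton; non-knowledge is the existence of another accessible world.
    """
    counts = Counter(observe(w) for w in worlds)
    return {w for w in worlds if counts[observe(w)] == 1}


def solve(limit=99):
    """Filter the possible worlds (m, n), 2 <= m <= n <= limit, by the dialogue."""
    worlds = [(m, n) for m in range(2, limit + 1) for n in range(m, limit + 1)]

    # Mr. P: "I don't know the numbers."
    p_knew = knows_in(worlds, product)
    w1 = [w for w in worlds if w not in p_knew]

    # Mr. S: "I knew you didn't know them." In every world sharing S's sum,
    # P did not know, so S's sum is not the sum of any world in p_knew.
    bad_sums = {total(w) for w in p_knew}
    # "I don't know them either." Having heard P, S still has several candidates.
    s_knows = knows_in(w1, total)
    w2 = [w for w in w1 if total(w) not in bad_sums and w not in s_knows]

    # Mr. P: "Now I know the numbers."
    p_knows = knows_in(w2, product)
    w3 = [w for w in w2 if w in p_knows]

    # Mr. S: "Now I know them too."
    s_knows_now = knows_in(w3, total)
    return [w for w in w3 if w in s_knows_now]


print(solve())
```

Each comprehension plays the role of an axiom about one utterance, which is exactly why this encoding does not generalize: nothing in it could be reused to reason about what travel agents know.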
Suppose we are successful in developing a “knowledge formalism” for our common-sense database that enables the program controlling a robot to solve puzzles, plan trips and do the other tasks that arise in the common-sense environment requiring reasoning about knowledge. It will surely be asked whether it is really knowledge that has been formalized. I doubt that the question has an answer. This is perhaps the question of whether knowledge is a natural kind.

I suppose some philosophers would say that such problems are not of philosophical interest. It would be unfortunate, however, if philosophers were to abandon such a substantial part of epistemology to computer science, because the analytic skills that philosophers have acquired are relevant to the problems.

6 Reifying Context

We propose the formula holds(p,c) to assert that the proposition p holds in context c. It expresses explicitly how the truth of an assertion depends on context. The relation c1 ≤ c2 asserts that the context c2 is more general than the context c1.[1] Formalizing common-sense reasoning needs contexts as objects, in order to match human ability to consider context explicitly. The proposed database of general common-sense knowledge will make assertions in a general context called C0. However, C0 cannot be maximally general, because it will surely involve unstated presuppositions. Indeed we claim that there can be no maximally general context. Every context involves unstated presuppositions, both linguistic and factual.

Sometimes the reasoning system will have to transcend C0, and tools will have to be provided to do this. For example, if Boyle’s law of the dependence of the volume of a sample of gas on pressure were built into C0, discovery of its dependence on temperature would have to trigger a process of generalization that might lead to the perfect gas law.

The following ideas about how the formalization might proceed are tentative.
Moreover, they appeal to recent logical innovations in the formalization of nonmonotonic reasoning. In particular, there will be nonmonotonic “inheritance rules” that allow default inference from holds(p,c) to holds(p,c′), where c′ is either more general or less general than c.

Almost all previous discussion of context has been in connection with natural language, and the present paper relies heavily on examples from natural language. However, I believe the main AI uses of formalized context will not be in connection with communication but in connection with reasoning about the effects of actions directed to achieving goals. It’s just that natural language examples come to mind more readily.

As an example of intended usage, consider holds(at(he,inside(car)),c17). Suppose that this sentence is intended to assert that a particular person is in a particular car on a particular occasion, i.e. the sentence is not just being used as a linguistic example but is meant seriously. A corresponding English sentence is “He’s in the car”, where who he is, which car, and when are determined by the context in which the sentence is uttered. Suppose, for simplicity, that the sentence is said by one person to another in a situation in which the car is visible to the speaker but not to the hearer, and the time at which the subject is asserted to be in the car is the same time at which the sentence is uttered.

[1] 1996: In subsequent papers the notation ist(c,p) was used.

In our formal language c17 has to carry the information about who he is, which car and when. Now suppose that the same fact is to be conveyed as in the example above, but the context is a certain Stanford Computer Science Department 1980s context. Thus familiarity with cars is presupposed, but no particular person, car or occasion is presupposed. The meanings of certain names are presupposed, however. We can call that context (say) c5.
This more general context requires a more explicit proposition; thus, we would have

$$holds\big(at(\text{``Timothy McCarthy''},\ inside((\iota x)(iscar(x) \land belongs(x, \text{``John McCarthy''})))),\ c5\big) \qquad (3)$$

A yet more general context might not identify a specific John McCarthy, so that even this more explicit sentence would need more information. What would constitute an adequate identification might also be context dependent.

Here are some of the properties formalized contexts might have.

1. In the above example, we will have c17 ≤ c5, i.e. c5 is more general than c17. There will be nonmonotonic rules like

$$(\forall c1\ c2\ p)\,\big(c1 \le c2 \land holds(p,c1) \land \neg ab1(p,c1,c2) \supset holds(p,c2)\big) \qquad (4)$$

and

$$(\forall c1\ c2\ p)\,\big(c1 \le c2 \land holds(p,c2) \land \neg ab2(p,c1,c2) \supset holds(p,c1)\big) \qquad (5)$$

Thus there is nonmonotonic inheritance both up and down in the generality hierarchy.

2. There are functions forming new contexts by specialization. We could have something like

$$c19 = specialize(he = \text{Timothy McCarthy},\ belongs(car, \text{John McCarthy}),\ c5) \qquad (6)$$

We will have c19 ≤ c5.

3. Besides holds(p,c), we may have value(term,c), where term is a term. The domain in which term takes values is defined in some outer context.

4. Some presuppositions of a context are linguistic and some are factual. In the above example, it is a linguistic matter who the names refer to. The properties of people and cars are factual, e.g. it is presumed that people fit into cars.

5. We may want meanings as abstract objects. Thus we might have meaning(he,c17) = meaning(“Timothy McCarthy”,c5).

6. Contexts are “rich” entities not to be fully described. Thus the “normal English language context” contains factual assumptions and linguistic conventions that a particular English speaker may not know. Moreover, even assumptions and conventions in a context that may be individually accessible cannot be exhaustively listed. A person or machine may know facts about a context without “knowing the context”.

7.
Contexts should not be confused with the situations of the situation calculus of (McCarthy and Hayes 1969). Propositions about situations can hold in a context. For example, we may have

$$holds(Holds1(at(I,airport),\ result(drive\text{-}to(airport),\ result(walk\text{-}to(car),\ S0))),\ c1) \qquad (7)$$

This can be interpreted as asserting that under the assumptions embodied in context c1, a plan of walking to the car and then driving to the airport would get the robot to the airport starting in situation S0.

8. The context language can be made more like natural language and more extensible if we introduce notions of entering and leaving a context. These will be analogous to the notions of making and discharging assumptions in natural deduction systems, but the notion seems to be more general. Suppose we have holds(p,c). We then write enter c. This enables us to write p instead of holds(p,c). If we subsequently infer q, we can replace it by holds(q,c) and leave the context c. Then holds(q,c) will itself hold in the outer context in which holds(p,c) holds. When a context is entered, there need to be restrictions analogous to those that apply in natural deduction when an assumption is made.

One way in which this notion of entering and leaving contexts is more general than natural deduction is that formulas like holds(p,c1) and (say) holds(not p,c2) behave differently from c1 ⊃ p and c2 ⊃ ¬p, which are their natural deduction analogs. For example, if c1 is associated with the time 5pm and c2 is associated with the time 6pm and p is at(I,office), then holds(p,c1) ∧ holds(not p,c2) might be used to infer that I left the office between 5pm and 6pm. (c1 ⊃ p) ∧ (c2 ⊃ ¬p) cannot be used in this way; in fact, the only conclusion it supports that does not mention p is ¬c1 ∨ ¬c2.

9. The expression Holds(p,c) (note the caps) represents the proposition that p holds in c. Since it is a proposition, we can assert holds(Holds(p,c),c′).

10.
Propositions will be combined by functional analogs of the Boolean operators, as discussed in (McCarthy 1979b). Treating propositions involving quantification is necessary, but it is difficult to determine the right formalization.

11. The major goals of research into formalizing context should be to determine the rules that relate contexts to their generalizations and specializations. Many of these rules will involve nonmonotonic reasoning.

7 Remarks

The project of formalizing common-sense knowledge and reasoning raises many new considerations in epistemology and also in extending logic. The role that the following ideas might play is not clear yet.

7.1 Epistemological Adequacy often Requires Approximate Partial Theories

(McCarthy and Hayes 1969) introduces the notion of epistemological adequacy of a formalism. The idea is that the formalism used by an AI system must be adequate to represent the information that a person or program with given opportunities to observe can actually obtain. Often an epistemologically adequate formalism for some phenomenon cannot take the form of a classical scientific theory. I suspect that some people’s demand for a classical scientific theory of certain phenomena leads them to despair about formalization. Consider a theory of a dynamic phenomenon, i.e. one that changes in time. A classical scientific theory represents the state of the phenomenon in some way and describes how it evolves with time, most classically by differential equations.

What can be known about common-sense phenomena usually doesn’t permit such complete theories. Only certain states permit prediction of the future. The phenomenon arises in science and engineering theories also, but I suspect that philosophy of science sweeps these cases under the rug. Here are some examples.

(1) The theory of linear electrical circuits is complete within its model of the phenomena. The theory gives the response of the circuit to any time-varying voltage.
Of course, the theory may not describe the actual physics, e.g. the current may overheat the resistors. However, the theory of sequential digital circuits is incomplete from the beginning. Consider a circuit built from NAND gates and D flipflops and timed synchronously by an appropriate clock. The behavior of a D flipflop is defined by the theory when one of its inputs is 0 and the other is 1, provided the inputs are appropriately clocked. However, the behavior is not defined by the theory when both inputs are 0 or both are 1. Moreover, one can easily make circuits in such a way that both inputs of some flipflop get 0 at some time.

This lack of definition is not an oversight. The actual signals in a digital circuit are not ideal square waves but have finite rise times and often overshoot their nominal values. However, the circuit will behave as though the signals were ideal provided the design rules are obeyed. Making both inputs to a flipflop nominally 0 creates a situation in which no digital theory can describe what happens, because the behavior then depends on the actual time-varying signals and on manufacturing variations in the flipflops.

(2) Thermodynamics is also a partial theory. It tells about equilibria and it tells which directions reactions go, but it says nothing about how fast they go.

(3) The common-sense database needs a theory of the behavior of clerks in stores. This theory should cover what a clerk will do in response to bringing items to the counter and in response to a certain class of inquiries. How he will respond to other behaviors is not defined by the theory.

(4) (McCarthy 1979a) refers to a theory of skiing that might be used by ski instructors. This theory regards the skier as a stick figure with movable joints. It gives the consequences of moving the joints as it interacts with the shape of the ski slope, but it says nothing about what causes the joints to be moved in a particular way.
Its partial character corresponds to what experience teaches ski instructors. It often assigns truth values to counterfactual conditional assertions like “If he had bent his knees more, he wouldn’t have fallen.”

7.2 Meta-epistemology

If we are to program a computer to think about its own methods for gathering information about the world, then it needs a language for expressing assertions about the relation between the world, the information-gathering methods available to an information seeker and what it can learn. This leads to a subject I like to call meta-epistemology. Besides its potential applications to AI, I believe it has applications to philosophy considered in the traditional sense.

Meta-epistemology is proposed as a mathematical theory in analogy to metamathematics. Metamathematics considers the mathematical properties of mathematical theories as objects. In particular, model theory as a branch of metamathematics deals with the relation between theories in a language and interpretations of the non-logical symbols of the language. These interpretations are considered as mathematical objects, and we are only sometimes interested in a preferred or true interpretation.

Meta-epistemology considers the relation between the world, languages for making assertions about the world, notions of what assertions are considered meaningful, what are accepted as rules of evidence and what a knowledge seeker can discover about the world. All these entities are considered as mathematical objects. In particular, the world is considered as a parameter. Thus meta-epistemology has the following characteristics.

1. It is a purely mathematical theory. Therefore, its controversies, assuming there are any, will be mathematical controversies rather than controversies about what the real world is like. Indeed metamathematics gave many philosophical issues in the foundations of mathematics a technical content.
For example, the theorem that intuitionist arithmetic and Peano arithmetic are equiconsistent removed at least one area of controversy between those whose mathematical intuitions support one view of arithmetic or the other.

2. While many modern philosophies of science assume some relation between what is meaningful and what can be verified or refuted, only special meta-epistemological systems will have the corresponding mathematical property that all aspects of the world relate to the experience of the knowledge seeker.

This has several important consequences for the task of programming a knowledge seeker. A knowledge seeker should not have a priori prejudices (principles) about what concepts might be meaningful. Whether and how a proposed concept about the world might ever connect with observation may remain in suspense for a very long time while the concept is investigated and related to other concepts.

We illustrate this by a literary example. Molière’s play Le Malade imaginaire includes a doctor who explains sleeping powders by saying that they contain a “dormitive virtue”. In the play, the doctor is considered a pompous fool for offering a concept that explains nothing. However, suppose the doctor had some intuition that the dormitive virtue might be extracted and concentrated, say by shaking the powder in a mixture of ether and water. Suppose he thought that he would get the same concentrate from all substances with soporific effect. He would certainly have a fragment of scientific theory subject to later verification. Now suppose less: namely, he only believes that a common component is behind all substances whose consumption makes one sleepy, but has no idea that he should try to invent a way of verifying the conjecture. He still has something that, if communicated to someone more scientifically minded, might be useful. In the play, the doctor obviously sins intellectually by claiming a hypothesis as certain.
Thus a knowledge seeker must be able to form new concepts that have only extremely tenuous relations with their previous linguistic structure.

7.3 Rich and poor entities

Consider my next trip to Japan. Considered as a plan, it is a discrete object with limited detail. I do not yet even plan to take a specific flight or to fly on a specific day. Considered as a future event, lots of questions may be asked about it. For example, it may be asked whether the flight will depart on time and what precisely I will eat on the airplane. We propose characterizing the actual trip as a rich entity and the plan as a poor entity.

Originally, I thought that rich events referred to the past and poor ones to the future, but this seems to be wrong. It’s only that when one refers to the past one is usually referring to a rich entity, while the future entities one refers to are more often poor. However, there is no intrinsic association of this kind. It seems that planning requires reasoning about the plan (poor entity), the event of its execution (rich entity) and their relations.

(McCarthy and Hayes 1969) defines situations as rich entities. However, the actual programs that have been written to reason in situation calculus might as well regard them as taken from a finite or countable set of discrete states. Possible worlds, as ordinarily used in philosophy, are also examples of rich entities. One never prescribes a possible world but only describes classes of possible worlds.

Rich entities are open-ended in that we can always introduce more properties of them into our discussion. Poor entities can often be enumerated, e.g. we can often enumerate all the events that we consider reasonably likely in a situation. The passage from considering rich entities in a given discussion to considering poor entities is a step of nonmonotonic reasoning.

It seems to me that it is important to get a good formalization of the relations between corresponding rich and poor entities.
This can be regarded as formalizing the relation between the world and a formal model of some aspect of the world, e.g. between the world and a scientific theory.

8 Acknowledgements

I am indebted to Vladimir Lifschitz and Richmond Thomason for useful suggestions. Some of the prose is taken from (McCarthy 1987), but the examples are given more precisely in the present paper, since Daedalus allows no formulas.

The research reported here was partially supported by the Defense Advanced Research Projects Agency, Contract No. N00039-84-C-0211.

9 References

Dennett, D.C. (1971): “Intentional Systems”, Journal of Philosophy, vol. 68, No. 4, Feb. 25.
Dreyfus, Hubert L. (1972): What Computers Can’t Do: the Limits of Artificial Intelligence, revised edition 1979, New York: Harper & Row.
Fikes, R. and Nils Nilsson (1971): “STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving”, Artificial Intelligence, Volume 2, Numbers 3,4, January, pp. 189-208.
Gelfond, M. (1987): “On Stratified Autoepistemic Theories”, AAAI-87, vol. 1, pp. 207-211.
Ginsberg, M. (ed.) (1987): Readings in Nonmonotonic Reasoning, Morgan Kaufmann, 481 pp.
Green, C. (1969): “Application of Theorem Proving to Problem Solving”, First International Joint Conference on Artificial Intelligence, pp. 219-239.
Halpern, J. (ed.) (1986): Reasoning about Knowledge, Morgan Kaufmann, Los Altos, CA.
Hanks, S. and D. McDermott (1986): “Default Reasoning, Nonmonotonic Logics, and the Frame Problem”, AAAI-86, pp. 328-333.
Haugh, Brian A. (1988): “Tractable Theories of Multiple Defeasible Inheritance in Ordinary Nonmonotonic Logics”, Proceedings of the Seventh National Conference on Artificial Intelligence (AAAI-88), Morgan Kaufmann.
Hintikka, Jaakko (1964): Knowledge and Belief; an Introduction to the Logic of the Two Notions, Cornell Univ. Press, 179 pp.
Kowalski, Robert (1979): Logic for Problem Solving, North-Holland, Amsterdam.
Kraus, Sarit and Donald Perlis (1988): “Names and Non-Monotonicity”, UMIACS-TR-88-84, CS-TR-2140, Computer Science Technical Report Series, University of Maryland, College Park, Maryland 20742.
Lifschitz, Vladimir (1987): “Formal theories of action”, The Frame Problem in Artificial Intelligence, Proceedings of the 1987 Workshop, reprinted in (Ginsberg 1987).
Lifschitz, Vladimir (1989a): “Between Circumscription and Autoepistemic Logic”, to appear in the Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning, Morgan Kaufmann.
Lifschitz, Vladimir (1989b): “Circumscriptive Theories: A Logic-based Framework for Knowledge Representation”, this collection.
Lifschitz, Vladimir (1989c): “Benchmark Problems for Formal Nonmonotonic Reasoning”, Non-Monotonic Reasoning, 2nd International Workshop, Grassau, FRG, Springer-Verlag.
McCarthy, John (1959): “Programs with Common Sense”, Proceedings of the Teddington Conference on the Mechanization of Thought Processes, Her Majesty’s Stationery Office, London.
McCarthy, John and P.J. Hayes (1969): “Some Philosophical Problems from the Standpoint of Artificial Intelligence”, D. Michie (ed.), Machine Intelligence 4, American Elsevier, New York, NY.
McCarthy, John (1977): “On the Model Theory of Knowledge” (with M. Sato, S. Igarashi, and T. Hayashi), Proceedings of the Fifth International Joint Conference on Artificial Intelligence, M.I.T., Cambridge, Mass.
McCarthy, John (1977): “Epistemological Problems of Artificial Intelligence”, Proceedings of the Fifth International Joint Conference on Artificial Intelligence, M.I.T., Cambridge, Mass.
McCarthy, John (1979a): “Ascribing Mental Qualities to Machines”, Philosophical Perspectives in Artificial Intelligence, Ringle, Martin (ed.), Harvester Press, July 1979.
McCarthy, John (1979b): “First Order Theories of Individual Concepts and Propositions”, Michie, Donald (ed.), Machine Intelligence 9, University of Edinburgh Press, Edinburgh.
McCarthy, John (1980): “Circumscription: A Form of Non-Monotonic Reasoning”, Artificial Intelligence, Volume 13, Numbers 1,2, April.
McCarthy, John (1983): “Some Expert Systems Need Common Sense”, Computer Culture: The Scientific, Intellectual and Social Impact of the Computer, Heinz Pagels (ed.), vol. 426, Annals of the New York Academy of Sciences.
McCarthy, John (1986): “Applications of Circumscription to Formalizing Common Sense Knowledge”, Artificial Intelligence, April 1986.
McCarthy, John (1987): “Mathematical Logic in Artificial Intelligence”, Daedalus, vol. 117, No. 1, American Academy of Arts and Sciences, Winter 1988.
McCarthy, John (1989): “Two Puzzles Involving Knowledge”, Formalizing Common Sense, Ablex 1989.
McDermott, D. and J. Doyle (1980): “Non-Monotonic Logic I”, Artificial Intelligence, Vol. 13, No. 1.
Moore, R. (1985): “Semantical Considerations on Nonmonotonic Logic”, Artificial Intelligence 25 (1), pp. 75-94.
Newell, Allen (1981): “The Knowledge Level”, AI Magazine, Vol. 2, No. 2.
Perlis, D. (1988): “Autocircumscription”, Artificial Intelligence, 36, pp. 223-236.
Reiter, Raymond (1980): “A Logic for Default Reasoning”, Artificial Intelligence, Volume 13, Numbers 1,2, April.
Russell, Bertrand (1913): “On the Notion of Cause”, Proceedings of the Aristotelian Society, 13, pp. 1-26.
Robinson, J. Alan (1965): “A Machine-oriented Logic Based on the Resolution Principle”, JACM, 12(1), pp. 23-41.
Sterling, Leon and Ehud Shapiro (1986): The Art of Prolog, MIT Press.
Sussman, Gerald J., Terry Winograd, and Eugene Charniak (1971): “Micro-planner Reference Manual”, Report AIM-203A, Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge.
Vardi, Moshe (1988): Conference on Theoretical Aspects of Reasoning about Knowledge, Morgan Kaufmann, Los Altos, CA.

Department of Computer Science
Stanford University
Stanford, CA 94305

Beating Common Sense into Interactive Applications

Henry Lieberman, Hugo Liu, Push Singh, Barbara Barry
MIT Media Lab, 20 Ames St., Cambridge, MA 02139 USA

Abstract

A long-standing dream of artificial intelligence has been to put common sense knowledge into computers, enabling machines to reason about everyday life. Some projects, such as Cyc, have begun to amass large collections of such knowledge. However, it is widely assumed that the use of common sense in interactive applications will remain impractical for years, until these collections can be considered sufficiently complete and common sense reasoning sufficiently robust. Recently, at the MIT Media Lab, we have had some success in applying common sense knowledge in a number of intelligent Interface Agents, despite the admittedly spotty coverage and unreliable inference of today's common sense knowledge systems. This paper will survey several of these applications and reflect on interface design principles that enable successful use of common sense knowledge.

1 Introduction

Things fall down, not up. Weddings have a bride and a groom. If someone yells at you, they're probably angry.

One of the reasons that computers seem dumber than humans is that they don't have common sense: a myriad of simple facts about everyday life and the ability to make use of that knowledge easily when appropriate. A long-standing dream of Artificial Intelligence has been to put that kind of knowledge into computers, but applications of common sense knowledge have been slow in coming. Researchers like Minsky [2000] and Lenat [1995], recognizing the importance of common sense knowledge, have proposed that common sense constitutes the bottleneck for making intelligent machines, and they advocate working directly to amass large collections of such knowledge and heuristics for using it.

Considerable progress has been made over the last few years. There are now large knowledge bases of common sense knowledge and better ways of using it than we have had before. We may have gotten too used to putting common sense in that category of "impossible" problems and overlooked opportunities to actually put this kind of knowledge to work. We need to explore new interface designs that don't require complete solutions to the common sense problem, but can make good use of partial knowledge and human-computer collaboration.

As the complexity of computer applications grows, it may be that the only way to make applications more helpful and avoid stupid mistakes and annoying interruptions is to make use of common sense knowledge. Cell phones should know enough to switch to vibrate mode if you're at the symphony. Calendars should warn you if you try to schedule a meeting at 2 AM or plan to take a vegetarian to a steak house. Cameras should realize that if you took a group of pictures within a span of two hours, at around the same location, they are probably of the same event.

Initial experimentation with using common sense encountered significant obstacles. First, despite the vast amount of effort put into common sense knowledge bases, coverage is still sparse relative to the amount of knowledge humans typically bring to bear. Second, inference with such knowledge is still unreliable, due to vagueness, exceptional cases, logical paradoxes, and other problems.

2 Question-Answering versus Interface Agent Applications

Many early attempts at applying common sense fell into the categories of question answering, story understanding, or information retrieval. The hope was that use of common sense inference would improve results beyond what was possible with simple keyword matching or statistical methods.

For example, in a retrieval demo of Cyc [Lenat, 1995], one could ask "Show me a picture of someone who is disappointed", and receive a picture of the second finisher in the Boston Marathon, by a chain of reasoning like: a marathon is a contest; the goal of a contest is to be first; if you do not achieve your goals, then you will be disappointed.

When it works, this is great. But direct question-answering places very exacting demands on a system. First, the user is expecting a direct answer. If the answer is good, the user will be happy; if it is not, the user will be critical of the system. If the accuracy falls below a certain threshold in the long term, the user will give up using the system completely. Second, the system only gets one shot at finding the correct answer, and it must do so quickly enough to maintain the feeling of interactivity (no more than a few seconds).

Over the last few years, we have been exploring the domain of Intelligent Interface Agents [Maes, 1994]. An interface agent is an AI program that attaches itself to a conventional interactive application (text or graphical editor, Web browser, spreadsheet, etc.)
and both watches the user's interactions and is capable of operating the interface as the user would. The jobs of the agent are to provide help, assistance, suggestions, automation of common tasks, and adaptation and personalization of the interface.

Our experience has been that Interface Agents can use common sense knowledge much more effectively than direct question-answering applications, because they place fewer demands on the system. Since all the capabilities of the interactive application remain available for the user to use in a conventional manner, it is no big deal if common sense knowledge does not cover a particular situation. If a common sense inference turns out wrong, the user is often no worse off than they would be without any assistance.

The user is not expecting a direct answer to every action, only that the agent will come up with something helpful every once in a while. Since the agent operates in a continuous, long-term manner, if it cannot respond immediately, it can gather further evidence and perhaps deliver a meaningful interaction in the future. If the agent's knowledge is not sufficient, it can ask the user to fill in the gaps.

In short, the use of common sense in Interface Agents can be made fail-soft. Interface agents are often proactive, “pushing” information rather than “pulling” it as query-response systems do, and it is easier to make the former kind of agent fail-soft.

3 Applications of Common Sense in Interface Agents

The remainder of this paper surveys several of our lab's recent projects in this area, to illustrate the principles above. Except where noted, these applications were built using knowledge drawn from Open Mind Common Sense (OMCS, see sidebar), a common sense knowledge base of over 675,000 natural language assertions built from the contributions of over 13,000 people over the World Wide Web [Singh et al., 2002].
Many of these applications made use of early versions of OMCSNet, a semantic network of 280,000 relations extracted from the OMCS corpus, with 20 link types covering taxonomic, meronomic, temporal, spatial, causal, functional, and other kinds of relations.

3.1 Common Sense in an Agent for Digital Photography

Figure 1. Telling stories with ARIA

In ARIA (Annotation and Retrieval Integration Agent, Figure 1) [Lieberman et al., 2001], we attempt to leverage common sense knowledge to semi-automatically annotate photos and proactively suggest relevant photos [Lieberman & Liu, 2002a]. ARIA observes a user as s/he types a story, parses the text in real time, and continuously displays a relevance-ordered list of photos. When the user inserts photos into the text, the system automatically annotates the photos with relevant keywords.

Common sense knowledge is used to inform semantic recognition agents, which recognize people, places, and events in the text. These recognition agents extract appropriate annotations to be added to photos inserted in the text. In retrieval, common sense knowledge is compiled into a semantic network, and associative reasoning helps to bridge semantic gaps (e.g. connecting text about “wedding” to a photo annotated with “bride”) [Liu & Lieberman, 2002b]. The system also learns personal assertions from the text (e.g. “My sister's name is Mary.”), presumably unique to the author's context, which can be treated as a source of implicit knowledge in much the same manner as the common sense assertions coming from Open Mind.

The application of common sense in ARIA has several fail-soft aspects. Annotations suggested by the agent carry less weight than a user's annotations in retrieval, and can be rejected or revised by the user. Similarly, in retrieval, common sense is used only to bridge semantic gaps, and would never supersede explicit keyword matching. If a user finds a suggestion useful, s/he can choose to drag that photo into the text.
But if the suggestion is inappropriate, the user's writing task is not disrupted.

3.2 Common Sense in Affective Classification of Text

Consider the text, “My wife left me; she took the kids and the dog.” There are no obvious mood keywords such as “cry” or “depressed”, or any other obvious cues, but the implications of the event described here are decidedly sad. This presents an opportunity for common sense knowledge, a subset of which concerns the affective qualities of things, actions, events, and situations. From the Open Mind Common Sense knowledge base, a small society of linguistic models of affect was mined out, using a set of mood keywords as a starting point. The role of common sense knowledge in this application is to make affective classification of text more comprehensive and reliable by considering underlying semantics in addition to surface features.

Figure 2. Empathy Buddy reacts to an email.

Using this commonsense-informed approach, two applications were built. One is an email editor, Empathy Buddy (Figure 2), which uses Chernoff-style faces to react interactively, with one of six basic Ekman emotions, to a user as s/he composes an email [Liu, Lieberman, Selker, 2003]. A user study showed that users rated the affective software agent as more interactive and intelligent than a randomized-face control. Another application uses a hyperlinked color bar to help users visualize and navigate the affective structure of a text document [Liu, Lieberman, Selker, 2002]. Using the tool, users were able to improve their speed on within-document information access tasks.

The affective model approach has recently been extended to modeling point-of-view and personality, analyzing an author's writings and comparing what several authors "might have thought" about a specified topic [Liu and Maes, 2004].
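The mining of the affect models is described only in outline above, but the core idea (seed a few mood keywords with known affect, then let that affect flow back to concepts that commonly lead to them) can be sketched as a small spreading-activation pass. The toy network, the decay factor, the keep-the-strongest-path rule, and the helper names `propagate_affect` and `classify` are all hypothetical illustrations, not the published Empathy Buddy models.

```python
from collections import defaultdict

# Hypothetical toy network: concept -> concepts it tends to lead to.
# The edges loosely imitate OMCS-style causal assertions; they are invented.
LEADS_TO = {
    "wife left": ["divorce", "loneliness"],
    "divorce": ["sadness"],
    "loneliness": ["sadness"],
    "wedding": ["celebration"],
    "celebration": ["happiness"],
}

# Seed mood keywords with known affect (emotion -> strength).
SEEDS = {
    "sadness": {"sad": 1.0},
    "happiness": {"happy": 1.0},
}


def propagate_affect(seeds, leads_to, decay=0.5, rounds=3):
    """Give each concept a decayed share of the affect of the concepts it leads to."""
    affect = {concept: dict(moods) for concept, moods in seeds.items()}
    for _ in range(rounds):
        updated = {concept: dict(moods) for concept, moods in affect.items()}
        for concept, neighbors in leads_to.items():
            for neighbor in neighbors:
                for mood, score in affect.get(neighbor, {}).items():
                    spread = score * decay
                    current = updated.setdefault(concept, {})
                    # Keep the strongest path rather than summing all paths.
                    if spread > current.get(mood, 0.0):
                        current[mood] = spread
        affect = updated
    return affect


def classify(text, affect):
    """Sum the affect of every known concept mentioned in the text."""
    totals = defaultdict(float)
    lowered = text.lower()
    for concept, moods in affect.items():
        if concept in lowered:
            for mood, score in moods.items():
                totals[mood] += score
    return max(totals, key=totals.get) if totals else "neutral"


AFFECT = propagate_affect(SEEDS, LEADS_TO)
print(classify("My wife left me; she took the kids and the dog.", AFFECT))  # sad
```

The sentence contains no mood keyword, yet "wife left" inherits a weakened "sad" score two hops away from the seed "sadness"; text mentioning no known concept falls back to "neutral" rather than guessing.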
3.3 Common Sense in Video Capture and Editing

The Cinematic Common Sense project [Barry & Davenport, 2003] is being developed to provide feedback to documentary videographers during production. Common sense knowledge relevant to the documentary subject domain is retrieved to assist videographers while they are in the field recording footage about a documentary subject. After each shot is recorded, the videographer creates metadata in natural language, which is submitted as a query to a subset of the Open Mind database. For example, the shot metadata "a street artist is painting a painting" would yield shot suggestions such as "the last thing you do when you paint a painting is clean the brushes" or "something that might happen when you paint a picture is paint gets on your hands". These assertions can be used by the filmmaker as a flexible shot list that is dynamically updated in accordance with the events the filmmaker is experiencing. Annotation of content is enriched, as in ARIA, to support later search of image-based content. Collections of shots can also be ordered into rough temporal and causal sequences based on the associated common sense annotations.

Figure 3. Common Sense helps associate story elements with video clips.

3.4 Common Sense in Other Storytelling Applications

A common thread throughout the above applications is that they all assist the user in some sort of storytelling process. Storytelling is a great area for common sense because it draws on a wide spectrum of understanding of situations of everyday life. It can provide an intermediate level for the agent to understand and assist the user that is better than simple keywords but stops short of full natural language understanding.

David Gottlieb and Josh Juster's OMAdventure [Various Authors, 2003] (Figure 3) dynamically generates a Dungeons-and-Dragons type virtual environment by using common sense knowledge.
If the current game location is a kitchen, the system poses the questions to Open Mind, “What do you find in a kitchen?” and “What locations are associated with a kitchen?” If “You find an oven in a kitchen”, we ask “What can you do with an oven?” Objects such as the oven or operations such as cooking are then made available as moves in the game for the player to make, and the associated locations are the exits from the current situation. If the player is given the opportunity to create new objects or locations in the game, that can be a way of extending the knowledge. If the player adds a blender to a kitchen, now we know that blenders are something that can be found in a kitchen.

Figure 3. OMAdventure dynamically generates an adventure game's universe by using common sense knowledge.

Alexandro Artola's StoryIllustrator [Various Authors, 2003] (Figure 4) is like ARIA in that it gives the user a story editor and photo database and tries to continuously retrieve photos relevant to the user's typing. However, instead of using an annotated personal photo collection, it employs Yahoo's image search to retrieve images from the Web. Common sense knowledge is used for query expansion, so that a picture of a baby is associated with the mention of milk.

Chian Chuu and Hana Kim's StoryFighter [Various Authors, 2003] plays a game where the system and the user take turns contributing lines to a story. The game proposes a start state, e.g. “John is sleepy”, and an end state, “John is in prison”, and the goal is to get from the start state to the end state in a specified number of sentences. Along the way there are “taboo” words that can't be mentioned (“You can't use the word ‘arrest’”) as an additional constraint to make the game more challenging. Common sense is used to deduce the consequences of an event (“If you commit a crime, you might go to jail”) and to propose taboo words that exclude the most obvious continuations of the story.
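The OMAdventure loop of asking what is found in a location, what can be done with those objects, and which locations are associated with it can be sketched against a handful of triples. The relation names (`AtLocation`, `UsedFor`, `Near`), the facts, and the helper functions are assumptions made for illustration; the actual system posed its questions to Open Mind.

```python
# Invented (subject, relation, object) facts in the spirit of OMCS assertions.
# The relation names are assumptions chosen for readability.
FACTS = [
    ("oven", "AtLocation", "kitchen"),
    ("refrigerator", "AtLocation", "kitchen"),
    ("oven", "UsedFor", "cooking"),
    ("refrigerator", "UsedFor", "keeping food cold"),
    ("kitchen", "Near", "dining room"),
    ("pantry", "Near", "kitchen"),
]


def objects_in(location, facts):
    """Answer 'What do you find in a <location>?'"""
    return sorted(s for s, r, o in facts if r == "AtLocation" and o == location)


def uses_of(obj, facts):
    """Answer 'What can you do with a <obj>?'"""
    return sorted(o for s, r, o in facts if r == "UsedFor" and s == obj)


def exits_from(location, facts):
    """Locations associated with this one, in either direction, become exits."""
    linked = set()
    for s, r, o in facts:
        if r == "Near" and s == location:
            linked.add(o)
        elif r == "Near" and o == location:
            linked.add(s)
    return sorted(linked)


def build_room(location, facts):
    """Assemble the objects, available moves, and exits for one game location."""
    objects = objects_in(location, facts)
    moves = [f"use the {obj} for {use}" for obj in objects for use in uses_of(obj, facts)]
    return {"objects": objects, "moves": moves, "exits": exits_from(location, facts)}


def add_object(obj, location, facts):
    """A player-created object is recorded as a new assertion."""
    facts.append((obj, "AtLocation", location))


room = build_room("kitchen", FACTS)
print(room["exits"])  # ['dining room', 'pantry']
```

Calling `add_object("blender", "kitchen", FACTS)` and rebuilding the room makes the blender appear, which mirrors the point that player-created content doubles as new knowledge.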
3.6 Common Sense for Topic Spotting in Conversation

Nathan Eagle, Push Singh and Sandy Pentland [Eagle, Singh, Pentland, 2003] are exploring the idea of a wearable computer with continuous audio (and perhaps, ultimately, video) recording. They are interested not only in audio transcription, but in situational understanding: understanding general properties of the physical and social environment in which the computer finds itself, even if the user is not directly interacting with the machine. Speech recognition is used to roughly transcribe the audio, but with current technology, speech transcription accuracy, especially for conversation, is poor. However, understanding general aspects of the situation, such as whether the user is at home or at work, alone or with people, with friends or strangers, and so on, is indeed possible. Such recognition is vastly improved by using common sense knowledge to map from topic-spotting words output by the speech recognizer ("lunch", "fries", "styrofoam") to knowledge about everyday activities that the user might be engaged in (eating in a fast-food restaurant). Bayesian inference is used to rank hypotheses generated by OMCSNet.

Austin Wang and Justine Cassell used common sense in a virtual collaborative storytelling partner for children [Wang and Cassell, 2003], whose goal is to improve literacy and storytelling skills. An on-screen character, SAM, starts telling a story and invites the child to continue the story at certain points. For example, "Jack and Jane were playing hide and seek. Jane hid in… now it's your turn". The system uses speech recognition to listen to the child's story, but the recognition is not good enough to be sure of understanding everything the child says. Instead, the results of the recognition are used for rough topic-spotting, in the manner of Eagle's system. In the hide and seek example, the system could hear the word "bedroom".
Then common sense knowledge is used to determine what is likely to be in a bedroom, e.g. bed, closet, dresser, etc. The result is used to concoct a plausible continuation of the story when it is the virtual character's turn to talk again, e.g. "Jane's parents walked into the bedroom while she was hiding under the bed".

3.7 Common Sense for a Dynamic Tourist Phrasebook

Globuddy [Musa et al., 2003], by Rami Musa, Andrea Kulas, Yoan Anguilete, and Madleina Scheidegger, uses common sense to aid tourists with translation. Phrasebooks like Berlitz will commonly provide a set of words and phrases useful in a common situation, such as a restaurant or hotel. But they can only cover a few such situations. With Globuddy, you can type in your (perhaps unusual) situation (“I've just been arrested”) and it retrieves common sense surrounding that situation and feeds it to a translation service: “If you are arrested, you should call a lawyer.” “Bail is a payment that allows an accused person to get out of jail until a trial.” A recent implementation by Alex Faaborg and José Espinosa puts Globuddy on handheld and cell phone platforms.

Figure 4. The Globuddy 2 dynamic phrasebook gives you translations of phrases conceptually related to a seed word or phrase.

3.7 Common Sense for Word Completion

Applications like Globuddy play up the role of common sense knowledge bases in determining what kinds of topics are "usual" or "ordinary". A simple but powerful application of this is predictive typing, or word or phrase completion. Predictive typing can vastly speed up interfaces, especially in cases where the user has difficulty typing normally, or on devices such as cell phones whose keyboards are small. Conventional approaches to predictive typing select a prediction either from a list of words the user recently typed, or from an ordered list of the most commonly occurring words in English.
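To make the contrast concrete, the sketch below ranks completions of a typed prefix two ways: by raw word frequency, as in the conventional approaches just described, and by relatedness to the previous word. The counts, the relatedness scores, and the tie-breaking rule are invented for illustration; this is not the implementation of any cell phone system, only a way to see how "st" typed after "train" could favor "station".

```python
# Invented word counts standing in for "the most commonly occurring words".
FREQUENCY = {"still": 900, "street": 700, "stop": 650, "station": 120, "steam": 40}

# Invented relatedness scores standing in for a commonsense network.
RELATEDNESS = {
    "train": {"station": 1.0, "track": 0.8, "ticket": 0.7, "steam": 0.5},
}


def complete_by_frequency(prefix, frequency):
    """Conventional approach: the most frequent words with the prefix come first."""
    candidates = [w for w in frequency if w.startswith(prefix)]
    return sorted(candidates, key=lambda w: -frequency[w])


def complete_by_context(previous_word, prefix, frequency, relatedness):
    """Rank words related to the previous word first, then fall back to frequency."""
    links = relatedness.get(previous_word, {})
    candidates = {w for w in frequency if w.startswith(prefix)}
    candidates |= {w for w in links if w.startswith(prefix)}
    return sorted(candidates, key=lambda w: (-links.get(w, 0.0), -frequency.get(w, 0)))


print(complete_by_frequency("st", FREQUENCY)[0])                      # still
print(complete_by_context("train", "st", FREQUENCY, RELATEDNESS)[0])  # station
```

When the previous word has no related concepts, every candidate ties on relatedness and the ranking degrades to plain frequency, which is one simple way to combine the two sources.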
Alex Faaborg and Tom Stocky [Stocky, Faaborg, Lieberman, 2004] have implemented a Common Sense predictive text entry facility for a cell phone platform. It uses Open Mind Common Sense Net to find the next word that "makes sense" in the current context. For example, typing "train st" leads to the completion "train station" even though the user may not have typed that phrase before, nor is "station" the most common "st" word.

Figure 5. Common Sense can lead to good suggestions for word completion.

Performance of Common Sense alone in this task is comparable to or slightly better than conventional statistical methods, and may be much better when combined with conventional methods, especially where the conventional methods don't make strong predictions in particular cases. Similar approaches have great potential for use in other kinds of predictive and corrective interfaces.

3.8 Common Sense in a Disk Jockey's Assistant

Joan Morris-DiMicco, Carla Gomez, Arnan Sipitakiat, and Luke Ouko implemented a Common Sense Disk Jockey [Various Authors], an assistant for music selection in dance clubs. DJs often select music initially based on a few superficial parameters of the audience (age, ethnicity, dress), and then adjust their subsequent choices based on the reaction of the audience. CSDJ uses Erik Mueller's ThoughtTreasure as a reasoning engine [Mueller, 1998] to filter a list of MP3 files according to common sense assumptions about what kind of music particular groups might like. It also incorporates an interface to a camera that measures activity levels on the dance floor, to give feedback to the system as to whether the selection of a particular piece of music increased or decreased activity.

3.9 Common Sense for Mapping User Goals to Concrete Actions

We have also worked on some projects incorporating common sense knowledge into conventional search engines.
These applications still maintain the “one-shot” query-response interaction that we criticized at the beginning as less suited to common sense applications than continuously operating interface agents. However, we apply common sense in a fundamentally different way from conventional attempts to add inference to search engines. The role of common sense is to map from the user's search goals, which are sometimes not explicitly stated, to keywords appropriate for a conventional search engine. We believe that this process makes it more likely that the user will receive good results in cases where conventional keywords wouldn't work well, thereby making the interface more fail-soft.

Two systems, Reformulator [Singh, 2002] and GOOSE [Liu, Lieberman & Selker, 2002], are common sense adjuncts to Google. Reformulator, like Cyc, does inference on the subject matter of the search itself. Our work in improving search engine interfaces [Liu, Lieberman & Selker, 2002; Singh, 2002] is motivated by the observation that forming good search queries can often be a tricky proposition. We studied expert users composing queries [Liu, Lieberman & Selker, 2002], and concluded that they usually already know something about the structure and contents of the pages they are expecting to find. Once they have decided on the nature of the expected results, the chain of reasoning leading from the high-level search intent to query formation is usually very straightforward and commonsensical.

By contrast, novice users lack experience in chaining reasoning from a high-level search intent to query formation, so they often state their search goal directly. For example, a novice may type "my cat is sick" into a search engine rather than looking for "veterinarians, Boston, MA", even though the chain of reasoning is very straightforward.
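The chain an expert performs without thinking, from "my cat is sick" to "veterinarians, Boston, MA", can be sketched as two hops over commonsense rules: recognize a problem, then find who addresses that problem for that kind of subject. The rule tables and the `reformulate` helper are hypothetical, and the search expertise that the systems described here also drew on is omitted.

```python
import re

# Hypothetical commonsense rules, invented for illustration.
PROBLEM_OF = {"sick": "illness", "ill": "illness", "hungry": "hunger"}
IS_A = {"i": "person", "mother": "person", "father": "person"}
SOLVER_FOR = {
    ("illness", "cat"): "veterinarians",
    ("illness", "dog"): "veterinarians",
    ("illness", "person"): "doctors",
    ("hunger", "person"): "restaurants",
}


def reformulate(query, location):
    """Chain goal -> problem -> solver to build a keyword query, or return None."""
    words = re.findall(r"[a-z]+", query.lower())
    for problem in (PROBLEM_OF[w] for w in words if w in PROBLEM_OF):
        for word in words:
            # Try the word itself, then its category (e.g. "mother" -> "person").
            for subject in (word, IS_A.get(word)):
                solver = SOLVER_FOR.get((problem, subject))
                if solver:
                    return f"{solver}, {location}"
    return None  # fail-soft: no suggestion rather than a bad one


print(reformulate("my cat is sick", "Boston, MA"))  # veterinarians, Boston, MA
```

Returning `None` when no rule fires is meant to mirror the fail-soft principle: the agent stays silent instead of offering an arbitrary reformulation.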
In this situation, there is an opportunity for a search engine Interface Agent to observe a novice user's queries. The Agent attempts to infer the user's intent, and when it detects that a query may not return the best results, it can help to reformulate the query using search expertise and inference over commonsense knowledge, and opportunistically suggest "Did you mean to look for veterinarians in Boston, MA?" above the displayed results.

In GOOSE, we were able to improve a significant number of queries made by novice users. However, in that system, we still needed users to help the system by manually disambiguating the type of search goal. Our current work on automated disambiguation will allow us to develop an Interface Agent that does not interfere with the user's task at all, and only suggests a query (appearing above the search results) if it is able to offer a better one. This allows the Interface Agent to make use of common sense to improve the user experience in a fail-soft way. If common sense is too spotty to reformulate a query, no suggestion is offered.

Figure 6. The GOOSE common sense search engine

Another application that also maps between users' goals and concrete actions is currently under development by Alex Faaborg, Sakda Chaiworawitkul and Henry Lieberman, for the composition of Web services. In Tim Berners-Lee's proposed vision of the next-generation Semantic Web [Berners-Lee, Hendler, Lassila, 2004], users can state high-level goals, and agent programs can scout out Web services that can satisfy those goals, possibly composing multiple services, each of which accomplishes a subgoal, without explicit direction from the user.
For example, a request "Schedule a doctor's appointment for my mother within ten miles of her house" might involve looking up directories of doctors with a certain specialty; checking a reputation server; consulting a geographic server to check addresses, routes, or transit; synchronizing the mother's and doctor's schedules; and so on.

We fully concur with this vision. However, to date, most of the work on the Semantic Web has focused on formalisms such as XML, OWL, SOAP and UDDI that will be used to represent metadata stored on the Web pages that will presumably be accessed by these agents. Little work is concerned with how an agent might actually put together Semantic Web services to accomplish high-level goals for the user.

Looking at currently available and proposed Web service descriptions, we see that even if everyone agrees on the representation formalism, different services might ask for and return different kinds of information for the same kind of service, and connecting them is still a task that requires a human programmer to anticipate the form and structure of such services. For example, a weather service might deliver a weather report given a Zip code. But if the user asks "What's the weather in Denver?", then something has to know how Zip codes are associated with cities. This is a job for common sense.

Common sense is used to compose Web services in a manner similar to the way it is used in GOOSE. User goals are obtained through two different interfaces: one that allows natural language statement of goals, and another that provides a sidebar to a browser that proposes relevant services interactively as the user is browsing. OMCSNet is used to expand the user goal so that it can potentially match semantically related concepts which may appear in the Web service descriptions. Thus we can achieve a much broader and more appropriate mapping of Web services than is possible with literal search through Web service descriptions alone.
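The goal-expansion step can be sketched as follows, assuming a toy relatedness table in place of OMCSNet and three invented service descriptions. The goal words are expanded by a couple of hops before matching, so "What's the weather in Denver?" reaches services that mention only Zip codes and cities. The names `expand`, `mentions`, and `rank_services`, the stopword list, and the count-of-matches score are all hypothetical.

```python
# Hypothetical relatedness links standing in for OMCSNet.
RELATED = {
    "denver": ["city"],
    "city": ["zip code"],
    "weather": ["forecast", "temperature"],
}

# Invented Web service descriptions.
SERVICES = {
    "WeatherByZip": "returns a forecast and temperature for a zip code",
    "CityToZip": "returns the zip code for a city",
    "DoctorFinder": "lists physician offices near an address",
}

STOPWORDS = {"what's", "what", "is", "the", "in", "a", "for"}


def expand(terms, related, depth):
    """Add every concept reachable from the terms within `depth` hops."""
    expanded, frontier = set(terms), set(terms)
    for _ in range(depth):
        frontier = {n for t in frontier for n in related.get(t, [])} - expanded
        expanded |= frontier
    return expanded


def mentions(text, phrase):
    """True if `phrase` appears in `text` as whole words."""
    return f" {phrase} " in f" {text} "


def rank_services(goal, services, related, depth=2):
    """Rank services by how many expanded goal concepts their descriptions mention."""
    terms = [w for w in goal.lower().rstrip("?").split() if w not in STOPWORDS]
    concepts = expand(terms, related, depth)
    scores = {}
    for name, description in services.items():
        score = sum(1 for c in concepts if mentions(description.lower(), c))
        if score:
            scores[name] = score
    return sorted(scores, key=lambda name: (-scores[name], name))


print(rank_services("What's the weather in Denver?", SERVICES, RELATED))
# ['WeatherByZip', 'CityToZip']
```

With `depth=0`, which amounts to literal matching, no service is found at all; that gap is what the expansion closes. Actually chaining `CityToZip` into `WeatherByZip` would be a further composition step that this sketch does not attempt.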
3.11 Interfaces for Improving Common Sense Knowledge Bases

One criticism of Open Mind and similar efforts is that knowledge expressed in single sentences is often implicitly dependent on an unstated context. For example, the sentence “At a wedding, the bride and groom exchange rings” might assume the context of a Christian or Jewish wedding, and might not be true in other cultures. Rebecca Bloom and Avni Shah [Various Authors, 2003] implemented a system for contextualizing Open Mind knowledge by prompting the user to add explicit context elements to each assertion. Retrieval can then supply information about what context an assertion depends on, or find analogous assertions in other contexts. For example, in a Hindu wedding, the bride and groom exchange necklaces that serve the same ritual function as rings do in the West.

Several projects involved interfaces for knowledge elicitation or for feedback about the knowledge base itself. The Open Mind web site itself contains several of what it calls “activities” that encourage users to fill in templates that call for a particular type of knowledge. Knowledge about the function of objects is elicited with the template “You __ with a __”. Tim Chklovski [Chklovski & Mihalcea, 2002] developed an interface for prompting the user to disambiguate word senses in Open Mind, and for automatically performing simple analogies and asking the user to confirm or deny them.

Andrea Lockerd's ThoughtStreams [Various Authors, 2003] aims to acquire common sense knowledge through simulation. Everyday life is modeled in a game world, similar to the game The Sims. An agent tracks user behavior in the world and tries to discover behavioral regularities with a similarity-based learning algorithm. It is also envisioned that a game character “bot” would be introduced that would occasionally ask human characters why they do things, in the manner of an inquisitive (but hopefully not too annoying) child.
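The template activity and the analogy prompts mentioned above can be sketched together: a pattern turns a filled-in “You __ with a __” sentence back into a structured assertion, and a simple shared-use heuristic proposes questions for the user to confirm or deny. The regular expression, the `UsedFor` label, and the heuristic are assumptions; they are not the Open Mind site's code or Chklovski's actual analogy method.

```python
import re

# Assumed regular expression for the "You __ with a __" template.
TEMPLATE = re.compile(r"^you (?P<action>.+?) with an? (?P<obj>.+?)\.?$", re.IGNORECASE)


def parse_function(sentence):
    """Turn a filled-in template into (object, 'UsedFor', action), or None."""
    match = TEMPLATE.match(sentence.strip())
    if match is None:
        return None
    return (match["obj"].lower(), "UsedFor", match["action"].lower())


def propose_analogies(facts):
    """If two objects share a use, ask whether one also has the other's remaining uses."""
    uses = {}
    for obj, _, action in facts:
        uses.setdefault(obj, set()).add(action)
    questions = []
    for a in sorted(uses):
        for b in sorted(uses):
            if a != b and uses[a] & uses[b]:
                for action in sorted(uses[b] - uses[a]):
                    questions.append(f"Can you {action} with a {a}?")
    return questions


sentences = ["You cut rope with a knife.", "You cut rope with a saw.", "You cut wood with a saw."]
facts = [parse_function(s) for s in sentences]
print(propose_analogies(facts))  # ['Can you cut wood with a knife?']
```

The generated question is only a candidate: the heuristic over-generates by design, and the user's yes or no answer is what turns it into knowledge, much as the activities described above rely on people to confirm or deny analogies.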
4 Roles for Common Sense in Applications

Each of these applications uses commonsense differently. None of them actually does ‘general purpose’ commonsense reasoning: while each makes use of a broad range of commonsense knowledge, each makes use of it in a particular way, by performing only certain types of inferences.

Retrieving event-subevent structure. It is sometimes useful to collect together all the knowledge that is relevant to some particular class of activity or event. For example, the Cinematic Common Sense project makes use of common sense knowledge about event-subevent structure to make suitable shot suggestions at common events like birthdays and marathons. For the topic ‘getting ready for a marathon’, the subevents gathered might include: putting on your running shoes, picking up your number, and getting in your place at the starting line.

Goal recognition and planning. The Reformulator and GOOSE search engines exploit common sense knowledge about typical human goals to infer the real goal of the user from their search query. These search engines can make use of knowledge about actions and their effects to engage in a simple form of planning. After inferring the user's true intention, they look for a way to achieve it.

Temporal projection. The MakeBelieve storytelling system [Liu & Singh, 2002] makes use of knowledge of temporal and causal relationships between events in order to guess what is likely to happen next. Using this knowledge it can generate stories like:
David fell off his bike. David scraped his knee. David cried like a baby. David was laughed at. David decided to get revenge. David hurt people.

Particular consequences of broad classes of actions. Empathy Buddy senses the affect in passages of text by predicting only those consequences of actions and events that have some emotional significance. This can be done by chaining backwards from knowledge about desirable and undesirable states.
For example, if being out of work is undesirable, and being fired causes one to be out of work, then the passage ‘I was fired from work today’ can be sensed as undesirable.

Specific facts about particular things. Specific facts like “Golden Gate Bridge is located in San Francisco” or “a PowerBook is a kind of laptop computer” are often useful. ARIA can reason that an e-mail that mentions “I saw the Golden Gate Bridge” means that “I was in San Francisco at the time”, and proactively retrieve photos taken in San Francisco for the user to insert into the e-mail.

Conceptual relationships. A commonsense knowledge base can be used to supply ‘conceptually related’ concepts. The Globuddy program retrieves knowledge about the events, actions, objects, and other concepts related to a given situation in order to make a custom phrasebook of concepts you might wish to have translations for in that situation.

4.1 Do Try This at Home

We invite the AI community to make use of the Open Mind Common Sense knowledge base and associated tools to prototype applications as we have. We hope these application descriptions will inspire others to continue along these lines. Please see http://openmind.media.mit.edu/. We also welcome feedback from those who do choose to try this, and would appreciate hearing of similar application projects.

5 Conclusions

We think that system implementers often fail to realize how underconstrained many user interface situations are. In many cases, systems either do nothing or perform actions that are essentially arbitrary. These applications show that there exists the potential to use common sense knowledge to do something that at least might make sense as far as the user is concerned. A little bit of knowledge is often better than nothing.

Many applications, such as storytelling, or language translation for tourists, can cover a broad range of subjects.
With such applications, it is better to know a little bit about a lot of things than a lot about just a few things. Many past efforts have been stymied by insisting that coverage of the knowledge base be complete. Such efforts are often afraid to perform inferences because of the possibility of error. We rely on the interactive nature of the interface to provide feedback to the user and the opportunity for correction and completion.

Explicit input from the user is very expensive in the interface, so common sense knowledge can act as an amplifier of that input, bringing in related facts and concepts that broaden the scope of the application.

Although our descriptions of each of these projects have been necessarily brief, we hope that the reader will be impressed by the breadth and variety of the applications of common sense knowledge. We don't have to wait for complete coverage or completely reliable inference to put this knowledge to work, although as these improve, the applications will only get better. We think that the AI community ought to be paying more attention to this exciting area. After all, it's only common sense.

Sidebar: Open Mind Common Sense

We built the Open Mind Common Sense (OMCS) web site [http://openmind.media.mit.edu/] to make it easy and fun for members of the general public to work together to construct a commonsense database. OMCS was launched in September 2000, and as of January 2004 it had accumulated a corpus of about 675,000 pieces of commonsense knowledge from over 13,000 people across the web, many with no special training in computer science or artificial intelligence. The contributed knowledge is expressed in natural language, and consists largely of the kinds of simple assertions shown in Table 1.

Table 1. Sample of OMCS corpus
People live in houses.
Running is faster than walking.
A person wants to eat when hungry.
Things often found together: light bulb, contact, glass.
Coffee helps wake you up.
A bird flies.
The effect of going for a swim is getting wet.
The first thing you do when you wake up is open your eyes.
Rain falls from the sky.
Apples are not blue.
A voice is the sound of a person talking.

Rather than formulating a precise ontology in advance and then having knowledge enterers contribute knowledge expressed in terms of that ontology, we instead encouraged our users to provide information clearly in English via free-form and structured templates. Indeed, we sometimes think of OMCS not so much as a ‘knowledge base’ per se, but as a corpus of commonsense statements from which a more organized knowledge base can be constructed using information extraction techniques. In particular, we have extracted a large-scale semantic network called OMCSNet [Liu and Singh, 2004] consisting of 25 types of binary relations such as is-a, has-function, has-subevent, and located-in. The most recent version of OMCSNet contains 280,000 links relating 80,000 concepts, where the concepts are simple English phrases like ‘go to restaurant’ or ‘shampoo bottle’.

We were surprised by the high quality of the contributions, given that the OMCS site had no special mechanisms for knowledge validation or correction. A manual evaluation of the corpus revealed that about 90% of the corpus sentences were rated 3 or higher (on a 5-point scale) along the dimensions of truth and objectivity, and about 85% of the corpus sentences were rated as things anyone with a high school education or more would be expected to know. Thus the data, while noisy, was not entirely overwhelmed by noise, as we had originally feared it might be, and it consisted largely of knowledge one might consider shared in our culture.

Several related Open Mind sites have also been built. The Open Mind Word Expert site [http://www.teach-computers.org/] lets users tag the senses of the words in individual sentences drawn from both the OMCS corpus and the glosses of WordNet word senses.
The Open Mind 1001 Questions site [http://www.teach-computers.org/] uses analogical reasoning to pose questions to the user by analogy to what it already knows, and hence makes the user experience more interactive and engaging. The Open Mind Experiences site [http://omex.media.mit.edu/] lets users teach stories in addition to facts by presenting them with story templates based on Wendy Lehnert's plot units. Finally, the latest Open Mind LifeNet site lets users directly build probabilistic graphical models, and uses those models to immediately make inferences based on the knowledge that has been contributed so far.
References
[Barry & Davenport, 2003] Barry, B., and G. Davenport (2003). Documenting Life: Videography and Common Sense. In Proceedings of the IEEE International Conference on Multimedia. New York: IEEE, 2003.
[Berners-Lee, Hendler, Lassila, 2004] Berners-Lee, T., J. Hendler, and O. Lassila. The Semantic Web. Scientific American, May 2001.
[Chklovski & Mihalcea, 2002] Chklovski, T., and R. Mihalcea (2002). Building a Sense Tagged Corpus with Open Mind Word Expert. In Proceedings of the Workshop on "Word Sense Disambiguation: Recent Successes and Future Directions", ACL 2002.
[Eagle, et. al, 2003] Eagle, N., P. Singh, and A. Pentland (2003). Common Sense Conversations: Understanding Casual Conversation using a Common Sense Database. Artificial Intelligence, Information Access, and Mobile Computing Workshop at the 18th International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, August 2003.
[Lenat, 1995] Lenat, D. B. (1995). CYC: A large-scale investment in knowledge infrastructure. Communications of the ACM, 38(11): 33-38.
[Lieberman & Liu, 2002a] Lieberman, H., and H. Liu (2002). Adaptive Linking between Text and Photos Using Common Sense Reasoning. In Proceedings of the 2nd International Conference on Adaptive Hypermedia and Adaptive Web Based Systems (AH2002), Malaga, Spain.
[Liu & Lieberman, 2002b] Liu, H., and H. Lieberman (2002). Robust photo retrieval using world semantics. In Proceedings of the 3rd International Conference on Language Resources And Evaluation Workshop: Using Semantics for Information Retrieval and Filtering (LREC2002), Las Palmas, Canary Islands.
[Liu, Liberman & Selker, 2002] Liu, H., H. Lieberman, and T. Selker (2002). GOOSE: A Goal-Oriented Search Engine With Commonsense. In Proceedings of the 2nd International Conference on Adaptive Hypermedia and Adaptive Web Based Systems (AH2002), Malaga, Spain.
[Liu and Maes, 2004] Liu, H., and P. Maes (2004). What Would They Think? A Computational Model of Attitudes. International Conference on Intelligent User Interfaces (IUI '04), January 2004, Funchal, Portugal.
[Lieberman et al., 2001] Lieberman, H., E. Rosenzweig, and P. Singh (2001). Aria: An Agent For Annotating And Retrieving Images. IEEE Computer, July 2001, pp. 57-61.
[Liu et al., 2003] Liu, H., H. Lieberman, and T. Selker (2003). A Model of Textual Affect Sensing using Real-World Knowledge. In Proceedings of IUI 2003, Miami, Florida.
[Liu, Lieberman & Selker, 2002] Liu, H., H. Lieberman, and T. Selker (2002). Automatic Affective Feedback in an Email Browser. MIT Media Lab Software Agents Group Technical Report SA02-01, November 2002.
[Liu & Singh, 2002] Liu, H., and P. Singh (2002). MAKEBELIEVE: Using Commonsense to Generate Stories. In Proceedings of the 20th National Conference on Artificial Intelligence (AAAI-02), 957-958, Edmonton, Canada.
[Liu and Singh, 2004] Liu, H., and P. Singh (2004). The Open Mind Common Sense Net Toolkit. Draft at http://web.media.mit.edu/~hugo/publications/drafts/OMCSNet%20(CIKM).5.doc
[Maes, 1994] Maes, P. (1994). Agents that Reduce Work and Information Overload. Communications of the ACM, 37(7).
[Minsky, 2000] Minsky, M. (2000). Commonsense-based interfaces. Communications of the ACM, 43(8), 67-73.
[Mueller, 1998] Mueller, E. T. (1998). Natural language processing with ThoughtTreasure. New York: Signiform.
Available at: http://www.signiform.com/tt/book/
[Musa, et al., 2003] Musa, R., A. Kulas, Y. Anguilette, and M. Scheidegger (2003). Globuddy, A Broad-Context Dynamic Phrasebook. International Conference on Modeling and Using Context (CONTEXT '03), Stanford, CA, August 2003. Lecture Notes in Computer Science, Springer-Verlag, Heidelberg, 2003.
[Singh, 2002] Singh, P. (2002). The public acquisition of commonsense knowledge. In Proceedings of AAAI Spring Symposium: Acquiring (and Using) Linguistic (and World) Knowledge for Information Access. Palo Alto, CA: AAAI.
[Stocky, Faaborg, Lieberman, 2004] Common Sense for Predictive Text Entry. Submitted to CHI 2004, Vienna, April 2004.
[Various Authors, 2003] Various Authors (2003). Common Sense Reasoning for Interactive Applications Projects Page. http://www.media.mit.edu/~lieber/Teaching/Common-Sense-Course/Projects/Projects-Intro.html
[Wang and Cassell, 2003] Wang, A., and J. Cassell (2003). Co-authoring, Collaborating, Criticizing: Collaborative Storytelling between Real and Virtual Children. Vienna Workshop '03: Educational Agents - More than Virtual Tutors, Austrian Research Institute for Artificial Intelligence, Vienna, Austria, June 2003.
Practical applications of Philosophy in Artificial Intelligence
Karim Oussayef
Among the sciences, Artificial Intelligence holds a special attraction for philosophers. A.I. involves using computers to solve problems that seem to require human reasoning. This includes computer programs that can beat human opponents at games, automatically find and prove theorems, and understand natural language. Some people in the AI field contend that programs that solve these types of problems could not only think like humans but also understand concepts and become conscious. This viewpoint is called strong AI, a term coined by John Searle in Minds, Brains and Programs. Many philosophers are concerned with this bold claim, and there is no shortage of arguments against the metaphysical possibility of strong AI. If these philosophical arguments against strong AI are true, then there are limits to machine intelligence that cannot be surpassed by better algorithms, faster computers or cleverer ideas.
Hilary Putnam, in his paper Much Ado About Not Very Much, asks: “AI may someday teach us something about how we think, but why are we so exercised about it now? Perhaps it is the prospect that exercises us, but why do we think now is the time to think decide what might in principle be possible?” The reason we are so exercised about A.I. is that knowing whether true intelligence is possible will change the goals of researchers in the field. If strong AI is not possible, then the best we can hope for is a program that acts humanly but doesn’t think humanly. Even this goal is very difficult, and many programs seek to achieve it. Cycorp is a company whose software, according to Cycorp’s website, attempts to mimic human intelligence by creating a huge database of common sense facts.
Their website gives some examples: “Cyc knows that trees are usually outdoors, that once people die they stop buying things, and that glasses of liquid should be carried right side up.” To illustrate how a fact-based program such as Cycorp’s would try to solve a simple problem, let us turn to the Turing test, introduced in Alan Turing’s 1950 article Computing Machinery and Intelligence. Turing reasoned that a computer could prove that it was artificially intelligent by fooling a person into thinking it was another human being. His test was based on this reasoning: a human types questions to either another human or a computer (without knowing which) for a certain amount of time. If at the end of that time the person cannot tell which of the two he or she was talking to, the computer passes the test (and is therefore, Turing reasoned, artificially intelligent). Let me stress that I am not arguing that the Turing test is a good one for determining whether a computer can think; I am simply using it to demonstrate how a program might go about solving a problem.
The fact-based program mentioned above might answer the simple question “What is a car?” by supplying the information in its code: “A car is a small vehicle with 4 wheels”. A harder question might give a description of a car followed by “What am I describing?” This could be answered by walking down a tree of facts as follows: The description is of a vehicle, so search all the objects under the vehicle topic. It has four wheels; discard the possibility of the motorcycle. It is light; discard the possibility of the truck. Conclusion: it must be a car. A program like this could pass the Turing test if it was given enough data. However, it has several disadvantages. First, it requires someone to input a vast amount of information manually. Although the program is capable of making some extensions of the given information, it still needs millions of hard facts.
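The tree-of-facts procedure in the car example can be sketched as a filter over a hand-entered fact base. This is a minimal, hypothetical illustration: the FACTS table and the identify function are invented for this sketch and are not taken from Cyc. The program narrows its candidates purely by matching stored attribute values, and every fact it uses had to be typed in beforehand.

```python
# Hypothetical fact base: every entry had to be typed in by hand beforehand.
FACTS = {
    "car": {"topic": "vehicle", "wheels": 4, "weight": "light"},
    "truck": {"topic": "vehicle", "wheels": 4, "weight": "heavy"},
    "motorcycle": {"topic": "vehicle", "wheels": 2, "weight": "light"},
    "chair": {"topic": "furniture", "legs": 4},
}


def identify(description, facts=FACTS):
    """Return the objects consistent with a description, narrowing step by step."""
    # Step 1: look only at objects filed under the described topic (e.g. "vehicle").
    candidates = [name for name, attrs in facts.items()
                  if attrs.get("topic") == description["topic"]]
    # Step 2: discard every candidate that contradicts another stated property,
    # e.g. "four wheels" rules out the motorcycle, "light" rules out the truck.
    for key, value in description.items():
        if key != "topic":
            candidates = [c for c in candidates if facts[c].get(key) == value]
    return candidates


print(identify({"topic": "vehicle", "wheels": 4, "weight": "light"}))  # ['car']
```

Every step is a comparison of stored symbols; nothing in the code connects "light" or "wheels" to anything outside the table, which is the kind of rule-following that Searle's Chinese room analogy targets.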
Cycorp’s database has been painstakingly entered using over 600 person-years of effort since 1984. The list of facts now stands at 3 million (Anthes). Second, the machine doesn’t seem to work like a human: it looks up rules and then gives an answer instead of figuring out what the question means.
Searle’s Chinese room analogy shows why this program isn’t an example of strong AI. Imagine an English-speaking person inside a small room. This person has access to a large rulebook, which is written in English. People outside the room can pass notes written in Chinese to him through a small hole in the wall. Although the person inside the room cannot speak Chinese, he uses the rulebook to write back, in Chinese, an appropriate response to each note. Also imagine that this rulebook is so well written that his answers are indistinguishable from those a native Chinese speaker might give. This “man in a room” system would be able to carry on a written conversation with a native Chinese speaker on the other side of the wall. In fact, the Chinese speaker might assume he was corresponding with another person who understands Chinese. We can plainly see, however, that the person does not.
This analogy is disastrous for fact-based AI. In the same way that the computer passes the Turing test by fooling humans into thinking it is another human, the English speaker can fool native Chinese speakers into thinking that he understands Chinese. To explain further, the person inside the room is analogous to the computer’s CPU; they both know how to interpret instructions. The rulebook is analogous to the program; both supply the instructions to obtain the intended result. The computer programmed with this fact-based knowledge does not understand English any more than the English speaker understands Chinese.
Both the computer and the English speaker are following rules instead of understanding what is being asked and responding based on their interpretation. The defeat of the fact-based program poses problems for strong A.I. supporters. It shows that any program that relies on a pre-made set of rules (no matter how complex) cannot understand in the same way that a human mind does. In fact, Searle argues: “… in the literal sense the programmed computer understands what the car and the adding machine understand, namely, exactly nothing” (Searle 511).
However, Searle’s argument doesn’t rule out all programs. A program that learns from scratch, without the use of a rulebook or a prefabricated fact database, can understand in the same way that a human can. I will now describe such a program. To construct the fact-based program, we attempted to record facts about the world. The learning program takes an orthogonal approach: it attempts to program the computer to learn these facts for itself. To see how to go about this, let us examine how a small child learns. A child comes into the world knowing very little. She does not know how to talk, walk or understand English. She goes about learning these abilities with three tools. First, she has basic goals or needs. Some of a child’s needs are food, water and shelter. Second, she can observe the world. A child can tell that when she is eating, she is getting less hungry. Finally, she can remember what has happened to her. Let me demonstrate how these three tools allow her to learn something. Imagine that this child is hungry. She observes that when she cries, her mother brings her food. She remembers what has happened to her, and her need for food causes her to cry again the next time she is hungry. Her tools have allowed her to learn that crying results in getting food. These three tools are the core of the learning program. However, the goals of a computer will differ from the goals of a human.
A computer has no need for food or water, so those are not appropriate goals. Instead, these goals can be anything that A.I. programmers think are important. Isaac Asimov proposed three such goals (or laws) in his fiction; they were first published in the story Runaround (1942):
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey the orders given it by human beings, except where such orders would conflict with the First Law.
3. A robot must protect its own existence, as long as such protection does not conflict with the First and Second Laws.
In short, a robot’s goals are human well-being, human will and its own well-being. These goals can be implemented in the form of variables linked to actions that the computer might perform. Whenever the computer does something that accomplishes one of its goals, it might raise the value of the variables connected with its current state or action. Similarly, it would lower the values of these action-variables when it did something against its goals. These variables also represent the computer’s memory: this is where the computer remembers what to do the next time it is in a similar situation. Finally, the computer needs a console, sensors or some other form of input so it can observe what is happening around it.
Let me demonstrate how it works with a simple example. Imagine a robot equipped with a camera, a flashlight and wheels. The robot is put in an environment and given the extra goal of reaching a certain spot. If the robot had never been in this situation before, it might have no idea how to reach the goal, in much the same way that the child does not know how to get food. So it might begin by doing any number of things. Perhaps it would turn on its flashlight. This would not help it reach its goal, so it would try something different. Maybe it starts driving towards the goal.
The robot would observe that it is accomplishing a goal, so the “driving forward” action might get “+1 point” in the “trying to reach an object” context. Perhaps there is a wall in front of it halfway to the goal. It runs into the wall and damages itself. This is bad for the “well-being of self” goal, so the “driving forward” action might get “-1 point” in the “wall in front of me” context. These point values will help it remember what to do the next time it is trying to get from one point to another. When it sees a wall in front of it in the future, the robot will see that “driving forward” has fewer points than, say, “driving sideways” and might pick that option. Its drive to reach its goals will teach the robot through trial and error. Eventually it will learn how to drive around objects (instead of into them).
I argue that a robot constructed in this fashion would actually understand how to accomplish goals. To support this belief, let’s see whether it does any better with the Chinese room example. Remember that for the fact-based program, the person inside the room is analogous to the computer’s CPU and the rulebook is analogous to the program. For the learning program, however, there is no rulebook: the person inside the room is analogous to both the CPU and the program. Instead of people asking questions and having him answer back, imagine that the input through the slot in his room is the information he receives from the outside world. At first he has no idea what this input means. He sends random symbols back, but after a while he notices a correlation between what he sends out and what he gets back. From this information he starts to write his own rulebook in his head, which allows him to translate Chinese input into English. When he writes back, he translates the answers that he thought of in English back into Chinese. The way the “learning-program person” communicates in Chinese is drastically different from the way the “fact-based person” does.
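The point-keeping scheme in the robot example can be sketched as a table of scores indexed by (context, action). This is a hypothetical illustration, not a design given in the paper: the PointLearner class, the explore parameter (the chance of trying a random action, standing in for trial and error) and the context and action names are assumptions. It is close in spirit to simple tabular reinforcement learning, reduced to the +1/-1 updates described above.

```python
import random
from collections import defaultdict


class PointLearner:
    """Keeps a point total for every (context, action) pair, as in the robot example."""

    def __init__(self, actions, explore=0.1, seed=None):
        self.actions = list(actions)
        self.points = defaultdict(int)  # (context, action) -> points: the robot's "memory"
        self.explore = explore          # chance of trying a random action (trial and error)
        self.rng = random.Random(seed)

    def choose(self, context):
        """Usually pick the action with the most points in this context."""
        if self.rng.random() < self.explore:
            return self.rng.choice(self.actions)
        # Ties go to the first action in the list.
        return max(self.actions, key=lambda a: self.points[(context, a)])

    def feedback(self, context, action, reward):
        """Add +1 when an action serves a goal, -1 when it works against one."""
        self.points[(context, action)] += reward


robot = PointLearner(["drive forward", "drive sideways", "turn on flashlight"], explore=0.0)
robot.feedback("trying to reach an object", "drive forward", +1)  # progress toward the goal
robot.feedback("wall in front of me", "drive forward", -1)        # hit the wall: bad for self
robot.feedback("wall in front of me", "drive sideways", +1)       # got around it
print(robot.choose("wall in front of me"))  # drive sideways
```

Setting explore to 0.0 makes the choice deterministic for the example; a small positive value lets the robot keep trying untested actions, which is how it would discover "driving sideways" in the first place.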
The “learning-program person” learns what the Chinese means by association; from his knowledge he knows the sense of the words. Some people may point out that he does not actually think in Chinese, so he must not understand the language. However, many people converse in a non-native tongue, and we cannot claim that their understanding of the world is different from our own.
Searle might respond to this learning program by saying that the person inside the Chinese room would simulate the entire learning process, and that the learning is not internal but external. This means that the person inside the room is following directions that correspond to learning, but he himself is not learning. But if such a program falls victim to the Chinese room, wouldn’t a human brain fall victim as well? Let us imagine a modified Chinese room for the human brain. Instead of simulating a computer program, the man inside the room simulates the neurons in someone’s brain. When he receives input, he keeps track of which neurons get excited and calculates whether or not they fire. He would know from his rulebook (a compendium of the laws of physics, chemistry and biology that would allow him to completely simulate the inner workings of the brain) that when certain neurons fired he should output an answer. The person simulating the brain doesn’t understand Chinese any better than the one simulating a computer program. Why would one be different from the other? Searle’s opinion is that “actual human mental phenomena might be dependant on actual physical-chemical properties of actual human brains” (Searle 519). Penrose’s “The Emperor’s New Mind” provides insight into why this may be the case. Penrose mentions many physical processes that are not computable. He first examines the Mandelbrot set. The Mandelbrot set is created by repeatedly applying a simple formula to complex numbers, and the result is drawn on an Argand plane (the complex plane).
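For reference, the formula behind the Mandelbrot set, which the text above does not spell out, is the standard one: for each complex number c, iterate a quadratic map starting from zero; c belongs to the set exactly when the iterates stay bounded, which is equivalent to their never leaving the disc of radius 2.

```latex
z_0 = 0, \qquad z_{n+1} = z_n^2 + c, \qquad
M = \{\, c \in \mathbb{C} : |z_n| \le 2 \ \text{for all } n \ge 0 \,\}
```

A program can follow only finitely many iterations, so by this test alone it can confirm that a point escapes (some |z_n| > 2) but never that a point stays in the set forever; this gives a feel for why the exact set raises the computability questions Penrose discusses.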
Discussing the Mandelbrot set, Penrose makes an important comment: “We might think of using some algorithm for generating the successive digits of an infinite decimal expansion, but it turns out that only a tiny fraction of the possible decimal expansions are obtainable in this way: the computable numbers” (Penrose 648). In other words, the exact Mandelbrot set cannot be computed by a computer. Penrose also mentions quantum mechanical principles. Tiny subatomic particles do not follow the same laws of physics that larger objects do. The superposition principle states that a particle can be in many different states at the same time. These states are combined using complex-number coefficients and thus are another example of a physical law that cannot be simulated in a computer.
These two examples may show why the Chinese room cannot simulate the human brain. When the person inside the room was following the directions for simulating a computer, the steps he took were explained by a well-defined algorithm. This is because computers are Turing machines, a concept that was formalized elegantly by Alan Turing. Any Turing machine can be thought of as a device that reads and writes on an infinitely long tape. The tape is divided into cells, each of which is either blank or marked. At each step the device reads the current cell, may change it to “marked” or “blank”, and moves one cell left or right; what it does depends on its current internal state and the symbol it reads, according to a finite set of instructions. This simple abstraction is enough to run any computer program, no matter how complex. It is easy to think of the human inside the Chinese room controlling a Turing machine. The brain may, however, rely on non-algorithmic processes that the person inside the Chinese room will not be able to follow. If, for example, neuron X fired only because of a certain arrangement of subatomic particles, there would be no hard-and-fast directions for what the Chinese-room person should do.
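The tape machine just described can be made concrete with a small simulator. This is a minimal sketch under assumptions made for this illustration: the transition table is keyed by (state, symbol), the state names "start" and "halt" are arbitrary, and the example machine (unary addition) is invented to show the idea. What matters is that each step is fixed by a finite table, which is why the person in the Chinese room could follow it mechanically.

```python
from collections import defaultdict

BLANK, MARK = 0, 1


def run_turing_machine(rules, tape, state="start", head=0, max_steps=10_000):
    """Run a one-tape Turing machine until it enters the state "halt".

    rules maps (state, symbol read) -> (symbol to write, move, next state),
    where move is -1 (one cell left) or +1 (one cell right).
    Returns the tape from the leftmost to the rightmost marked cell.
    """
    cells = defaultdict(int, enumerate(tape))  # cells never written read as BLANK
    steps = 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("no halt within max_steps")
        write, move, state = rules[(state, cells[head])]
        cells[head] = write
        head += move
        steps += 1
    marked = [i for i, symbol in cells.items() if symbol == MARK]
    if not marked:
        return []
    return [cells[i] for i in range(min(marked), max(marked) + 1)]


# Hypothetical example machine: unary addition. "111 11" (3 and 2, separated
# by a blank) becomes "11111" (5): fill the gap, run to the end, erase one mark.
UNARY_ADD = {
    ("start", MARK): (MARK, +1, "start"),        # skip over the first number
    ("start", BLANK): (MARK, +1, "seek_end"),    # fill the separating blank
    ("seek_end", MARK): (MARK, +1, "seek_end"),  # skip over the second number
    ("seek_end", BLANK): (BLANK, -1, "erase"),   # step back onto the last mark
    ("erase", MARK): (BLANK, -1, "halt"),        # remove one mark to fix the count
}

print(run_turing_machine(UNARY_ADD, [1, 1, 1, 0, 1, 1]))  # [1, 1, 1, 1, 1]
```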
Perhaps the next instruction has only a random chance of occurring; if so, the person will be confused and unable to complete the instruction. It is important to find out whether the brain makes use of these processes, because if it does, it would explain why the Chinese room works for computers but not for the human brain. In the chapter “Where lies the physics of the mind”, Penrose argues that the brain does indeed make use of non-computable phenomena. He contends that expressions that deal with consciousness, such as “understanding” and “judgment”, and those that do not, such as “mindlessly” and “automatically”, suggest a distinction between two parts of the brain: algorithmic and non-algorithmic (Penrose 653). Penrose brings up Gödel’s incompleteness theorem as an example of how the brain makes use of its non-algorithmic part. Gödel encoded the formulas of first order predicate calculus (FOPC) as ordinary numbers, using prime numbers. By breaking FOPC down in this way, he could write arithmetic formulas that express statements about formulas and proofs. He used this trick to demonstrate that any consistent formal system strong enough to express arithmetic contains statements that can be neither proved nor disproved within it. One such sentence would be: “A computer which knows the answer to all questions will never prove that this sentence is true.”⁵ Human beings know that this sentence is true without actually going through the process of proving it. If, however, a computer attempts to assess the validity of the statement through a formal proof, it will be stuck, because the statement remains true only as long as the computer has not proved it. Penrose argues that these types of sentences, which humans can reason about, would be impossible for a computer to understand. What Penrose doesn’t notice is that even if some statements cannot be proved or disproved using FOPC, there are other ways for computers to approach these problems. There is no reason that computers couldn’t use higher logic to solve puzzles just like a human does.
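The prime-number encoding mentioned above can be illustrated with a toy version. The alphabet and symbol codes below are hypothetical, and Gödel's own scheme differs in its details; the sketch only shows the key idea that a string of symbols becomes a single number (the i-th prime raised to the code of the i-th symbol, all multiplied together) and that the string can be recovered by factoring. That is what lets statements about formulas be recast as statements about numbers.

```python
# Hypothetical toy alphabet; Gödel's own symbol codes were different.
SYMBOLS = ["0", "S", "=", "+", "(", ")"]
CODE = {symbol: i + 1 for i, symbol in enumerate(SYMBOLS)}  # codes start at 1


def primes():
    """Yield 2, 3, 5, 7, 11, ... by trial division (fine for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p != 0 for p in found):
            found.append(n)
            yield n
        n += 1


def godel_number(formula):
    """Encode symbols s1 s2 s3 ... as 2**code(s1) * 3**code(s2) * 5**code(s3) * ..."""
    number = 1
    for p, symbol in zip(primes(), formula):
        number *= p ** CODE[symbol]
    return number


def decode(number):
    """Recover the formula by reading off the exponent of each prime in turn."""
    symbols = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        if exponent == 0:
            raise ValueError("not the Gödel number of a formula")
        symbols.append(SYMBOLS[exponent - 1])
    return "".join(symbols)


n = godel_number("S0=S0")  # "1 = 1", reading S as "successor of"
print(n)                   # 808500
print(decode(n))           # S0=S0
```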
Penrose’s goal of proving strong A.I. impossible fails because he doesn’t make the link between non-algorithmic, non-computable physical phenomena and the human brain. If in the future neuroscientists discovered that the brain relies on such processes, then his argument would hold more weight. Still, it would be possible for a program to simulate the workings of the brain without simulating the actual physical processes. In fact, computers and human brains excel at different tasks, which makes literal simulations wasteful. A computer can remember things indefinitely (assuming the file isn’t deleted). It can also compute complicated mathematical expressions in milliseconds. Even a human with the best eidetic memory or extraordinary mathematical talent couldn’t rival a computer at these tasks. On the other hand, computers have a very hard time recognizing objects such as human faces. In dark or light, in different clothes or with dyed hair, we can still recognize our best friend. Similarly, the human ability to understand language is amazing. We can utter sentences that we have never said or heard before and understand a variety of accents and slang. These “human algorithms”, which require almost no effort from us, are very difficult for a computer. To throw away a computer’s advantages in mathematics, memory and many other tasks seems a waste, yet attempting to create a model of human neurons seems to do exactly that. Instead, it would be better to simulate the way a human brain solves problems rather than the actual physical processes behind human thinking.
⁵ Adapted from Denton.
In this paper I have shown how various arguments against strong A.I. interact. These arguments do not show that it is impossible, but they do restrict what kinds of programs can be thought of as “truly intelligent”. Searle’s Chinese room argument shows that fact-based programs are incapable of understanding things in the same way as humans do.
It also excludes programs that have all their information hard-coded. Learning is essential to programs that aim at strong A.I., because information has to come from the program, not from the programmer. Penrose has suggested that the brain cannot be simulated by a computer. If this is true, then computers must simulate how the brain thinks, not how the brain works. Finally, Gödel’s incompleteness theorem shows that programs must use higher reasoning to achieve their goals. Philosophy is often criticized for being unconcerned with real-world implications, but in this case it has shown the best direction for A.I. researchers to explore.
References
Books
Clancey, William J. 1997. Situated Cognition. Cambridge, UK: Cambridge University Press.
Dreyfus, Hubert. 1992. What Computers Still Can't Do: A Critique of Artificial Reason. Cambridge, MA: MIT Press.
Kim, Jaegwon. 1998. Philosophy of Mind. Boulder, Colorado: Westview Press Inc.
Penrose, Roger. 1989. The Emperor's New Mind: Concerning Computers, Minds and the Laws of Physics. Oxford: Oxford University Press.
Russell, Stuart, and Peter Norvig. 1995. Artificial Intelligence: A Modern Approach.
Smith, Brian Cantwell. 1996. On the Origin of Objects. Cambridge, MA: MIT Press/Bradford Books.
Papers
Dennett, Daniel C. 1988. When Philosophers Encounter Artificial Intelligence. The Artificial Intelligence Debate: False Starts, Real Foundations: 283-296.
Fodor, J.A. 1980. Searle on What Only Brains Can Do. The Nature of Mind: 520.
Fodor, J.A. 1998. After-thoughts: Yin and Yang in the Chinese Room. The Nature of Mind: 524.
LaForte, Geoffrey, Patrick J. Hayes, and Kenneth M. Ford. 1998. Why Gödel's Theorem Cannot Refute Computationalism. Artificial Intelligence: 211-264.
McCarthy, John. 1988. Mathematical Logic in Artificial Intelligence. The Artificial Intelligence Debate: False Starts, Real Foundations: 297-311.
Putnam, Hilary. 1988. Much Ado About Not Very Much. The Artificial Intelligence Debate: False Starts, Real Foundations: 269-282.
Sokolowski, Robert. 1988. Natural and Artificial Intelligence. The Artificial Intelligence Debate: False Starts, Real Foundations: 45-64.
Searle, John R. 1980. Minds, Brains and Programs. The Nature of Mind: 509-519.
Searle, John R. 1980. Author’s response. The Nature of Mind: 521-523.
Searle, John R. 1998. Yin and Yang Strike Out. The Nature of Mind: 525.
Turing, A.M. 1950. Computing Machinery and Intelligence. Mind, 59: 433-460.
Journals
Anthes, Gary H. 2002. Computerizing Common Sense. Computerworld, April 8, 2002.
Electronic
Cycorp. Company Overview. http://www.cyc.com/overview.html
Denton, William. 2000. Gödel’s Incompleteness Theorem. http://www.miskatonic.org/godel.html
The Computer Continuum
Chapter 12: Artificial Intelligence and Modeling the Human State
Are computers smart enough to replace people?
Artificial Intelligence and Modeling the Human State
In this chapter:
• Does “looking intelligent” mean that intelligence is present?
• How does the human brain differ from a computer?
• How does a computer gain and retrieve knowledge as compared to how a human gains and retrieves knowledge?
• How is it that a computer can recognize text, speech, or a human face?
• How are computer scientists making computers “smarter”?
What is Intelligence: Artificial or Not?
Attempts to understand intelligence:
• Plato (400 BC) - This Greek philosopher believed that ethereal spirits were rained down from heaven and entered the body.
• Aristotle (Plato’s student) - Believed that the heart must contain the soul and that the brain’s function was to cool the blood.
• Galen - Treated fallen gladiators with spinal cord injuries. Noted that feeling lost in certain limbs sometimes came back.
• Galvani - Used Benjamin Franklin’s findings about static electricity to show that static electricity stimulated the nerves, causing a frog to jump.
• Subsequently - The human nervous system was found to be a complex network of billions of neurons.
Does “looking intelligent” mean that intelligence is present?
• Maillardet’s Automaton (Henri Maillardet, 1805):
  – An object having human form seemed to mimic the intelligence of a human.
  – Drawing machine.
    • Disguised as a young boy.
    • Containing levers, ratchets, cams and other mechanical devices.
    • Could draw several complex images.
  – Because it had human form and could draw complex images, a certain feeling of intelligence was ascribed to the machine.
Figure: Sailing vessel drawn by Maillardet’s Automaton.
Alan Turing (1912 - 1954)
• Proposed a test - Turing’s Imitation Game
  – Tests the intelligence of the computer.
• Phase 1:
  – Man and woman separated from an interrogator.
  – The interrogator types in a question to either party.
  – By observing responses, the interrogator’s goal was to identify which was the man and which was the woman.
Figure: Interrogator; Honest Woman; Lying Man.
Phase 2 of Turing’s test:
• The man was replaced by the computer.
• If the computer could fool the interrogator as often as the man did, it could be said that the computer had displayed intelligence.
Figure: Interrogator; Honest Woman; Computer.
Modeling Human Intelligence
Modeling human intelligence systems:
• One way to study complex systems is to build a working model of the system and observe it in action.
• Two (of several) approaches to modeling some of the thinking patterns of the human brain:
  – Semantic networks
  – Rule-based systems or expert systems
Semantic networks are patterned after the psychological model of human associative memory.
Figure: semantic network with the nodes John, Plumber, Worker, Owner, Ford, Car, May 97, Time, Oct 00, Ownership and Situation, connected by “is a” links and by Owner, Ownee, Start-time and End-time links.
Rule-based or expert systems: knowledge bases consisting of hundreds or thousands of rules of the form IF (condition) THEN (action).
• Use rules to store knowledge (“rule-based”).
• The rules are usually gathered from experts in the field being represented (“expert system”).
  – Most widely used knowledge model in the commercial world.
  – IF (it is raining AND you must go outside)
    THEN (put on your raincoat)
For any of these models of the human knowledge system to work, the model must be able to make use of human knowledge in three different ways:
• Acquisition - There must be some way of putting information or knowledge into the system.
• Retrieval - The system must be able to find knowledge when it is wanted or needed.
• Reasoning - The system must be able to use that knowledge through “thinking” or reasoning.
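The two knowledge models on these slides can be sketched in a few lines of Python. Both halves are hypothetical illustrations: the triples are one plausible reading of the John/Ford diagram (the node name ownership-1 and the exact "is a" links are assumptions), and the rule engine is a plain forward-chaining loop that treats each THEN part as a new fact, not the design of any particular commercial expert system.

```python
# Part 1: a semantic network stored as (subject, relation, object) triples.
# Hypothetical reading of the slide's diagram.
NETWORK = {
    ("John", "is a", "Plumber"),
    ("Plumber", "is a", "Worker"),
    ("Ford", "is a", "Car"),
    ("ownership-1", "is a", "Ownership"),
    ("Ownership", "is a", "Situation"),
    ("ownership-1", "owner", "John"),
    ("ownership-1", "ownee", "Ford"),
    ("ownership-1", "start-time", "May 97"),
    ("ownership-1", "end-time", "Oct 00"),
}


def is_a(node, category, network=NETWORK):
    """Retrieve by association: follow "is a" links transitively from node."""
    frontier, seen = [node], set()
    while frontier:
        current = frontier.pop()
        if current == category:
            return True
        if current in seen:
            continue
        seen.add(current)
        frontier.extend(o for s, r, o in network if s == current and r == "is a")
    return False


# Part 2: a rule-based system. Each rule is IF (all conditions) THEN (conclusion).
RULES = [
    ({"it is raining", "you must go outside"}, "put on your raincoat"),
]


def forward_chain(facts, rules=RULES):
    """Reason by firing every rule whose conditions all hold, until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts


print(is_a("John", "Worker"))  # True
print(is_a("Ford", "Worker"))  # False
print("put on your raincoat" in forward_chain({"it is raining", "you must go outside"}))  # True
```

Here is_a answers "Is John a worker?" by following John to Plumber to Worker, and forward_chain fires the raincoat rule only when both of its conditions are present, mirroring the retrieval and reasoning steps listed on the slide.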
12-12 Modeling Human Intelligence
Knowledge Acquisition:
• A fact is the simplest type of knowledge that can be acquired.
  – Bees sting.
• Ideas, concepts, and relationships are more difficult for humans and machines.
  – Provoking bees causes them to sting.
  – What is a chair?

12-13 Modeling Human Intelligence
Knowledge Retrieval by Searching
• After knowledge has been acquired and stored in one’s memory, it can be retrieved and used to solve problems.
• Brute-force search - Looks at every possible solution before choosing among them.
  – Hexapawn game example: The program searches through all the possible moves and then selects the best.

12-14 Modeling Human Intelligence
Hexapawn Game Tree
Shows the different moves (“mirror images” are not shown).

12-15 Modeling Human Intelligence
Heuristic search - Rules of thumb that are used to limit the number of items that must be searched in solving a problem. (Not guaranteed to lead to a solution.)
• Used by more complex systems, such as those that diagnose individuals who are prone to heart attacks.
• A chess game tree would have about 10^120 possible games.
  – Uses rules of thumb to reduce the number of possible plays.
    • Example: Examine a few plays ahead instead of all the way to the end of the game.
  – Deep Blue (1996) by IBM - Garry Kasparov, world-champion chess player, won over Deep Blue 4 points to 2.
  – Deep Blue (1997) by IBM - Garry Kasparov conceded victory to Deep Blue, 3.5 points to 2.5.

12-16 Modeling Human Intelligence
Reasoning with knowledge
• Humans: Reasoning is what we do when we solve problems.
• In Artificial Intelligence: Two types of reasoning are commonly used.
  – Shallow reasoning: Based on heuristics or rule-based knowledge.
    • Computers, for the most part, do shallow reasoning.
  – Deep reasoning: Deals with models of the problem obtained from analyzing the structure and function of the component parts of the problem.
    • Humans commonly apply deep reasoning.

12-17 Modeling Human Intelligence
How can the knowledge base be built up so that there is sufficient knowledge to reason with?
• Learning systems: Intelligent computer programs that are capable of learning.
• Types of learning that are used to write intelligent programs:
  – Rote learning - Memorization of facts.
  – Learning by instruction - Similar to the student/teacher relationship found in classrooms.
  – Learning by deduction - Drawing conclusions from certain premises (This is a cat. All cats are animals. Therefore, this is an animal.)
  – Learning by induction - Includes subcategories: learning by example, by experimentation, by observation, and by discovery.
  – Learning by analogy - Combines both deductive and inductive learning. (Being bitten by a teased dog may make an individual not tease bees.)

12-18 Modeling Human Intelligence
Common Sense
• Problems that seemed to be the most difficult, such as playing chess, turned out to be relatively simple.
• The computer must be able to make inferences from the knowledge base.
  – Answers to problems might not be listed.
  – The computer will need to come up with its own answers!
  – This has been a very difficult area in Artificial Intelligence.
• Cyc (from enCYClopedia): A computer program that exhibits and can apply common sense.
  – e-Cyc (electronic commerce): An advanced search engine that narrows a search and gives a list of meaningful subtopics.

12-19 Natural Language Communication
The semantics (meaning of the words) is the goal in communication with the computer.
• ELIZA (1965) by Joseph Weizenbaum of MIT
  – Simulation of a therapist mirroring a patient’s conversation in a positive manner.
  – The program fooled people into believing it was a therapist.
  – In response to ELIZA’s popularity, Weizenbaum wrote a book in 1976: Computer Power and Human Reason.
    • “Science promised man power. But, as so often happens when people are seduced by promises of power… the price actually paid is servitude and impotence.”

12-20 Natural Language Communication
Part of an ELIZA Conversation:
All men are alike.
IN WHAT WAY
They’re always bugging us about something or other.
CAN YOU THINK OF A SPECIFIC EXAMPLE
Well, my boyfriend made me come here.
YOUR BOYFRIEND MADE YOU COME HERE
He says I’m depressed much of the time.
I AM SORRY TO HEAR YOU ARE DEPRESSED
It’s true, I am unhappy.
DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY
I need some help, that much seems certain.
WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP
Perhaps I could learn to get along with my mother.
TELL ME MORE ABOUT YOUR FAMILY

12-21 Natural Language Communication
Semantic Translation Problems (problems with language translation)
• A classic example, called the Bar-Hillel paradox, illustrates a difficult semantic problem:
  The pen is in the box.
  The box is in the pen.
  – Both sentences have identical syntactic structures.
  – Interpretations:
    • First statement: A writing instrument is in the box.
    • Second statement: A box is in the playpen.
  – This convinced Bar-Hillel that computer translation of languages was impossible.

12-22 Natural Language Communication
Early attempts at language translation:
• An early attempt to translate an English expression into Russian and back again into English:
  – Typed in English (sentence to be translated):
    • The spirit is willing, but the flesh is weak.
  – Translated by the program into Russian and back into English:
    • The vodka is strong, but the meat is rotten.
Translation programs have come a long way.
• WWW translation programs
  – Accuracy and interpretation are still very crude.
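ELIZA's replies came from spotting keywords or patterns in the user's sentence and echoing fragments back with the pronouns swapped. The sketch below is a much-reduced Python imitation in that spirit. It is not Weizenbaum's program or its DOCTOR script. The rule list, the `REFLECTIONS` table and the fallback reply are hypothetical, chosen only so that several lines of the conversation above come out the same way.

```python
import re

# Pronoun swaps used when echoing a fragment of the user's sentence back.
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are",
               "you": "i", "your": "my"}

# Hypothetical (pattern, reply template) rules, tried in order.
# {0}, {1} are filled with the reflected, upper-cased regex groups.
RULES = [
    (r".*\bmy (mother|father|sister|brother)\b", "TELL ME MORE ABOUT YOUR FAMILY"),
    (r".*\bi'?m depressed\b", "I AM SORRY TO HEAR YOU ARE DEPRESSED"),
    (r".*\bi am (unhappy|sad)\b", "DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE {0}"),
    (r".*\bmy (\w+) made me (.*)", "YOUR {0} MADE YOU {1}"),
    (r".*\balways\b", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (r".*\balike\b", "IN WHAT WAY"),
]


def reflect(fragment):
    """Swap first- and second-person words, e.g. 'my car' -> 'your car'."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.split())


def respond(sentence):
    """Return the reply of the first rule whose pattern matches the sentence."""
    # Normalize case, curly apostrophes and trailing punctuation.
    text = sentence.lower().replace("\u2019", "'").strip(" .!?")
    for pattern, template in RULES:
        match = re.match(pattern, text)
        if match:
            return template.format(*(reflect(g).upper() for g in match.groups()))
    return "PLEASE GO ON"  # fallback when no pattern matches


if __name__ == "__main__":
    for line in ["All men are alike.", "Well, my boyfriend made me come here."]:
        print(line, "->", respond(line))
```

The program has no model of meaning at all. This is the point of the slide's warning that ELIZA "fooled" people.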
12-23 Expert Systems
Expert systems are commercially the most successful domain in Artificial Intelligence.
• These programs mimic the experts in whatever field they are built for. Fields include:
  Auto mechanic, Telephone networking, Cardiologist, Delivery routing, Organic compounds, Professional auditor, Mineral prospecting, Manufacturing, Infectious diseases, Pulmonary function, Diagnostic internal medicine, Weather forecasting, VAX computer configuration, Battlefield tactician, Engineering structural analysis, Space-station life support, Audiologist, Civil law

12-24 Expert Systems
Expert systems are also called rule-based systems.
• The expert’s expertise is built into the program through a collection of rules.
• The desired program functions at the same level as the human expert.
• The rules are typically of the form:
  – If (some condition) then (some action)
  – Example: If (gas near empty AND going on long trip) then (stop at gas station AND fill the gas tank AND check the oil).
• XCON: An expert system used by Digital Equipment Corp. to help configure the old VAX family of minicomputers.

12-25 Expert Systems
Two major parts of an expert system:
• The knowledge base: The collection of rules that make up the expert system.
• The inference engine: A program that uses the rules by making several passes over them.
  – On each pass, the inference engine looks for all rules whose condition (IF part) is satisfied.
  – It then takes the action (THEN part) and makes another pass over all the rules, looking for matching conditions.
  – This goes on until no further rules’ conditions are matched.
  – The results are all the action parts that have been produced.

12-26 Expert Systems
Inference engines can pass through the rules in different directions:
• Forward chaining: Going from a rule’s condition to a rule’s action and using the action as a new condition.
• Backward chaining: Goes in the other direction, from a rule’s action (a goal) back to the conditions that would establish it.
  – Example: Medical doctors use both.
    • Forward chaining: Going to the doctor with symptoms (stomach pain). The doctor will come up with a diagnosis (ulcer).
    • Backward chaining: The doctor asks whether the patient has been eating green apples, knowing that green apples cause stomach aches.

12-27 Expert Systems
Harold Cohen created an expert system called AARON to create art in 1973.
• AARON is a collection of over 1,000 rules.
  – Includes information regarding human anatomy and gravity.
• AARON is free to choose what it draws. It then colors the drawings.
• A PC version of AARON is being prepared for mass distribution.

12-28 Neural Networks
Neuron: The basic building block of the brain.
• There are several specialized types, but all have the same basic structure.
[Figure: The basic structure of an animal neuron.]

12-29 Neural Networks
Artificial models of the brain are of two distinct types:
• Electronic: Has electronic circuits that act like neurons.
• Software: This version runs a program on the computer that simulates the action of the neurons.

12-30 Neural Networks
Artificial neurons: Commonly called processing elements, they are modeled after the real neurons of humans and other animals.
• Each has many inputs and one output.
  – The inputs are signals that are strengthened or weakened (weighted).
  – If the sum of all the signals is strong enough, the neuron puts out a signal on its output.
[Figure: an artificial neuron with several inputs and one output.]

12-31 Neural Networks
Neural network: A collection of interconnected neurons. The output of one neuron connects to several others through connections of different strengths.
• Initially, neural networks have no knowledge. (All information is learned from experience using the network.)
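The slide's artificial neuron has weighted inputs and fires if the sum is strong enough. The claim that a network starts with no knowledge can be illustrated by training a single threshold neuron with the classic perceptron learning rule. This is a hedged Python sketch. The function names, the AND training data, the learning rate of 1 (which keeps the arithmetic in integers) and the epoch count are assumptions made for illustration. It is not back propagation, which adjusts the weights of multi-layer networks.

```python
def neuron_output(inputs, weights, threshold):
    """Artificial neuron: weight each input, sum, and fire (1) if the sum reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def train_perceptron(examples, n_inputs, rate=1, epochs=50):
    """Perceptron learning rule.

    Starts with no knowledge (all weights and the threshold at zero) and, after
    each wrong answer, nudges the weights and threshold toward the correct output.
    """
    weights = [0] * n_inputs
    threshold = 0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - neuron_output(inputs, weights, threshold)  # -1, 0 or +1
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            # Raised after a false alarm (error -1), lowered after a miss (error +1).
            threshold -= rate * error
    return weights, threshold


# Logical AND: the neuron should fire only when both inputs are on.
AND_EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

if __name__ == "__main__":
    weights, threshold = train_perceptron(AND_EXAMPLES, n_inputs=2)
    for inputs, target in AND_EXAMPLES:
        print(inputs, neuron_output(inputs, weights, threshold), "expected", target)
```

AND is linearly separable, so the perceptron rule is guaranteed to settle on weights that classify all four cases correctly.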
[Figure: a small network in which Inputs 1-3 feed Neuron 1 and Neuron 2, each with its own output.]

12-32 Neural Networks
Training a Neural Network
• Supervised training:
  – Occurs when the neural network is given input data.
  – The resulting output is compared to the correct output.
  – The strengths of the connections are then modified so as to minimize errors on succeeding input/output pairs.
• Example: Back propagation. This method of learning is divided into two phases:
  1. The inputs are applied to the network, and the outputs are compared with the correct outputs.
  2. The resulting information about any error is fed backwards through the network, adjusting the connection strengths to minimize the error.

12-33 Neural Networks
Neural networks in action: A case study.
• Mortgage Risk Evaluator.
  – Data from several thousand mortgage applicants was used to train a neural network.
    • The credit data of each individual was paired with each loan result.
    • Patterns for successful loans and for mortgage defaults were contained in the data.
    • The neural network’s weights (measurements of connection strength) were adjusted to match the actual output.
  – Now, when a new mortgage applicant is entered as input, the program determines whether the applicant is a bad risk.

12-34 Evolutionary Systems
Alan Turing, in 1950, identified three attributes that are the basis for what is now termed genetic programming.
• Heredity
• Mutation
• Natural selection
• Evolution is being used to create or grow programs.

12-35 Evolutionary Systems
Genetic Algorithm (simulated evolution):
• Mimics the processes in the genetics of living systems.
• Created by John Holland (mid-1960s) at the University of Michigan.
• The human puts together the system and specifies the desired results, but the details of how they are achieved are left to evolve.
• Example: Koza, a student of Holland, developed a system that had tree-structured chromosomes.
  – Using basic astronomical data, his system came up with Kepler’s third law of planetary motion:
    • “the cube of a planet’s distance from the sun is proportional to the square of its period”
• Major problem with genetic algorithms: The designer must have intimate knowledge of the system.

12-36 Evolutionary Systems
Genetic Programming:
• A technique that follows Darwinian evolution.
• The evolution takes place directly on the programs in the population that are striving to reach the goal specified by the programmer.
  – Only the goal is known, and possibly some of the structure of the solution.

12-37 Complex Adaptive Systems
Complex adaptive systems: Collections of many parts that individually operate under relatively simple rules and are highly interactive in a nonlinear way.
• Their parts are self-organizing, operate in parallel, and exhibit emergent behavior (totally unpredictable results can occur).
• The system of parts evolves with natural selection operating.
• Example: Mound-building termite colonies in Australia.
  – Mounds can be several feet high.
  – Termites follow a simple set of rules.
  – Mounds affect what can grow around them.

12-38 Complex Adaptive Systems
Chaos:
• Described as a situation where things seem unorganized and unpredictable.
• Tiny changes in the starting point of a problem produce solutions that seem almost random.
• “Butterfly effect”: A tiny flip of a butterfly’s wings could start a hurricane.
Artificial life (a-life):
• A phenomenon in computers that has attributes of life.
• Some argue that computer viruses are a form of a-life.

12-39 Natural Language Translation
Two distinct classes of translation software:
• One works while you are on the WWW.
  – Can be a direct translation of a complete Web page or of parts of its foreign-language text.
• The other is a standalone piece of software that is used to translate files of foreign-language text.
  – Many are available.
    • Simply Translating is a program that costs under $50.00.

12-40 Natural Language Translation
Web-based Language Translation
• Babel Fish (free service on AltaVista)
  – Text is cut and then pasted into a translation box.
  – “Test translation” from English to Italian and back:
    • The spirit is willing, but the flesh is weak.
    • The spirit is arranged, but the meat is weak person.
• FreeTranslation.com
  – Allows you to enter a URL and then translates the page.
  – Also does text entry for direct translation to and from English.
  – “Test translation” from English to German and back:
    • The spirit is willing, but the flesh is weak.
    • The intellect is ready, but the meat is weak.

This is the html version of the file http://www.eecs.umich.edu/~rthomaso/documents/nls/nmslite.pdf.

Formalizing the Semantics of Derived Words
Richmond H. Thomason
Philosophy Department
University of Michigan
Ann Arbor, MI 48109-2110
U.S.A.
March 24, 2001
Working Draft of a Paper in Progress
This is a working draft: version of March 24, 2001. The material is volatile; do not quote. Comments welcome.

1. Introduction

The logical approach that has been so successful in the semantic interpretation of syntactic structure has never produced a very satisfactory account of word meaning. This paper is intended to promote and illustrate an approach to that problem. I believe that this approach leads to a wider problem that brings together elements of linguistics and philosophy in an illuminating way. But the single case study that I provide here, while it may be suggestive, does not go far enough to make a good case for the more general point.

This paper is extracted from a larger collection of documents, and is intended to motivate and illustrate the ideas. But I hope that even a partially successful and fragmentary sketch of the larger project may convince some members of my audience of three things. First, the natural language semantics community and the subgroup of the AI community interested in formalizing common sense knowledge have a great deal in common. Second, they have much to learn from one another. Third, what they have to learn is useful and important for philosophy.

2. Logicism¹

I want to begin by situating certain problems in natural language semantics with respect to larger trends in logicism, including:
(i) Attempts by positivist philosophers earlier in the twentieth century to provide a logical basis for the physical sciences;
(ii) Attempts by linguists and logicians to develop a “natural language ontology” (and, presumably, a logical language that is related to this ontology by formally explicit rules) that would serve as a framework for natural language semantics;
(iii) Attempts in artificial intelligence to formalize common sense knowledge.

Frege did a lot for logic, but I think he left us with an undeservedly narrow and unpromising version of logicism that is entirely too focused on the subject matter of mathematics and the analytic tool of definition.

Let X be a topic of inquiry.
X logicism is the view that X should be presented as an axiomatic theory from which the rest can be deduced by logic. Science logicism is expressed as an ideal in Aristotle’s Organon. But Aristotle’s logic is far too weak to serve as a means of representing Aristotelian science. Logicism remained impracticable until the 17th century, when a separation of theoretical science from common sense simplified the task of designing an underlying logic.²

There is a moral here about logicism. X logicism imposes a program: the project of actually presenting X in the required form. But for the project to be feasible, we have to choose a logic that is adequate to the demands of the topic. If a logic must involve explicit formal patterns of valid reasoning, the central problem for X logicism is then to articulate formal patterns that will be adequate for formalizing X.

¹ The material in this and the subsequent section is lifted in part from [Thomason, 1991].
² Despite the simplification, of course, a workable formalism did not begin to emerge until the 19th century.

Very little progress was made for over two millennia on a problem that can be made to seem urgent to anyone who has studied Aristotle. This indicates the difficulty of finding the right match of topic and formal principles of reasoning. Though some philosophers (Leibniz, for one) saw the problem clearly, the first instance of a full solution is Frege’s choice of mathematical analysis as the topic, and his development of the Begriffsschrift as the logical vehicle. It is a large part of Frege’s achievement to have discovered a choice that yields a logicist project that is neither impossible nor easy.

I will summarize some morals.
(1) Successful logicism requires a combination of a formally presented logic and a topic that can be formalized so that its inferences become logical consequences.
(2) When logicist projects fail, we may need to seek ways to develop the logic.
(3) Logic development can be difficult and protracted.

3. Extensions to the empirical world

The project of extending Frege’s achievement to the empirical sciences has not fared so well. Of course, the mathematical parts of sciences such as physics can be formalized in much the same way as mathematics. Though the metamathematical payoffs of formalization are most apparent in mathematics, they can occasionally be extended to other sciences.³ But what of the empirical character of sciences like physics? One wants to relate the systems described by these sciences to observations.

Rudolf Carnap’s Aufbau⁴ was an explicit and ambitious attempt to extend mathematics logicism to science logicism, by providing a basis for formalizing the empirical sciences. The Aufbau begins by postulating elementary units of subjective experience. It then attempts to build the physical world from these primitives in a way that is modeled on the constructions used in Frege’s mathematics logicism. Carnap believed strongly in progress in philosophy through cooperative research. In this sense, and certainly compared with Frege’s achievement, the Aufbau was a failure. Nelson Goodman, one of the few philosophers who attempted to build on the Aufbau, calls it “a crystallization of much that is widely regarded as worst in 20th century philosophy.”⁵

After the Aufbau, the philosophical development of logicism becomes somewhat fragmented. The reason for this may have been a general recognition that the underlying logic stood in need of fairly drastic revisions.⁶ This recognition was shared in the relatively small community of philosophers who saw this as a strategically important line of research. This fragmentation emerges in Carnap’s later work, as in the research of many other logically minded philosophers.
After the Aufbau, Carnap decided to take a more direct, high-level approach to the physical world, in which it was unnecessary to construct that world from phenomenal primitives. He then noticed that many observation predicates, used not only in the sciences but in common sense, are “dispositional”: they express expectations about how things will behave under certain conditions. A malleable material will deform under relatively light pressure; a flammable material will burn when heated sufficiently. It is natural to use the word ‘if’ in defining such predicates; but the “material conditional” of Frege’s logic gives incorrect results in formalizing such definitions. Much of [Carnap, 1936-1937] is devoted to presenting and examining this problem.

³ See [Montague, 1962].
⁴ [Carnap, 1928].
⁵ [Goodman, 1963], page 545.
⁶ I can vouch for this as far as I am concerned.

Rather than devising an extension of Frege’s logic capable of solving this problem, Carnap suggests dropping the requirement that these predicates should be explicated by definitions. This relaxation makes it harder to carry out the logicist program, because a natural way of formalizing dispositionals is forfeited. But it also postpones a difficult logical problem. I think that problem was not solved adequately even by the later conditional logics of [Stalnaker and Thomason, 1970] and [Lewis, 1973]. Such theories do not capture the notion of normality that is built into dispositionals: a more accurate definition of ‘flammable’, for instance, is ‘what will normally burn when heated sufficiently’. Thus, logical constructions that deal with normality offer some hope of a solution to Carnap’s problem of defining dispositionals. Such constructions have only become available with the development of nonmonotonic logics.

Although the logicist program has turned into a number of disparate logicist projects, from around 1970 on we have seen steady, cumulative progress on these projects.
Most of this progress has been made not by philosophers, but by linguists and computer scientists. Large-scale formalization projects, and the development of logics appropriate for them, are now far more common in these other fields than in philosophy. Works like [Dowty, 1979] and [Link, 1983] (by linguists) and [Davis, 1991] (by a computer scientist) illustrate the point. The logical tools that are currently used by philosophers in thinking about philosophical problems are over thirty years old. In fact, except for a relatively narrow group of specialists, the philosophical community remains unaware of the newer developments and their relevance to philosophy.

This project is meant to illustrate what can be done to illuminate a historically important problem using methods from nonmonotonic logic (a contribution from computer science) and the theory of eventuality structure (a contribution from linguistics). It relies heavily on work of Mark Steedman, who works in both linguistics and computer science.⁷ It can also be seen as part of a linguistic project concerning the meanings of complex or derived words.

4. Linguistic logicism

In linguistics, a clear logicist tradition emerged from the work of Richard Montague, a philosopher who developed a logic he presented as appropriate for philosophy logicism. To a large extent, this logic builds on Carnap’s work in [Carnap, 1956]. Montague motivates his logical framework in [Montague, 1969] with a problem in the semantics of derived words: the need to relate empirical predicates like ‘red’ to their nominalizations, like ‘redness’. He argued for three claims: that many such nominalizations denote properties; that terms like ‘event’, ‘obligation’, and ‘pain’ denote properties of properties; and that properties should be treated as functions taking possible worlds into extensions.
The justification of this formal ontology, and of the logical framework that goes with it, consists in its ability to formalize certain sentences in a way that allows their inferential relations with other sentences to be captured by the underlying logic. Philosophers other than Montague (not only Frege, but Carnap in [Carnap, 1956] and Church in [Church, 1951]) had resorted informally to this methodology. But Montague was the first to see the task of natural language logicism as a formal challenge. By actually formalizing the syntax of a natural language, the relation between the natural language and the logical framework could be made explicit, and systematically tested for accuracy. Montague developed such formalizations of several ambitious fragments of English syntax in several papers, of which [Montague, 1973] was the most influential.

⁷ See [Steedman, 1998].

The impact of this work has been more extensive in linguistics than in philosophy. Formal theories of syntax were well developed in the early 1970s, and linguists were used to using semantic arguments to support syntactic conclusions. But there was no theory of semantics to match the informal arguments. “Montague grammar” quickly became a paradigm for some linguists. Montague’s ideas and methodology have influenced the semantic work of all the subsequent approaches that take formal theories seriously.

As practiced by linguistic semanticists, language logicism would attempt to formalize a logical theory capable of providing translations for natural language sentences. These translations are chosen so that one sentence entails another if and only if the translation of the entailed sentence follows logically from two things: the translation of the entailing sentence, and a set of “meaning postulates” of the semantic theory. It is usually considered appropriate to provide a model-theoretic account of the primitives that appear in the meaning postulates.
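Montague treats a property as a function from possible worlds to extensions, and linguistic semanticists treat entailment as holding relative to a set of meaning postulates. Both ideas can be illustrated with a toy finite model. This is a hedged Python sketch, far cruder than Montague's Intensional Logic. The two individuals, the predicates 'red' and 'colored', and the postulate that red things are colored are invented for illustration. Worlds are simply every assignment of extensions to the predicates.

```python
from itertools import chain, combinations, product

INDIVIDUALS = ("a", "b")          # hypothetical domain
PREDICATES = ("red", "colored")   # hypothetical predicates


def powerset(items):
    """All subsets of items, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def worlds():
    """Every assignment of an extension (a set of individuals) to each predicate."""
    for exts in product(powerset(INDIVIDUALS), repeat=len(PREDICATES)):
        yield dict(zip(PREDICATES, exts))


def property_of(name):
    """An intension: a function taking a world to the predicate's extension there."""
    return lambda world: world[name]


red = property_of("red")          # the sort of object 'redness' would denote
colored = property_of("colored")


def holds_of(prop, individual):
    """The proposition (world -> truth value) that `individual` has `prop`."""
    return lambda world: individual in prop(world)


def red_implies_colored(world):
    """Hypothetical meaning postulate: in every world, red things are colored."""
    return red(world) <= colored(world)


def entails(premise, conclusion, postulates=()):
    """True if the conclusion holds in every world where the postulates and premise hold."""
    return all(conclusion(w) for w in worlds()
               if premise(w) and all(p(w) for p in postulates))


if __name__ == "__main__":
    a_is_red, a_is_colored = holds_of(red, "a"), holds_of(colored, "a")
    print(entails(a_is_red, a_is_colored))                         # False
    print(entails(a_is_red, a_is_colored, [red_implies_colored]))  # True
```

Without the postulate, some worlds make a red but not colored, so the entailment fails. Adding the postulate excludes those worlds.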
This methodology gives rise naturally to the idea of “natural language metaphysics,” which tries to model the high-level knowledge that is involved in analyzing systematic relations between linguistic expressions. For instance, the pattern relating the transitive verb ‘bend’ to the adjective ‘bendable’ is a common one that is productive not only in English but in many languages. So a system for generating derived lexical meanings should include an operator ‘able’ that would take the meaning of ‘bend’ into the meaning of ‘bendable’. A theory of the system of lexical operators must also explain logical interactions. For instance, it should derive the relationship between ‘bendable’ and ‘deformable’ from the relationship between ‘bend’ and ‘deform’. To do this, it is important to provide a model theory of the lexical operators. So, for instance, this approach to lexical semantics leads naturally to a model-theoretic investigation of ability,⁸ a project that is also suggested by a natural train of thought in logicist AI.⁹

Theories of natural language meaning that, like Montague’s, grew out of theories of mathematical language are well suited to dealing with quantificational expressions, as in
(4.1) Every boy gave two books to some girl.
In practice, despite the original motivation of his theory in the semantics of word formation, Montague devoted most of his attention to the problems of quantification, and to its interaction with the intensional and higher-order apparatus of his logical framework. But some of those who developed Montague’s framework turned their attention to lexical problems. A body of the later research in Montague semantics (especially David Dowty’s early work in [Dowty, 1979] and the work that derives from it) concentrates on semantic problems of word formation, which of course is an important part of lexical semantics.¹⁰

⁸ That the core concept that needs to be clarified here is ability rather than the bare conditional ‘if’ is suggested by cases like ‘drinkable’. ‘This water is drinkable’ doesn’t mean ‘If you drink this water it will have been consumed’. (Of course, ability and the conditional are related in deep ways.) I will return briefly to the general problem of ability in Section 7.5, below.
⁹ See, for example, [Shoham, 1993].

5. Formalizing common sense

Due to the influence of John McCarthy, a group of common sense logicists has emerged among the logically minded members of the Artificial Intelligence community. McCarthy’s views have been strongly and consistently expressed in a series of papers beginning in 1959.¹¹ The idea is that we will not know how to build algorithms that express intelligent behavior until we have an explicit theory of the core phenomena of intelligent thought. The term ‘common sense’ is merely a way of indicating the phenomena in question. In practice, the research of the AI logicists is integrated with much less ambitious formalization tasks having to do with specialized sorts of reasoning, such as planning and temporal reasoning. But formalizing common sense remains an important high-level goal for most of us.

To a certain extent, the motives of the common sense logicists overlap with Carnap’s reasons for the Aufbau. The idea is that the theoretical component of science is only part of the overall scientific project. That project also involves situating science in the world of experience, in order to explain the reasoning that goes into the testing and application of theories; see [McCarthy, 1984] for explicit motivation of this sort. For extended projects in the formalization of common sense reasoning, see [Hobbs and Moore, 1988] and [Davis, 1991].

The project of developing a broadly successful logic-based account of semantic interrelationships among the lexical items of a natural language is roughly comparable in scope with the project of developing a high-level theory of common sense knowledge.
Linguists are mainly interested in explanations, and computer scientists are (ultimately, at any rate) interested in implementations. But some logicist computer scientists have followed McCarthy’s advice of seeking understanding before implementing. For them, the immediate goals of the linguistic and AI projects are not that different. And (at the outset at least) the subject matter of the linguistic and the computational enterprises is remarkably similar. Two lines of research both lead naturally to a focus on the problems of representing change, causal notions, and ability. One is the linguistic research motivated by lexical decomposition, beginning in [Dowty, 1979]. The other is the computational research motivated largely by problems in planning (or practical reasoning).

6. Formalizing nonmonotonic reasoning

See [Ginsberg, 1987] for a good guide to the field of nonmonotonic reasoning and its early development. For subsequent developments, some good book-length treatments have become available, including [Antoniou, 1997, Brewka et al., 1997, Schlechta, 1997]. Also see the relevant chapters of [Gabbay et al., 1994].

¹⁰ This emphasis on compositionality in the interpretation of lexical items is similar to the policy that Montague advocated in syntax. It has a similar effect of shifting attention from representing the content of individual lexical items to operators on types of contents. But this research program seems to require a much deeper investigation of “natural language metaphysics” or “common sense knowledge” than the syntactic program. One can hope that it will build bridges between the more or less pure logic with which Montague worked and a system that may be more genuinely helpful in applications that involve the representation of, and reasoning with, linguistic meaning.
¹¹ See the papers collected in [Lifschitz, 1990].
Among the available theories of defeasible reasoning that could be applied in lexical semantics, I find circumscription the most congenial for problems of natural language semantics, for the following reasons.

– Circumscription is relatively conservative from a logical point of view. For instance, its language is simply the language of classical first-order or higher-order logic, and the local semantics of expressions (their satisfaction conditions in a model) is left unchanged. This makes it relatively easy to use circumscription as a development tool.
– It is a straightforward matter to convert Montague’s formalism into a circumscriptive theory.
– The more sophisticated versions of circumscription provide an explicit formalism for dealing with abnormalities.¹² I believe that such a formalism is needed in the linguistic applications.

This version of the paper is designed to be understandable without going into technicalities. In particular, to understand the ideas behind circumscription, readers need only know the following things.

1. A number of abnormality predicates are introduced into the language.
2. In defining logical consequence, attention is restricted to models in which the abnormalities are simultaneously minimized, while certain terms (the ones that are deemed independent of the abnormalities) are held constant, and certain other terms are allowed to vary.
3. This has the effect of taking only certain “preferred models” into account. A theory Γ circumscriptively implies a consequence A if A is true in all the preferred models of Γ.
4. These preferences can be constrained by an explicit “abnormality theory” that uses the abnormality predicates.

7. Thesis

The following is an appropriate and illuminating logicist project.
To use a nonmonotonic version of Montague’s Intensional Logic, combined with specialized domains dealing with eventuality types, plurals, and mass nouns, as the means of formalizing the logical relations between the meanings of semantically related words.

I try to make a case for this idea by illustrating it with several case studies. This version of the paper contains only one such study, but readers familiar with lexical semantics should be able to see that the techniques can readily be generalized to other cases.

¹² See [Lifschitz, 1988].

8. Case studies

The first case study (and the only one presented in this abbreviated version) has to do with words involving the suffix ‘able’.

8.1. The -able suffix

The -able suffix illustrates a number of characteristics that challenge semantics.

1. There is variation in the meanings it assumes, but this variation is across a family of closely related shades of meaning. As usual in these cases, it is hard to tell whether to treat the variation by listing senses, by finding a single common meaning allowing for different uses, or by making the meaning context-dependent.
2. The meanings themselves are difficult to formalize.
3. These meanings seem to invoke references to concepts via relations of common-sense, real-world knowledge rather than linguistic knowledge.
4. There are exceptional patterns.

8.1.1. Sense 1 of able: the ability to perform actions

The most usual pattern of Verb+able involves transitive verbs V that are broadly telic. Such verbs have three characteristics: they correspond to procedures that are in the normal repertoire of actions of human agents, there are normal or standard ways of initiating these actions, and there is a successful end state associated with the performance of the actions. In what I will call the paradigmatic case, the meaning of the derived adjectival form is that a thing will normally achieve this end state s successfully when a test action associated with V is applied to it.
The term ‘successful’ is deliberately used here to cover both the cases in which the state is really achieved and those in which it is achieved without undesirable side effects. (This last condition, we can see, can shade into cases in which there are not only no undesirable side effects, but in which the state is worthy of being achieved.) Here are some examples illustrating this paradigmatic case. (Warning: some of these cases are ambiguous, and also fall under other cases.)

acceptable, adjustable, admissible, adoptable, applicable, approachable, bearable, believable, breakable, cleanable, communicable, consumable, defeatable, defensible, detectable, dispensable, doable, driveable, expendable, expressible, fixable, flexible, formalizable, imaginable, implementable, learnable, liftable, loveable, modifiable, moveable, observable, openable, provable, printable, readable, recognizable, reusable, reversible, solvable, TeXable, trainable, transferable, transportable, wearable, withdrawable

8.1.2. Carnap’s problem: defining ‘soluble’

The natural way to define ‘x is water-soluble’ is this.

(8.2) If x were put in some water, then x would dissolve in the water.

So at first glance, it may seem that the resources for carrying out the definition that Carnap found problematic will be available in a logic with a subjunctive conditional. We have had such logics, based on the apparatus of modal logic, since around 1970; see [Stalnaker and Thomason, 1970, Lewis, 1973]. But the fact that these conditionals conform to the rule of modus ponens makes them unsuitable for this purpose. Suppose it happens to be true that if one were to put this lump of salt in some water, it would be in this water, and this water is already saturated with salt. Then (8.2) comes out false for this lump, since its antecedent would be realized while its consequent fails. But the fact that the lump would not then dissolve is no reason why this salt should count as not water-soluble.
This and other such thought experiments indicate that what is wanted is not the traditional subjunctive conditional, but a “conditional normality” of the sort that is used in deontic logics. Such conditionals can also be integrated into a nonmonotonic formalism.¹³

¹³ See [Boutilier, 1992], [Asher and Morreau, 1991].

In a circumscriptive framework, we do not introduce a special conditional, but formulate normalcy constraints using truth-functional logic with abnormality predicates. Thus, (8.2) becomes something like this:

$$\forall x, y, t\,[[\mathit{Water}(y) \land \mathit{Put\text{-}in}(x, y, t) \land \neg\mathit{Ab}(x, y, t)] \rightarrow \mathit{Dissolve}(x, y, t)] \qquad (8.3)$$

The abnormality predicate in the antecedent of this formulation removes difficulties that arise from the defeasibility of the generalization captured by ‘soluble’. We can formulate a theory of abnormalities by explicitly adding an axiom to the effect that for all quantities x of salt, Ab(x, y, t) holds for any quantity of water y that is saturated with salt at t. We can add other conditions of this sort as they occur to us.¹⁴ The fact that we are circumscribing the predicate Ab will make constraint (8.3) apply as a default in the nonexceptional cases.

This repairs one problem in (8.2): (8.3) is able to deal with counterexamples relating to the defeasibility of the causal relationship indicated by this sense of -able. But there are two other problems from which (8.3) suffers.

(i) Vacuous cases. It follows from (8.3) that a lump of iron that is never put in water is water-soluble, since the antecedent of (8.3) is never satisfied for it.
(ii) Delayed effects. According to (8.3), a lump of salt that is put in water at t will normally dissolve at t. This is never true: there is always a delay in the effect. But it would be hopeless to find a formula that would predict the delay.

To my knowledge, Problem (i) was first noticed by Nelson Goodman in connection with the problem of conditionals (see [Goodman, 1947]).
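The default behaviour of (8.3), together with the one-axiom abnormality theory about saturated water, can be checked by brute force on a single ground instance. This is only a minimal sketch under stated assumptions, loosely following the four-point description of circumscription in Section 6 rather than any published implementation: the propositional atoms Water, PutIn, Saturated, Ab and Dissolve are hypothetical stand-ins for one lump x, one quantity of water y and one time t; Water, PutIn and Saturated are held fixed while Dissolve varies; and preferred models are found by enumerating every truth assignment, which is feasible only for tiny vocabularies.

```python
from itertools import product

# Hypothetical ground atoms for one lump x, one quantity of water y and one time t.
ATOMS = ("Water", "PutIn", "Saturated", "Ab", "Dissolve")
FIXED = ("Water", "PutIn", "Saturated")  # held constant while minimizing
MINIMIZED = ("Ab",)                      # the circumscribed abnormality predicate
# Every other atom (here only Dissolve) is allowed to vary.


def satisfies_theory(m, saturated):
    """Ground instance of (8.3), a one-axiom abnormality theory, and the facts of the case."""
    axiom_8_3 = not (m["Water"] and m["PutIn"] and not m["Ab"]) or m["Dissolve"]
    ab_theory = not m["Saturated"] or m["Ab"]  # saturated water makes the case abnormal
    facts = m["Water"] and m["PutIn"] and m["Saturated"] == saturated
    return axiom_8_3 and ab_theory and facts


def models(saturated):
    """All truth assignments to ATOMS that satisfy the theory."""
    result = []
    for values in product((False, True), repeat=len(ATOMS)):
        m = dict(zip(ATOMS, values))
        if satisfies_theory(m, saturated):
            result.append(m)
    return result


def less_abnormal(a, b):
    """True if a agrees with b on FIXED and a's abnormalities are a proper subset of b's."""
    if any(a[p] != b[p] for p in FIXED):
        return False
    subset = all(b[p] or not a[p] for p in MINIMIZED)
    proper = any(b[p] and not a[p] for p in MINIMIZED)
    return subset and proper


def preferred(ms):
    """The preferred models: those that no other model beats on abnormality."""
    return [m for m in ms if not any(less_abnormal(n, m) for n in ms)]


def circ_entails(saturated, atom):
    """The theory circumscriptively implies `atom` if it holds in all preferred models."""
    return all(m[atom] for m in preferred(models(saturated)))


def classically_entails(saturated, atom):
    """Ordinary consequence: `atom` holds in every model of the theory."""
    return all(m[atom] for m in models(saturated))


print(circ_entails(False, "Dissolve"))         # True: the default applies
print(classically_entails(False, "Dissolve"))  # False: (8.3) alone does not force it
print(circ_entails(True, "Dissolve"))          # False: saturation blocks the default
```

The contrast between the first two lines is the point of circumscribing Ab: classical consequence leaves open an abnormal model in which the salt stays undissolved, while minimizing Ab discards it. Adding the fact that the water is saturated withdraws the conclusion, which is the nonmonotonic behaviour the abnormality theory is meant to capture.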
Problem (ii), which on reflection appears to be even more difficult, has hardly been mentioned in the philosophical literature.

The first problem would be solved by using a deontic conditional operator rather than circumscription, as follows.

∀x, y, t[[Water(y) ∧ Put-in(x, y, t)Ab(x, y, t)] → Dissolve(x, y, t)].

The conditional shifts attention to worlds and times at which x has successfully been put in water, and in which things go normally. With abnormality predicates, we need to ensure separately that everything that can be tested in the appropriate way is somehow tested in the appropriate way. We can do this by using a modal operator that ranges over the worlds obtainable by the performance of suitable test actions. This idea yields the following reformulation.

$$\forall x, y, t\,\Box[[\mathit{Water}(y) \land \mathit{Put\text{-}in}(x, y, t) \land \neg\mathit{Ab}(x, y, t)] \rightarrow \mathit{Dissolve}(x, y, t)] \qquad (8.4)$$

Problem (ii) remains. The use of times in (8.2)–(8.4) is the source of the problem. Times are fine in reasoning domains where quantitative measurements of change are appropriate, but here they introduce a level of detail that is distracting.

However, we do need some sort of universal quantifier in formulating these constraints: we wish to say that whenever x is put into water, x dissolves. We could begin by saying that this means that x dissolves in any case in which x is put in water. But this is either too vague for comfort, or it makes (8.4) false. We put a lump of salt in ordinary water; let’s call this a case. We can now see the salt dissolving. Is this the same case, or another? We wait, and now the salt is dissolved. Is this too a different case? The truth of (8.4), with t construed as a quantifier over cases, depends crucially on how we individuate cases in this example. But we do not have very robust intuitions about cases, and in particular the term ‘case’ doesn’t provide any help about how we should perform this individuation.
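Problem (i), and the way a modal operator over test worlds is meant to address it, can be illustrated with a toy model. This is a deliberately simplified sketch, not a semantics for (8.4): a world is just a record of which objects were put in (unsaturated) water, which dissolved, and which cases were abnormal; Water(y) is folded into "put in water" and times are dropped. The set of test worlds is assumed to contain, for each object, a world in which that object is actually tested. The names World, holds_at and soluble_modal are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class World:
    put_in_water: frozenset            # objects put into (unsaturated) water here
    dissolved: frozenset               # objects that dissolve here
    abnormal: frozenset = frozenset()  # objects whose test counts as abnormal here


def holds_at(x, w):
    """Ground reading of (8.3) at one world: put in water and not abnormal implies dissolves."""
    return x not in w.put_in_water or x in w.abnormal or x in w.dissolved


def soluble_modal(x, test_worlds):
    """(8.4)-style reading: the conditional must hold at every world reached by a test action."""
    return all(holds_at(x, w) for w in test_worlds)


# The actual world: nothing is ever put in water.
actual = World(put_in_water=frozenset(), dissolved=frozenset())

# Assumed test worlds: one in which the salt is tested, one in which the iron is tested.
test_worlds = [
    World(put_in_water=frozenset({"salt"}), dissolved=frozenset({"salt"})),
    World(put_in_water=frozenset({"iron"}), dissolved=frozenset()),
]

print(holds_at("iron", actual))            # True: satisfied vacuously, as in Problem (i)
print(soluble_modal("iron", test_worlds))  # False
print(soluble_modal("salt", test_worlds))  # True
```

The vacuous verdict for iron disappears only because some test world actually puts the iron in water; the sketch simply assumes such worlds are available, which is the job the modal operator is doing in (8.4).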
¹⁴ There is a division of labor here; these conditions belong to the abnormality theory. The abnormality theory is not part of the definition of soluble, but it contributes to the adequacy of the definition.

But the exercise we have just gone through makes it clear that the “cases” we are considering in this example are happenings, and we do have good intuitions about these. Thinking of the quantifier in (8.4) as ranging over happenings, or (to use a technical term) eventualities, we are able to make progress on Problem (ii). This is actually a sign that we are on the right track, since years of work in natural language semantics have made a very plausible case for the importance of eventualities and their structure in the ontology that is needed for semantics, and especially for the semantics of words.¹⁵

Here, we are interested in eventualities that exhibit a typical structure: they consist of an inception, a body, and a culmination. The inception is usually an action, and may itself be an eventuality with this same three-part structure; in our example, it is putting something in a quantity of water. The body is usually a process, often one that can be measured in a way that tracks the stages by which the culmination is reached. The culmination is a state; in our example, the state of the salt’s being dissolved. We will call such a three-part happening a telic eventuality.

Our final definition of ‘water-soluble’ is obtained by using eventualities in place of times.¹⁶

$$\forall x[\mathit{Water\text{-}Soluble}(x) \leftrightarrow \Box\forall e_1 \forall y[[\mathit{Put\text{-}in}(e_1) \land \mathit{movee}(e_1) = x \land \mathit{container}(e_1) = y \land \mathit{Water}(y)] \rightarrow \exists e[\mathit{Dissolving}(e) \land \mathit{dissolvee}(e) = x \land \mathit{medium}(e) = y \land [\neg\mathit{Ab}(e) \rightarrow \exists e_2[\mathit{culmination}(e) = e_2 \land \mathit{Dissolved}(e_2) \land \mathit{dissolvee}(e_2) = x \land \mathit{medium}(e_2) = y]]]]] \qquad (8.5)$$
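As a rough executable reading of the eventuality-based definition, the check can be run over a finite record of happenings. This is a hypothetical illustration, not part of the original proposal: each eventuality has a kind, role fillers such as movee, container, dissolvee and medium, and optional links to its inception and culmination, and the check also requires, as the prose reading of (8.5) does, that the putting-in event be the inception of the dissolving. It examines a single record and leaves out the modal operator, so an object that is never tested passes vacuously. Because the culmination is a separate eventuality rather than a time stamp, no delay has to be predicted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Eventuality:
    kind: str                                            # e.g. "Put-in", "Dissolving", "Dissolved"
    roles: Dict[str, str] = field(default_factory=dict)  # role name -> filler
    inception: Optional[str] = None                      # id of the initiating eventuality
    culmination: Optional[str] = None                    # id of the resulting state, if any
    abnormal: bool = False                               # plays the role of Ab(e)


def culminates_dissolved(e: Eventuality, record: Dict[str, Eventuality], x: str, y: str) -> bool:
    """Does e culminate in a Dissolved state with the same dissolvee and medium?"""
    s = record.get(e.culmination) if e.culmination is not None else None
    return (s is not None and s.kind == "Dissolved"
            and s.roles.get("dissolvee") == x and s.roles.get("medium") == y)


def water_soluble(x: str, record: Dict[str, Eventuality], waters: Set[str]) -> bool:
    """Every putting of x into water must start a dissolving of x that, unless abnormal, culminates."""
    for eid, put in record.items():
        if not (put.kind == "Put-in" and put.roles.get("movee") == x
                and put.roles.get("container") in waters):
            continue
        y = put.roles["container"]
        if not any(e.kind == "Dissolving" and e.inception == eid
                   and e.roles.get("dissolvee") == x and e.roles.get("medium") == y
                   and (e.abnormal or culminates_dissolved(e, record, x, y))
                   for e in record.values()):
            return False
    return True


record = {
    "e1": Eventuality("Put-in", {"movee": "salt1", "container": "water1"}),
    "e": Eventuality("Dissolving", {"dissolvee": "salt1", "medium": "water1"},
                     inception="e1", culmination="e2"),
    "e2": Eventuality("Dissolved", {"dissolvee": "salt1", "medium": "water1"}),
    "e3": Eventuality("Put-in", {"movee": "nail1", "container": "water1"}),
}

print(water_soluble("salt1", record, {"water1"}))  # True
print(water_soluble("nail1", record, {"water1"}))  # False: e3 starts no dissolving
```

An abnormal dissolving (for example, in saturated water) is excused from culminating, which mirrors the role of ¬Ab(e) in the definition.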
In words: x is water-soluble if and only if, necessarily, if an event e₁ of putting x in a quantity of water occurs, then e₁ is the inception of a dissolving eventuality e involving the same x and quantity of water, which, unless there is something abnormal about e, will culminate in a state in which x is dissolved. This is not only a definition, but it appears to solve Carnap’s problem of defining ‘soluble’. Such definitions can be simplified considerably by refining the definition of a telic eventuality, and by appealing to general properties of these eventualities.

The other common senses of X-able include: “If an appropriate test is performed then a result X that is not undesirable will normally be achieved” and “If an appropriate test is performed then a result X that is desirable will normally be achieved.” These cases are illustrated by drinkable and despicable. There are many more or less idiosyncratic cases that do not fit any of these patterns, such as palatable, comfortable, and reasonable. There are also many suppletive cases that do not appear to be derived at all, such as capable and liable. We have to be prepared for such exceptions in lexical semantics.

Those who know a little modal logic are likely to think that modal auxiliaries like can are formalized with the possibility operator ◇. This creates a pleasant analogy between the linguistic expression of universal and existential quantification on the one hand, and that of necessity and possibility on the other.

¹⁵ [Dowty, 1979] is one of the classic sources for this topic.

¹⁶ This formula uses more or less standard formalization techniques in event-centered semantics, where something like $\exists e[\mathit{Push}(e) \land \mathit{Past}(e) \land \mathit{Pusher}(e) = \mathit{Charlie} \land \mathit{Pushee}(e) = \mathit{Piano}_{43}]$ is used to represent Charlie pushed the piano.

The modal can and the suffix -able are not the same, but in one important respect they are alike: they are more like causal conditionals than like possibility operators.
This point is linked to another tradition that, like many of the ideas presented here, goes back to Aristotle’s account of change in the common sense world. For more on the modal issue, see [Cross, 1986].

9. Conclusion

I have said that this is part of a larger project. To get a sense of how the thesis articulated in Section 7 fares, it will be necessary to investigate a number of cases in considerable detail. I have developed partial studies of the following cases.

1. Some causal constructions.
2. Agency.
3. Some denominal verbs.
4. The -er of normal function, as in fastener.

I have also begun a separate study of compound nominals (such as water meter cover adjustment screw), which intersects with many of the issues described here.

However the thesis itself fares, I want to recommend projects that seek to develop logical theories of word meaning to all formally minded people interested in linguistic meaning. In the end, we will obtain a much better understanding of the common sense world, and of how it is reflected in language and reasoning, through cooperative work that uses the best ideas of linguistics, computer science, and philosophy. I recommend this cooperative approach to anyone who is interested in projects of this kind.

Bibliography

[Antoniou, 1997] Grigoris Antoniou. Nonmonotonic Reasoning. The MIT Press, Cambridge, Massachusetts, 1997.
[Asher and Morreau, 1991] Nicholas Asher and Michael Morreau. Commonsense entailment: a modal theory of nonmonotonic reasoning. In J. Mylopoulos and R. Reiter, editors, Proceedings of the Twelfth International Joint Conference on Artificial Intelligence, pages 387–392, Los Altos, California, 1991. Morgan Kaufmann.
[Boutilier, 1992] Craig Boutilier. Conditional logics for default reasoning and belief revision. Technical Report KRR–TR–92–1, Computer Science Department, University of Toronto, Toronto, Ontario, 1992.
[Brewka et al., 1997] Gerhard Brewka, Jürgen Dix, and Kurt Konolige. Nonmonotonic Reasoning: An Overview. CSLI Publications, Stanford, 1997.
[Carnap, 1928] Rudolf Carnap. Der logische Aufbau der Welt. Weltkreis-Verlag, Berlin-Schlachtensee, 1928.
[Carnap, 1936–1937] Rudolf Carnap. Testability and meaning. Philosophy of Science, 3 and 4:419–471 and 1–40, 1936–1937.
[Carnap, 1956] Rudolf Carnap. Meaning and Necessity. Chicago University Press, Chicago, 2nd edition, 1956. (First edition published in 1947.)
[Church, 1951] Alonzo Church. The need for abstract entities in semantic analysis. Proceedings of the American Academy of Arts and Sciences, 80:100–112, 1951.
[Cross, 1986] Charles B. Cross. ‘Can’ and the logic of ability. Philosophical Studies, 50:53–64, 1986.
[Davis, 1991] Ernest Davis. Common Sense Reasoning. Morgan Kaufmann, San Francisco, 1991.
[Dowty, 1979] David R. Dowty. Word Meaning in Montague Grammar. D. Reidel Publishing Co., Dordrecht, Holland, 1979.
[Gabbay et al., 1994] Dov Gabbay, Christopher Hogger, and J. A. Robinson, editors. Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 2: Nonmonotonic Reasoning. Oxford University Press, Oxford, 1994.
[Ginsberg, 1987] Matthew L. Ginsberg, editor. Readings in Nonmonotonic Reasoning. Morgan Kaufmann, Los Altos, California, 1987. (Out of print.)
[Goodman, 1947] Nelson Goodman. The problem of counterfactual conditionals. The Journal of Philosophy, 44:113–118, 1947.
[Goodman, 1963] Nelson Goodman. The significance of Der logische Aufbau der Welt. In Paul Schilpp, editor, The Philosophy of Rudolf Carnap, pages 545–558. Open Court, LaSalle, Illinois, 1963.
[Hobbs and Moore, 1988] Jerry R. Hobbs and Robert C. Moore, editors. Formal Theories of the Commonsense World. Ablex Publishing Corporation, Norwood, New Jersey, 1988.
[Lewis, 1973] David K. Lewis. Counterfactuals. Harvard University Press, Cambridge, Massachusetts, 1973.
[Lifschitz, 1988] Vladimir Lifschitz. Circumscriptive theories: A logic-based framework for knowledge representation. Journal of Philosophical Logic, 17(3):391–441, 1988.
[Lifschitz, 1990] Vladimir Lifschitz, editor. Formalizing Common Sense: Papers by John McCarthy. Ablex Publishing Corporation, Norwood, New Jersey, 1990.
[Link, 1983] Godehard Link. The logical analysis of plurals and mass terms: A lattice-theoretical approach. In Rainer Bäuerle, Christoph Schwarze, and Arnim von Stechow, editors, Meaning, Use, and Interpretation of Language, pages 302–323. Walter de Gruyter, Berlin, 1983.
[McCarthy, 1984] John McCarthy. Some expert systems need common sense. In H. Pagels, editor, Computer Culture: the Scientific, Intellectual and Social Impact of the Computer, volume 426 of Annals of The New York Academy of Sciences, pages 129–137. The New York Academy of Sciences, 1984.
[Montague, 1962] Richard Montague. Deterministic theories. In Decisions, Values, and Groups, volume 2, pages 325–370. Pergamon Press, Oxford, 1962. Reprinted in Formal Philosophy, by Richard Montague, Yale University Press, New Haven, CT, 1974, pp. 303–359.
[Montague, 1969] Richard Montague. On the nature of certain philosophical entities. The Monist, 53:159–194, 1969.
[Montague, 1973] Richard Montague. The proper treatment of quantification in ordinary English. In Jaakko Hintikka, editor, Approaches to Natural Language: Proceedings of the 1970 Stanford Workshop on Grammar and Semantics, pages 221–242. D. Reidel Publishing Co., Dordrecht, Holland, 1973. Reprinted in Formal Philosophy, by Richard Montague, Yale University Press, New Haven, CT, 1974, pp. 247–270.
[Schlechta, 1997] Karl Schlechta. Nonmonotonic Logics. Springer-Verlag, Berlin, 1997.
[Shoham, 1993] Yoav Shoham. Agent oriented programming. Artificial Intelligence, 60(1):51–92, 1993.
[Stalnaker and Thomason, 1970] Robert C. Stalnaker and Richmond H. Thomason. A semantic analysis of conditional logic. Theoria, 36:23–42, 1970.
[Steedman, 1998] Mark Steedman. The productions of time. Unpublished manuscript, University of Edinburgh, 1998. Available from http://www.cogsci.ed.ac.uk/~steedman/papers.html.
[Thomason, 1991] Richmond Thomason. Logicism, artificial intelligence, and common sense: John McCarthy’s program in philosophical perspective. In Vladimir Lifschitz, editor, Artificial Intelligence and Mathematical Theory of Computation, pages 449–466. Academic Press, San Diego, 1991.

Levels and loops: the future of artificial intelligence and neuroscience

Anthony J. Bell
Interval Research Corporation, 1801 Page Mill Road, Palo Alto, CA 94304, USA

In discussing artificial intelligence and neuroscience, I will focus on two themes. The first is the universality of cycles (or loops): sets of variables that affect each other in such a way that any feed-forward account of causality and control, while informative, is misleading. The second theme is based around the observation that a computer is an intrinsically dualistic entity, with its physical set-up designed so as not to interfere with its logical set-up, which executes the computation. The brain is different. When analysed empirically at several different levels (cellular, molecular), it appears that there is no satisfactory way to separate a physical brain model (or algorithm, or representation) from a physical implementational substrate.
When program and implementation are inseparable and thus interfere with each other, a dualistic point of view is impossible. Forced by empiricism into a monistic perspective, the brain–mind appears as neither embodied by nor embedded in physical reality, but rather as identical to physical reality. This perspective has implications for the future of science and society. I will approach these from a negative point of view, by critiquing some of our millennial culture's popular projected futures.

Keywords: artificial intelligence; neuroscience; cyclic systems; dualism; science fiction

1. INTRODUCTION

In this paper I will survey the recent history, current status and future prospects of artificial intelligence (AI) and neuroscience. I will also attempt to relate the social motivations of the fields concerned, and their potential impact, to society at large.

2. THE SCIENCE FICTION FUTURE

Formalities over, and given that the Millennium is a significant enough social phenomenon that it colours popular impressions of the future of science, it is worth looking at what impressions a person of the year 2000 might have formed from late twentieth-century popular science books, science fiction books and films, and even from the science pages of newspapers. Such a person might be forgiven for thinking that the future will be something like this.

Nano-robots will perform all molecular repairs in our bodies, making us effectively immortal. Highly engineered drugs, perhaps the descendants of Prozac and Ecstasy, will take care of emotional disorders, as a side-effect solving all social problems, so everyone will be happy (finally). That's for the nostalgic minority who cling to living in the primitive biological form. More cyber-aware individuals will have downloaded themselves into the ‘Net’ and will exist like a William Gibson character in a global computer network which is capable of providing all protagonists with the most fantastic entertainment.
Many global problems will be solved with the demographic move to the ‘Net’, problems such as population, food, transportation and energy. The ‘Net-heads’ will have been passed on the way by the ‘Worldbots’, digital mechanical life-forms which will first ease human life by performing all mundane tasks, but will shortly after become so much more intelligent than the unenhanced us that they will practically become ‘spiritual machines’, which may or may not use selfish altruism to decide to be benign towards the human animals, and if we are lucky, they will continue to serve us, something like digital Bodhisattvas. Back in the cyberworld, boundaries between individuals will break down, and transhuman life-forms will appear, analogously to the emergence of multicellular life in the ocean. Implanted into robot spaceships, these life-forms will lumber into space like the first amphibious fish lumbered onto the land. A long time after this, perhaps after a few galactic wars (in which the ‘Dark Side’ may be briefly flirted with but not joined forever), the universe will be one huge Internet, matter everywhere drawn into the process of computational living. The extremum of this is called the Omega Point. (A final twist is that since the Omega Point does not join the Dark Side, again possibly using game-theoretic reasoning, it will decide to be benign and resurrect everyone who ever lived and give them what they most desire. This is called the Judeo-Christian heaven by Tipler (1995). Other references used in constructing this version of future history are Gibson (1986), Moravec (1990) and Kurzweil (1999).) These amazing developments are the almost inevitable consequences of the merging of the digital and the organic worlds, on the threshold of which we are now standing. Cellphones and laptop computers are only the beginning.
We might call this future the bio-informational age, in keeping with its millennial timing and the smoothness with which it mixes in with elements of New Age philosophy.

Phil. Trans. R. Soc. Lond. B (1999) 354, 2013–2020. © 1999 The Royal Society

3. THE CURRENT JOB OF SCIENCE

It's a giddy picture indeed, but how much of it, if any, will come true? If none of it is going to happen, it would be very helpful if science could tell us why, so that we could get on with living our real future.

The difficulty for science is that the prospect of a bio-informational future, with its cyborg, transpersonal themes, causes us to ask questions concerning individuality, consciousness, mind and machine: exactly those questions which science has had least success in framing. AI and neuroscience are the fields that come closest, in engineering and in biology respectively, to framing such questions. Scratch the surface of many AI researchers and neuroscientists (perhaps quite vigorously) and you may find someone who started off by asking ‘What are we?’

The answers to this question are not that numerous. Either we are machines, in which case AI should be possible and neuroscience should be able to work out the algorithm (or algorithms) that the brain is running, or we are something else, in which case both projects will fail in their ultimate goals, which is not to say they will not achieve great things along the way. (One of the great things that they might achieve is an exact picture of their own limits.) Either way, by examining the history and current state of AI and neuroscience and by identifying the issues beneath the surface of these fields, we may gather some sense of the important themes playing out along science's internal frontier (disregarding for now how different this frontier looks from outside).

4. HISTORY AND STATE OF ARTIFICIAL INTELLIGENCE

AI's ultimate purpose is to build a robot that lives in the world with a computer for a brain.
It therefore assumes that the essence of the living and/or thinking process can be captured in digital computation.

The first attempts to produce AI in the 1960s involved writing facts and rules into the machine using various quasi-logical languages. In the 1980s this became less popular. Rule-based systems were seen as non-robust: they could not adapt well to small changes in circumstances. Also, every fact had to be programmed in by a human. This led people to think that real-numbered, ‘subsymbolic’ systems were needed, and that these systems had to be able to learn facts (or learn something) themselves, just by observing data. Historically, this view carried within it the cybernetics view of the 1950s.

It was one short step from this shift to statistical theories. The short step was called neural networks (Haykin 1999); it started in 1984 (Rumelhart & McClelland 1986) and it is not over yet. An interdisciplinary field with a higher-than-average tolerance for speculation and free-wheeling enquiry, neural networks were popular with students and military funders, and often regarded with frustration by other disciplines that shared a border. As the field became more rigorous, it re-established its connections with mainstream AI, through common interests in statistical machine learning. Technically speaking, the field of neural networks is contentless: the empirical side is neuroscience; the theoretical side is statistics and signal processing. This is perhaps what makes it such a great field to work in.

Symbolic AI was thus subverted by a shift to statistical learning theories. It was also subverted in two other directions by the emergence of the fields of artificial life (Langton 1997) and behaviour-based robotics (Arkin 1998) (or situated agents). Artificial life (or alife) is subsymbolic in that it implicitly assumes that intelligence is just the complex end of a simulatable life process.
A living system and its environment are typically simulated together, often using genetic algorithms and population dynamics to simulate evolution. Behaviour-based robotics attempts bravely to deal with the perceptual-motor loop of a robot in a real environment, rejecting both the alife simulated worlds and the mainstream AI notion of a representation of the world. Echoing Gibson (1979) in his famous debate with Marr (1982) (Bruce & Green 1990), the ‘agents’ literature focuses on complex behaviour coming from simple mechanisms operating in tight coupling with a complex environment, in contrast to Marr's emphasis on the feed-forward computation of a representation from sensory data.

Alife and behaviour-based robotics lack a structural foundation such as the one that mathematics gives to neural networks and statistical machine learning. This makes it hard to judge progress or assess methodology in these fields. However, on the other side, neural networks that learn both sensory perceptions and motor actions in an environment are extremely rare, and for a good reason: it is difficult to build a statistical model of an environment when the system's perceptions are transformed into actions that affect the statistics of the input.

Furthermore, what should such an acting system do? There is an obvious goal for a feed-forward perceptual system: build a probability distribution of what happens. The hidden symmetries (dependencies, redundancies) in this distribution are the hidden structure of the world. But in this cyclic case, when the world is at least partly constructed by the actions of the system, the shape of this distribution is action-dependent: the system gets to partly choose what symmetries exist, and the notion of a hidden set of privileged symmetries is under threat. This is post-modernism for statisticians.
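The claim that an acting system partly chooses the statistics of its own input can be made concrete with a toy calculation. This is a hypothetical illustration, not a model from the paper: the world has two states, the system observes the state directly, and its action ("stay" or "switch") takes effect with an assumed probability of 0.9. The long-run distribution of observations is computed by power iteration on the Markov chain that a fixed policy induces.

```python
P_EFFECT = 0.9  # assumed probability that an action has its intended effect


def transition(s, action):
    """Return [P(next=0), P(next=1)] given the current state s and an action."""
    target = s if action == "stay" else 1 - s
    probs = [0.0, 0.0]
    probs[target] += P_EFFECT
    probs[1 - target] += 1.0 - P_EFFECT
    return probs


def observed_distribution(policy, steps=1000):
    """Long-run distribution of observations when `policy` maps each observation to an action."""
    dist = [0.5, 0.5]
    for _ in range(steps):
        nxt = [0.0, 0.0]
        for s in (0, 1):
            row = transition(s, policy[s])
            for s2 in (0, 1):
                nxt[s2] += dist[s] * row[s2]
        dist = nxt
    return dist


passive = {0: "stay", 1: "stay"}    # never tries to change anything
seeking = {0: "stay", 1: "switch"}  # steers the world towards state 0

print([round(p, 3) for p in observed_distribution(passive)])  # [0.5, 0.5]
print([round(p, 3) for p in observed_distribution(seeking)])  # [0.9, 0.1]
```

The same world looks like a fair coin to the passive policy and like a strongly biased one to the seeking policy, so which "hidden structure" a learner finds in its input depends on how it acts.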
At this point, most people would abandon informational, or unsupervised, goals and appeal to one of the many specific goals which a robot system might have, such as to find food or recharge the batteries. While these are no doubt important, they do have an air of arbitrariness about them that makes us uneasy: we are familiar enough with the flux of goals in our personal experience to desire something more invariant to underlie action selection.

5. QUESTIONS CURRENTLY LATENT IN ARTIFICIAL INTELLIGENCE

Here we have identified two questions which lie beneath the surface of the pluralistic AI of today. The first question, to rephrase, asks why we do not have a mathematical theory of the perception–action cycle. Of course there is work on active perception and on sensory–motor coordinate systems, and engineering department robotics is full of mathematics. But the kind of theory I mean is one that is as universally useful for characterizing cyclic systems as Shannon's information theory is for characterizing communications channels (i.e. feed-forward systems). (Incidentally, maximizing the channel capacity involves finding those hidden symmetries, mentioned above, that exist in the probability distribution of the input. This forms the basic goal of my own favoured area of neural networks: unsupervised learning (Hinton & Sejnowski 1999).)

Implicit in this is the second question. What would we want such a post-Shannon system to do? What quantity should a perception–action cycle system maximize, as a feed-forward channel might maximize its capacity?

A third question was directed at AI researchers by Penrose (1989), and judging by the hostility and controversy it caused, he had hit a weak spot in AI.
Penrose wondered whether the physical substrate of the world, of which relativity and quantum mechanics are our best accounts, might be sufficiently different from the digital substrate of computers to render AI impossible. Is there something in the quantum that is necessary for mind? Scoffing AI philosophers characterized Penrose's position as 'we don't understand quantum mechanics and we don't understand consciousness, so they must be the same thing'. The derision increased when Penrose, to make his hypothesis more specific, proposed with Stuart Hameroff that quantum consciousness manifests itself through coherent quantum effects in a network of proteins called microtubules, which form the structural skeleton of neurons (and other cells).

Critics, distracted by the strangeness of these specific proposals (which are not crucial to his argument), may miss the validity of Penrose's general doubt about the computer: that it is a particularly unusual artifact, being deterministic, discrete-time and discrete-state. The whole state of the machine at the digital level may be written down. No natural objects seem to be of this nature. The computer is really a physical instantiation of a model. We know a model can compute, but can it live or think?

Functionalism (the philosophy of AI) was based on using the computer metaphor for mind, arguing that the brain was the hardware implementation of the 'mental program'. But Penrose's arguments were really designed to raise doubts about this separation of physical and mental processes. Could the brain be separated from a supposedly finitely describable mental process running on it? Since René Descartes, the conceptual separation has been there in our language, but is it really there scientifically? Either there is a physical level at which the separation can be performed (analogous to the level of logic gates in computers), or functionalists have to admit that the brain is not a machine.
But the failure to detect a 'logic gate level' halfway up the brain's reductionist hierarchy may not be the end of the argument for the functionalist, who could still argue that if there is a computer at the bottom, AI would be possible, at the very least with a computer with the resources of the universe. The 'universe-as-computer' is a popular fringe topic in physics, lying behind an effort to find a finite discrete process, such as a cellular automaton, that might underlie the known laws of physics. But until someone succeeds in showing this, we might be wiser to stick with R. P. Feynman, who noted that quantum processes are not in general simulatable, even by Turing machines (and who in the process gave rise to the mysterious and unformed field known today as quantum computing).

The luck (or skill) of scientists is that sometimes they do not have to philosophize to find the answer. They can ask questions of Nature directly. So perhaps this is a good point to survey the history and current state of neuroscience, because this is the discipline whose empirical project is exactly the finite description of brain processes.

6. HISTORY AND STATE OF NEUROSCIENCE

The early landmarks in post-war neuroscience were the Nobel-prize-winning work of Hubel & Wiesel (1968) for their studies of the receptive fields of monkey visual cortical cells, and that of Hodgkin & Huxley (1952) for their uncovering of the mechanism and mathematics of spiking in neurons. Neuroscience has since grown into a huge field, with the annual Society for Neuroscience meeting in the USA attracting 30 000 people. The two early Nobel prizes perhaps reflect a natural split in the field between those working above and those working below the level of the cell. Many of the great successes of the 1970s and 1980s were at the subcellular level, as the molecular biology revolution progressed; as a result, this part of neurobiology was highly empirical and essentially continuous with mainstream cellular, molecular and developmental biology.
In this period, the molecular basis of neural signalling, in both spiking and synaptic transmission, was uncovered. A bewildering array of ion channels, neurotransmitters and neuromodulators were found to be engaged in sculpting neural response properties and controlling communication between neurons. From the chemistry of photon absorption by photoreceptors to the chemistry of muscle contraction, the nervous system apparently performed an astonishingly complicated and coordinated series of molecular actions, not qualitatively different from those in other living cells; yet somehow, in the brain, this molecular dance constituted percept, thought and action.

At and above the level of the spiking neuron, things were slightly different. Lacking the formal structural basis of molecular biology, neuron-level neuroscience focused on spike trains as signals representing neural information. The discreteness of the spike as an information-carrying unit was matched in biology only by the genetic code. This led to early attempts to characterize the 'neural code', attempts that were revived by Bialek and co-workers in the 1990s (Rieke et al. 1997). (Notably, and inevitably, these efforts attempt to characterize neurons as feed-forward information channels.) Behind these efforts is a faith in the neuron level, certainly as a useful descriptive level, but also as a 'computing level' which molecular and biophysical processes exist to implement. Does the goop that we see in electron micrographs merely exist to implement 'the spiking computer'? This is the neuroscience analogue of the functionalist debate in AI, and I will return to it in § 7(c), after addressing the issue of cycles in neuroscience.

7. QUESTIONS CURRENTLY LATENT IN NEUROSCIENCE

(a) Cycles in neuroscience

The same problem with cycles presents itself in neuroscience as in AI, but whereas the primary cycle of concern in AI was the perception–action cycle, in neuroscience the cycles are everywhere. It is interesting that the clearest stories in neuroscience are those which at first glance most closely resemble feed-forward systems. One example is the synapse. The spike arrives at the presynaptic bouton, causing vesicles of neurotransmitter to be released, which in turn cause ion channels at the postsynaptic site to open and change the postsynaptic electrical potential. Another example is the early visual system, starting with the retina and moving through the thalamus into early visual cortex. Treating this system as a feed-forward channel, despite massive corticothalamic and corticocortical feedback, has allowed information-theoretic learning models the modest success of producing qualitatively correct predictions for the form of the static (Bell & Sejnowski 1997) and dynamic (Van Hateren & Van der Schaaf 1998) cortical receptive fields that were first observed by Hubel & Wiesel (1968).

However, feed-forward processing in the nervous system is the exception rather than the rule, and often what looks feed-forward contains complicated feedback systems at a different level of analysis. For example, the spikes of a cortical neuron have now been seen to extend far into the dendritic tree, affecting, through voltage-dependent channels, the integration of signals from synapses. This destroys the illusion that the neuron works like a directional 'neural network' neuron, performing a weighted sum of its input signals. Even in the synapse and the retina there are feedbacks. Although the (human) retina receives no neural inputs from the brain, the brain controls gaze direction, which determines what the retina sees.
Although neurotransmitter does not travel backwards across synapses in most neurons, many other molecular signals do, as the extensive and controversial attempts to find synaptic Hebbian learning mechanisms in long-term potentiation have revealed.

In the abstract, the lack of a theory of cycles in biology can be seen by considering an experiment in which some variable X is changed and some other variable Y is monitored. What is published are the relatively rare cases where some correlation between X and Y is observed. The temptation then is to say that 'X controls Y' and from this to build a model of feed-forward neural information processing (or, if X is a chemical, we may market it as a drug to control Y).

In nature, things happen differently than in the experiment. X may rise, causing Y to rise, but then increased Y usually causes X to diminish, directly or through some other variables Z. These cycles of positive and negative feedback are universal in biology and cause equilibrium values of X and Y, or stereotypical dynamic behaviour, to occur. A neural spike is one example of a transient dynamic caused by positive and negative feedback, where X is the sodium current and Y the potassium current.

Slipping into the language of probability theory: if we want to discover the natural relationship between X and Y, we may measure their joint probability distribution p(X, Y). We could do so by observing X and Y under normal operating conditions, seeing a peak in the distribution at equilibrium and some trajectories corresponding to the stereotypical dynamics of the variables. But in trying to estimate whether X controls Y, experiments often take the form of measuring the conditional distribution p(Y|X) and constructing the joint distribution through the formula p(X, Y) = p(Y|X)p(X).
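The gap between these two routes to p(X, Y) can be seen in a few lines of simulation. The sketch below assumes a hypothetical linear loop (the coefficients are illustrative and model no particular biological system) in which X drives Y and Y in turn suppresses X. Left to run freely, the loop samples the natural joint distribution, in which, for these particular coefficients, X and Y are uncorrelated at equal times, so a regression of Y on X has a slope near zero. Clamping X, as an experimenter would, reveals a strong dependence of Y on X, with a slope near 0.8 / (1 - 0.5) = 1.6. A joint distribution assembled from the clamped p(Y|X) and an experimenter-chosen p(X) would look nothing like the one the intact loop produces.

```python
import random

# Hypothetical two-variable loop (coefficients chosen only for illustration):
#   X(t+1) = 0.5*X(t) - 0.8*Y(t) + noise   (Y suppresses X: negative feedback)
#   Y(t+1) = 0.8*X(t) + 0.5*Y(t) + noise   (X drives Y)
A_XX, A_XY = 0.5, -0.8
A_YX, A_YY = 0.8, 0.5


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def observe(steps=200_000, burn_in=1_000, seed=0):
    """Let the loop run freely and record (X, Y): samples of the natural p(X, Y)."""
    rng = random.Random(seed)
    x = y = 0.0
    xs, ys = [], []
    for t in range(burn_in + steps):
        # Simultaneous update: both right-hand sides use the old x and y.
        x, y = (A_XX * x + A_XY * y + rng.gauss(0.0, 1.0),
                A_YX * x + A_YY * y + rng.gauss(0.0, 1.0))
        if t >= burn_in:
            xs.append(x)
            ys.append(y)
    return xs, ys


def clamp(x_value, steps=5_000, burn_in=100, seed=0):
    """Hold X fixed (the experimenter cuts the loop at X); return the mean of Y."""
    rng = random.Random(seed)
    y, total = 0.0, 0.0
    for t in range(burn_in + steps):
        y = A_YX * x_value + A_YY * y + rng.gauss(0.0, 1.0)
        if t >= burn_in:
            total += y
    return total / steps


xs, ys = observe()
observed_slope = ols_slope(xs, ys)

clamped_x = [-2.0, -1.0, 0.0, 1.0, 2.0]
clamped_y = [clamp(v, seed=i) for i, v in enumerate(clamped_x)]
clamped_slope = ols_slope(clamped_x, clamped_y)

print(f"free-running loop: slope of Y on X = {observed_slope:+.2f}")  # near 0
print(f"X clamped:         slope of Y on X = {clamped_slope:+.2f}")   # near +1.6
```

The zero equal-time correlation is a property of these chosen coefficients (the update matrix is a scaled rotation, so the stationary covariance is isotropic); other loops would show some correlation, but in general not the one the clamped experiment suggests.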
Constructing the joint distribution this way gives the wrong answer for p(X, Y) because (i) rather than the system controlling p(X), we are controlling it, thus cutting the system at X, and (ii) through our choice of independent and dependent variables, we have imposed on the system a direction of dependency (X → Y), with an implied direction of causality that does not exist in nature.

There is no doubt that such experiments can still be useful in teasing out dynamic cyclic behaviour. The kinetics of ion channels can be identified with the aid of voltage- and current-clamping techniques, but there is a recognition in such experiments that the clamped cell is a frozen picture of the true process. This recognition often seems to go missing as the feedback loops get wider ('out of sight, out of mind'), and particularly as biology becomes technology. Examples that spring to mind are the widespread prescription of drugs that combat depression by controlling serotonin levels, attempts to control ecosystems by introducing new species, and, for that matter, the attempt to tailor many aspects of a plant's genetic make-up to fit an industrial model of agriculture. Anyone seriously studying or modelling metabolism or ecosystems knows the extent to which they are dealing with cycles, but somehow, when the results reach into the area of medicine or its macroscopic equivalent, 'planet management', it is the causal, feed-forward style of thinking that is presented, particularly to the news media and commercial interests. Anything which does not fit the feed-forward model is linguistically demoted to the status of a 'side-effect', to be eliminated if possible. But side-effects are nature's way of telling the scientist that all processes are cyclic.

(b) Interlude: biology's master control node

I cannot resist, at this point, discussing the role of biology's master control node, the genome.
Although it is somewhat off the subject of AI and neuroscience, arguments pointing back to the genome as the causal factor behind animal behaviour and intelligence are so universal in our culture that to allow the genome special status outside feedback cycles would be to endorse a control-node mysticism rivalled in shape and form only by that of the monotheistic Anglican bishops who debated so famously with T. H. Huxley. (When science became a greater authority on human origins than the church, the transition hid the fact that it was a change of government without a change in policy. Furthermore, affording the genome special status allows the present-day church of evolutionary psychology to rampage unchecked and, in my opinion, the wrong lessons are then drawn from biology.)

The genome's grand cycle with other genomes, mediated through populations of phenotypes, is the king of all biological feedback loops. It is a trans-individual molecular regulation loop, qualitatively similar to those occurring within cells, with cooperation (or symbiosis; Margulis & Sagan 1995) corresponding to the positive feedback loop and competition for resources corresponding to the negative feedback loop. Neo-Darwinists, stuck on the negative pole, like to interpret cooperative behaviour as 'selfish' altruism (I'll scratch your back if you scratch mine). The inverse position, on the positive pole, is to interpret competition for resources as selfless greediness (I'll eat you, but honestly, this is not about me). You might consider both positions absurd, or you might use the latter point of view as an antidote to the dominance of the former in our culture.
The point here is that competition and cooperation have equal status, and the process of 'natural selection', in which we are judged by an external environment (more biblical parallels), is better viewed as a complex molecular regulation loop like any other. The regulation loop is mediated through phenotypic success, which brings up another loop-denying habit of neo-Darwinists: seeing the genome as a controller for all aspects of the phenotype, right down to its specific behaviour, with DNA as the determining code for an organism. There must be a particular attraction in this idea for certain authors, because they take great pleasure in outraging people's common sense by portraying organisms as the helpless puppets of their genes (Dawkins 1990). I will not duplicate the effort of the many authors who have attacked the social or behavioural versions of this notion (for example, the preposterous notion that there could be a gene for homelessness, which was actually considered in an editorial in Science), because this would be to attack it at its weakest point. I'd like to attack the notion in its strongest version: the molecular.

The central dogma of molecular biology is that 'genes make proteins, and not the other way round'. The central dogma of molecular biology is wrong! Sequences of DNA code for strings of amino acids (true), but how these amino acids are assembled into functioning proteins, and which parts of the DNA are read in the first place, are both controlled by proteins, and depend on the state of the cell and its type. It's as if there were a bookish town (a cell) with a central library (the genome) and people (proteins) who came in to read short sections here and there, share with each other what they had read, and use the knowledge to build and change the town. Who is controlling here, the townsfolk or the library? (Answer: neither.) Where did the people in the town come from?
If 'genes make proteins', then the library made them, but the truth is that they were there all along. The functioning networks of enzymes that set to work on your DNA when you were conceived were already in place in the salty water of your mother's egg cell. They were just the latest instalment in a continuous epigenetic lineage that stretches back to your primordial metabolic ancestor, a droplet of seawater that accidentally got stuck inside a lipid membrane with a fortuitous set of amino acids.

Nowhere in biology is it easier to make unsubstantiated assertions than in the area known as 'origin of life'. But if the 'genes make proteins' debate really comes down to whether there was RNA (code) before proteins (metabolism), or proteins before RNA, in the first protocells (De Duve 1991), then two factors should be considered: (i) amino-acid chains form much more readily than nucleic-acid chains, and (ii) it is more likely that the first people wrote the first books than that the first books wrote the first people. (It is noteworthy that both neo-Darwinists and New Testament theologians believe that 'in the beginning was the word (logos)'.) Of course, it is now claimed that there were ribozymes (RNA with the ability to catalyse reactions), but was this metabolism evolving a code, or a code evolving metabolism?

The outcome of this debate is not crucial. The intent here is merely to weaken the notion of DNA as a kind of controller of the phenotype. An equally valid (and equally invalid) perspective has the phenotype choosing what is read from the gene and what is done with it. In reality, the organism and its genes are caught in a cyclic dynamic, and if the organism decides to spend its afternoon in a (real) library instead of attempting to father children, then you can be sure that the pattern of gene expression will alter accordingly. This argument fits with our first general theme of critiquing feed-forward thinking in AI and neuroscience.
(c) Levels in neuroscience

Returning now to the second theme we touched on when discussing AI: § 5 ended with a consideration of levels of a system and functionalism. There was a challenge to the functionalist to investigate the brain empirically and identify a level at which the brain could be finitely 'written down', a level analogous to logic gates in computers. The obvious candidate is the neuron level. If we wrote down the sequence of all spikes of all neurons, would that be enough to specify the 'neural computation'? Do molecular and biophysical processes exist to implement a 'spiking computer' at the neuron level?

I believe the answer to these questions is no. While no specific physical processes below the gate level of a computer interfere with the model-like operation of the computer (unless something goes wrong), this cannot be said of the neuron level of the brain. Molecular and biophysical processes control the sensitivity of neurons to incoming spikes (both synaptic efficiency and postsynaptic responsivity), the excitability of the neuron to produce spikes, the patterns of spikes it can produce and the likelihood of new synapses forming (dynamic rewiring), to list only four of the most obvious interferences from the subneural level. Furthermore, transneural volume effects such as local electric fields and the transmembrane diffusion of nitric oxide have been seen to influence, respectively, coherent neural firing and the delivery of energy (blood flow) to cells, the latter of which directly correlates with neural activity. The list could go on. I believe that anyone who seriously studies neuromodulators, ion channels or synaptic mechanisms, and is honest, would have to reject the neuron level as a separate computing level, even while finding it to be a useful descriptive level.
Perhaps a physicist or a neural-network theorist, looking for an easy theory, would still argue that the molecular level is mere implementational detail, but in most cases this is more a result of prejudice, supported by laziness and ignorance. If the molecular level is unimportant for an organism's behaviour, then how is a bacterium, vastly simpler than a neuron, able to navigate, eat and avoid toxins, all without the benefit of a nervous system?

If the neuron level is no good, are there any other candidate levels? Several have been proposed. The theory of neuronal groups, or cell assemblies, was another early candidate. The apparent 'noisiness' of individual spike trains could be smoothed out by integrating over groups of neurons coding, say, a given visual stimulus. The meaningful unit of perception was seen to be the activity of the group. In my view this idea contains a common error: failure to appreciate that noisiness is in the eye of the beholder, in this case the experimenter. When a stimulus is presented and the part of the neural response which does not correlate with the stimulus is regarded as noise, we have a situation almost as bad as thinking French people are stupid because they produce strange noises in response to questioning.

What about the molecular level? Say we write down how many of each type of molecule are in each cell. Can this capture the computation of the cell? Unfortunately not, because the location of the molecules is important. Testing of enzyme reactions in bulk phase (solutions in test-tubes) is partly responsible for an impression that in the cell, molecules largely jitter around with Brownian motion and sometimes bump into each other and react.
What turns out to be more likely is that most reactions take place locally in membrane-associated protein complexes, and the product of one reaction is passed directly on as substrate for the next. Evidence for this detailed spatial organization, called metabolic channelling, is accumulating (Ovadi 1995). Rather than being unreliable and 'wet', much of cellular biochemistry may already operate in what has been called the machine phase (although of course, in this paper I am arguing that 'machine' is the wrong word), where intricately detailed and coordinated reactions occur, rather than in the bulk phase. It seems that nanotechnology already exists, except that it is not technology in the normal sense, in which a finite model is implemented using some particular substrate level. It is difficult to imagine human engineers making more efficient or complex processes by top-down manipulation of individual atoms.

We have reached the level of individual molecules, and the functionalist might say, no doubt through gritted teeth, that he is happy to write down the position of all the molecules in a brain. This will still be a finite description. If there is no evidence of submolecular interferences, we could have a 'molecular machine' to satisfy the functionalist. Remember that at this molecular level, we are looking for something as clean as a logic gate, which is a device responding deterministically to its logical inputs and insensitive to the motions of individual electrons.

At this level, things become more controversial. Molecular computing is actually an area of advanced engineering research, so although it is not clear that it always falls within the discrete-state Turing model of computation, it might seem harder to dismiss the notion that molecules compute in nature. If we use molecules to construct Turing-style computing devices, then, like good functionalists, we will have molecular computers. But what molecules do in nature may be different.
In fact, it is. There are submolecular interferences that violate the separateness of the 'molecular machine' level, and they are quantum effects. Two examples are electron transfer in photosynthesis and the energetics of enzyme interactions (Welch 1986). In both cases, quantum coherences are necessary to explain the efficiency of the reactions. But we don't even need to go as far down as quantum effects, because proteins do not end at the edges of the black and red balls from which ball-and-stick molecular models are constructed. Their electrical fields extend into the surrounding water molecules, orientating them to form what is called structured water. Structured water is also important in determining how enzyme reactions occur, and how ion channels are selective for certain ions.

To argue that one piece of structured water or one quantum coherence is a necessary detail in the functional description of the brain would clearly be ludicrous. But if, in every cell, molecules derive systematic functionality from these submolecular processes, and if these processes are used all the time, all over the brain, to reflect, record and propagate spatio-temporal correlations of molecular fluctuations, and to enhance or diminish the probabilities and specificities of reactions, then we have a situation qualitatively different from the logic gate. The variables lying beneath the level of a molecular 'gate' can affect the behaviour of the gate, so the functionalist is again frustrated, and the notion of the brain as a molecular 'computer' can be viewed as no more than an analogy, and an inaccurate one.

To say these things is not to be a 'New Age quantum mystic'. It is to attempt to state clearly some empirical observations about molecular biology and to use them to attack the prevalent tendency to view biological organisms as machines in the exact technical sense in which computers are machines, i.e.
in the sense that they are physical instantiations of finite models which do not permit physical interactions beneath the level of their machine parts (e.g. the logic gate) to influence their functionality.

It is a big leap from this argument to quantum consciousness. There is no evidence that large-scale macroscopic quantum coherences, such as those in superfluids and superconductors, occur in the brain. That some people like to make the quantum consciousness leap is testament more to the compelling connections between the mathematics of quantum mechanics and a holistic, non-mechanistic world-view in which mind is immanent (Bohm 1980) than to any specific biological evidence. But as the first scientific workshops on 'quantum biology' meet, there is a good chance that a fascinating area of theoretical and experimental research will come about, and that more evidence will accumulate to suggest that functionalism cannot be used as a theory of the processes occurring in organisms.

8. RESTATEMENT OF THE ARGUMENT

In discussing AI and neuroscience, I have focused on two themes. The first is the universality of cycles, in other words of sets of variables that affect each other in such a way that any feed-forward account of causality and control is misleading.

The second theme is based around the observation that a computer is an intrinsically dualistic entity, with its physical set-up designed not to interfere with its logical set-up, which executes the computation. In empirical investigation, we find that the brain is not a dualistic entity. Computer and program may be two, but mind and brain are one. The brain is thus not a machine, meaning that it is not a finite model (or computer) instantiated physically in such a way that the physical instantiation does not interfere with the execution of the model (or program).

9. THE BIO-INFORMATIONAL AGE REVISITED

What do these arguments say about the future, about science and society and their relationships? Will the cyber-dream take place, or should we quit AI and neuroscience and join a hippie commune? The technical conclusions seem to me to be as follows. There will be no nanotechnological robots running around inside our bodies, at least none that are any more wizardly than the non-machine-like molecular complexes that already exist. There will be no 'control node' drugs that can pin us on the right end of the sadness–happiness spectrum, and thankfully we can drop this one-dimensional view of the human emotions. There will be no people living without brains, as digital patterns in the Internet. There will be no spiritual machines, models so advanced that they can deduce things that we find mysterious. There will be no machines with minds.

Cyborgs seem more plausible. The extension of human capacity through technology is already familiar to us, and it is a small step from driving a car to operating remote or tissue-embedded robot limbs. The process of building new models and surrounding ourselves with them will not be abolished in a return to some idealized pretechnological state that never existed. Models will merely be put in their place.

So if most of these things are not going to happen, where does society's focus on robots, virtual reality and the 'wired world' dream come from? I believe it is a psychological reaction to the increasing proliferation of models around us. When social interactions become codified instead of open-ended, when people find themselves in roles as producers and consumers in a vast social machine, then the fantasy of the cyborg has already come true. When I enter an air-conditioned building in which the windows are all sealed and the lighting is all fluorescent, I am walking into a model, a virtual reality.
But the more our behaviour becomes machine-like, generated by and interpreted through the models that we and others construct, the more we will feel disconnected from the level below (and above) the models. We will be less able to see that we are not machines, that there is no separating level at the logic gate that holds us above our physical substrate, and no control nodes in our brain that enable us to look down on reality. We are in the middle of it. I think this is a lesson that science is teaching us. If this lesson were truly to percolate into our culture from our science, and not be perceived by science as 'the threat of irrationality', then we would suddenly find ourselves living in a different world.

This is why I am ultimately optimistic about the prospects for AI and neuroscience, despite my negative predictions about the success of their ultimate goals. I. Newton's mechanistic world-view took a blow with the arrival of quantum physics, but almost a century later, we still have physicists. Physics, it turns out, does not need to be tied to mechanism (in the strict sense we have used in this paper, quantum mechanics is non-mechanical), and neither does biology. Computer science, mathematics, probability theory: these are more tied up with the building of finite models, but they too have an intriguing role to play, for along the border of the set of all models lurk paradox and inconsistency, the 'universal solvents' (to use D. C. Dennett's phrase in a situation where it applies) that dissolve models. This is very interesting territory, first explored by K. Gödel, who showed, remarkably, that within any consistent formal model rich enough to express arithmetic there are true statements which the model itself cannot prove. But interesting half-dissolved models can be built along the frontier, models that give paradox the respect it deserves. Quantum physics is one such model. After all, paradox is not just something to be obliterated at first sight, or ignored.
Rather, it is an information structure which tells us exactly the shape and form of the failure of a model. (Ex falso quodlibet is what logicians say to express their obser- vation that in Boolean logic, from `true and not-true', anything is provable. But if this was the end of the story, then how could a Zen koan be useful, how could it be about anything? In fact there are a whole array of non- Boolean logics and paraconsistent logics. Some are even used in AI, re£ecting the fact that when people are asked `Do you like Bill Clinton?' many of them want to say `I don't know' (underdetermined) and `I love him and hate him at the same time' (overdetermined).) Paradoxinforms us about the failure of a model in a qualitatively di¡erent way than Bayesian theory tells us that the observed and the estimated distribution of some variable are di¡erent. This suggests to me that there is something below probability theory, which, because the Cox^Jaynes formalism of Bayesian probability theory is founded on Boolean logic, may well be reachable by generalizing logical structures to incorporate answers other than yes and no. These speculations, together with the empirical argu- ments I have made in the rest of this paper, suggest that there is a very exciting role for AI and neuroscience to play in the next century. As G.-C. Rota, a mathematician and an advocate of Husserl, Heidegger and Wittgenstein, wrote, Even in our days of constantly predicted revolutions, it is di¤cult not to be led to an optimistic conclusion. The new sciences of the computer and the brain will validate the philosophers' theories. But what is more important, they will achieve a goal that philosophy has been unable to attain. They will deal the death-stroke to the age-old prejudices that have beset the concept of mind. (Rota 1990, p.107) AI and neuroscience are exactly placed where the deaths of dualism and feed-forward thinking are sched- uled to take place. 
If these disciplines choose to participate in this shift, rather than cling to concepts that are not empirically supported, then there will be many interesting PhD theses to write.
Levels and loops: the future of artificial intelligence and neuroscience. A. J. Bell. Phil. Trans. R. Soc. Lond. B (1999), p. 2019
Finally, so far I have left out one question: will there be a transhuman age? For this there is a strong biological precedent in the two major steps in biological evolution: first, the incorporation of prokaryotic symbionts into the ancestral eukaryotic cell, and second, the emergence of multicellular life-forms from colonies of eukaryotes. Hegel had a word, sublation, for the harmonious incorporation of components into a whole without destruction of their individual nature, and we are all familiar with the good feeling that comes from playing in a team. However, those who followed up on G. W. F. Hegel's visions helped construct the nightmarish, machine-like political state of mid-century fascism, so we are right to feel nervous about any superorganism with a hierarchical (i.e. feed-forward, controllable) structure. Thankfully, unlike twentieth-century broadcast media, the Internet provides a good, non-hierarchical model for future information flow and social creativity. It is not risking too much to predict that it will continue to be a profound stimulus for social change. Will this lead, ultimately, to some form of transhuman phase transition in the coming centuries? I believe that something like this may happen, and that science (and technology in some form, as with the Internet) will play a part in it. But I believe that at least part of this development will be a return to the past, a re-enchantment: a return to a vision of life that does not view humans or their minds as outside nature.
Both our nostalgia for the past and our millennial fascination with a global cyber-reawakening are symptoms of the fact that we in the western world currently live in the most individualistic culture in human history. Our imagined transhuman science-fiction future may be, at base, a projection that contains a diagnosis of the present, as Jung might have observed. Just like our private dreams, our public dreams are not to be taken literally. They are symbolic and indicative of imbalances in the present. The reassuring news is that in correcting these imbalances, we will create a future which is not as alien as the science-fiction future seems. In fact, it might look as familiar to us as something which we had forgotten.
REFERENCES
Arkin, R. C. 1998 Behavior-based robotics (intelligent robots and autonomous agents). Cambridge, MA: MIT Press.
Bell, A. J. & Sejnowski, T. J. 1997 The independent components of natural scenes are edge filters. Vision Res. 37, 3327–3338.
Bohm, D. 1980 Wholeness and the implicate order. London: Routledge and Kegan Paul.
Bruce, V. & Green, P. 1990 Visual perception: physiology, psychology, and ecology, 2nd edn. Hillsdale, NJ: Lawrence Erlbaum Associates.
Dawkins, R. 1990 The selfish gene. Oxford: Oxford University Press.
De Duve, C. 1991 Blueprint for a cell. London: Portland Press.
Gibson, J. J. 1979 The ecological approach to visual perception. Boston, MA: Houghton Mifflin.
Gibson, W. 1986 Neuromancer. Phantasia Press.
Haykin, S. S. 1999 Neural networks: a comprehensive foundation, 2nd edn. New Jersey: Prentice-Hall.
Hinton, G. E. & Sejnowski, T. J. 1999 Unsupervised learning: foundations of neural computation. Cambridge, MA: MIT Press.
Hodgkin, A. L. & Huxley, A. F. 1952 A quantitative description of membrane current and its application to conduction and excitation in nerve. J. Physiol. 117, 500–544.
Hubel, D. H. & Wiesel, T. N. 1968 Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–244.
Kurzweil, R. 1999 The age of spiritual machines: when computers exceed human intelligence. New York: Viking Press.
Langton, C. G. 1997 Artificial life: an overview. Cambridge, MA: Bradford Books, MIT Press.
Margulis, L. & Sagan, D. 1995 What is life? London: Weidenfeld and Nicolson.
Marr, D. 1982 Vision. New York: Freeman.
Moravec, H. 1990 Mind children: the future of robot and human intelligence. Cambridge, MA: Harvard University Press.
Ovadi, J. 1995 Cell architecture and metabolic channeling. Austin, TX: Landes; New York: Springer.
Penrose, R. 1989 The emperor's new mind. Oxford: Oxford University Press.
Rieke, F., Warland, D., de Ruyter van Steveninck, R. & Bialek, W. 1997 Spikes: exploring the neural code. Cambridge, MA: MIT Press.
Rota, G.-C. (ed.) 1997 Philosophy and computer science. In Indiscrete thoughts, pp. 104–107. Boston, MA: Birkhäuser.
Rumelhart, D. E. & McClelland, J. L. 1986 Parallel distributed processing: explorations in the microstructure of cognition. Cambridge, MA: MIT Press.
Tipler, F. J. 1995 The physics of immortality: modern cosmology, God and the resurrection of the dead. New York: Doubleday.
Van Hateren, J. H. & Van der Schaaf, A. 1998 Independent component filters of natural images compared with simple cells in primary visual cortex. Proc. R. Soc. Lond. B 265, 359–366.
Welch, G. R. (ed.) 1986 The fluctuating enzyme. Nonequilibrium problems in the physical sciences and biology, vol. 5. New York: Wiley.