This article was originally circulated around 1980.
Researchers in artificial intelligence have been working for years, and feverishly in recent years, on programming and theorizing about the mechanisms necessary for intelligent behavior in restricted domains, especially domains that are typically the realm of specialists rather than laymen. One hope is that the structure of the mechanisms produced will generalize into something that is uniformly acceptable for all ranges of tasks.
One of the goals of artificial intelligence (AI) is the creation - theoretical or actual - of a computer individual, a large program or machine that has some of the qualities of a human mind and encompasses nearly the full range of abilities of a normal, average person. The first obvious statement about this goal is that it represents an immense body of machinery.
Rather than looking for the uniform structure within the areas of interest, perhaps the proper place to look for this uniformity is within the organization of the entire system, which means that the individual solutions to problems of intelligence are only constrained to work within their domain, with responsibility for relating various solutions left up to the overlying organization.
Furthermore, a system which is very large is difficult to program, and it is especially difficult for some group of people to maintain the organization in a reasonable manner over the lifetime of that system.
In this paper the rationale for organization as a topic of concern to researchers in AI is presented; much of the rationale will be in the form of informal arguments rather than precise proofs. In addition, a methodology for self-organization and a new view on programming will be introduced. Details of this organization and programming style will not be presented, but the interested reader is advised to read my dissertation, if you can find it, for an implementation of this organization and an application in the domain of natural language generation.
The driving force behind the ideas presented herein is the fluid domain, which will be introduced shortly.
One of the prevailing notions in all scientific endeavor is that simplicity is to be favored over complexity where there is a choice. When there is no choice and there is a complex alternative, of course the complex alternative must be chosen. In AI there seems to be an attitude of searching for the simple, uniform solution at the expense of not considering any complex alternatives. However, one must be wary of searching for simplicity in the wrong places. If there is an extremely large program one can expect that it may be the product of a diverse, uncoordinated effort. Therefore the individual parts may not show any uniformity from one to the other; on the other hand, the conglomeration of the pieces may demonstrate a simplicity through a uniform organizing principle which is prima facie ad hoc.
AI is the wrong place to seek simplicity throughout.
Years ago, Herbert Simon gave us the parable of the ant on the beach, in which the complex behavior of the ant as it traverses the sand is viewed simply as the complexity of the environment reflected in the actions of the ant. He says:
An ant, viewed as a behaving system, is quite simple. The apparent complexity of behavior over time is largely a reflection of the complexity of the environment in which it finds itself.
He bravely goes on to substitute man for ant in the above quote and attempts to make sense of that statement. The task of creating an intelligent machine has evolved historically from the initial sense that simple programs could demonstrate interesting behavior. This sense comes, I think, from the speed of the machine in executing the steps in our program; even though the machine does not really deliberate over the process, it proceeds with such haste that a large number of paths can be explored easily.
I remember, years ago, watching my first computer busily simulating a circuit, and I was fascinated as I changed parameters and observed the behavior of the system change in beautifully complex ways. My thought was: how easily the machine could be programmed to explore many different ways of dealing with my problems, and with great speed show me all the various alternatives and their outcomes.
So I, too, fell prey to the hypnotic effect of the computer as it hurriedly went about its business. And I, too, fell prey to the belief that simple principles could lead to complex and appropriate behavior.
Years later, disciplines of computer science arose which studied algorithms, soon considered the ultimate form of programming: one develops or dreams up an algorithm to solve some problem. Algorithms are short, clever things that magically produce the answer after a number of complex calculations. Even now, in 1981, people talk of the art or science of programming, programming methodology, and the psychology of programming, while all the time referring to dreaming up algorithms for problem solutions.
It is almost as if the notion of a batch (versus a timeshared) system were viewed as the paradigmatic framework for computing, and in this framework the programmer hesitantly submits his program to the monolithic device and anxiously awaits the detailed results at the nether end.
But algorithms are uninteresting objects from my point of view. They reflect the attitude, if not the fact, that one starts with a good idea of how one wants something to be done, or at the least, certainly, what one wants to be done. Writers in the field talk about 'predicate transformations' and input/output specifications. The main thrust of programming is, in this view, to get a program to take some objects and produce some others, usually with the program starting with one and ending with another. So we talk about a program to sort numbers, or to invert a matrix.
But in this paper we treat a program as a thing, loosely speaking, that demonstrates some behavior we are interested in. Normally these programs never halt (or shouldn't) and to write them is to be in a symbiotic relationship with them - mutual learning. They exist in a world and interact with it continually, and to not do so is to be logically dead; and it is through this interaction only that we are capable of observing the mindlike qualities of the machine in question.
In 'The Lives of a Cell', Lewis Thomas [Thomas 1974] says, when discussing the variety of life that goes on in each cell:
My cells are no longer the pure line entities I was raised with; they are ecosystems more complex than Jamaica Bay.
He later goes on to compare the cell as an entity to the earth, not the comparison we would have thought even a short while ago.
Each cell, then, is incredibly complex, and our brains are composed of very large numbers of them, connected in complex ways; and as we attempt to learn of the interconnections between them, we only find that we are having more difficulty than anticipated.
If we posit (and almost by definition of our research we must) that the brain is equivalent to the mind, then we are facing an object with overwhelming complexity and parallelism. It does not seem likely that a small (100 million word) program with a single processor is going to be able to demonstrate a substantial proportion of the behavior that a typical mind does.
NOTE: In deference to the mind-body argument I have chosen the word 'equivalent' in this context to mean that there is nothing other than the brain as the object which is responsible for all of the phenomena we care to think about as belonging to the mind. Some of these phenomena may be epi-phenomena of the activities of the brain, but all behavior is traceable to the brain under this wording.
Living entities co-evolve with each other and with their environment, and to think of creating an entity without a world that is matched to it is unrealistic. And if we fail to make truly intelligent machines, it will be because we have taken the world as we think it is and have tried to make a mind that fits it. In cases where the programs we have written exist in a world made with them (operating systems, text editors, interactive programming languages) the programs exist marvelously, yet when we work with the natural world, the results are pitiful.
The artificial part of artificial intelligence is the world that already exists, not the programs we try to write. A natural world for any behaving system is one that has co-evolved with it; if the world is provided in any other way, it is an unnatural world for the program (or individual) that will need to interact with that world. In order to match the computer to the world, we will have to produce - write or build - an intermediate layer or level of processing between the world and the computer, and this is exactly what vision, robotics, and speech/hearing research is about.
So I come to the point of saying that, even though I am constrained to leave the world as it is, the program I write must be complex to demonstrate the complexity of behavior required of it; the complexity of the brain is the result of natural co-evolution with a complex world. Think of how sad we would be if the key part of our minds were reduced to an algorithm that Dijkstra could prove correct, that runs in n log n time, and that shows correct lexical nesting!
No program that is interesting from the point of view of behaving like a mind can be other than extraordinarily complex.
In my view, AI has been suffering from approaching the wrong problems from the wrong direction for a number of years, perhaps since its inception. Progress is often good in some limited areas, but if the problem is to build real mindful objects (as opposed to intelligent objects), I think a different approach is needed.
Some people believe that machines should be able to plan in much greater detail than humans, and so machines are not 'intelligent' unless they do plan better. This notion seems to come from the 'puzzle' tradition of artificial intelligence. In this tradition the paradigmatic behavior of an intelligent system is that it is a great puzzle solver: many things that robots will have to face are like puzzles, so a good thing for them to be able to do is solve puzzles. This has led, naturally, to the question: how can knowledge be best represented to facilitate puzzle solving? The 'solution' is to represent facts as objects within the program. 'Moving around' these objects within the representation then corresponds in a strong way to 'moving around' objects in the real world.
A typical example from the problem-solving literature is the missionaries and cannibals problem. This puzzle, discussed in [Ernst 1969] and [Jackson 1974], has recently re-surfaced as a key example [McCarthy 1980]. Representing the objects in this puzzle is usually done either with state descriptors or with first order logic. The crucial fact of the matter is that this puzzle has a definite solution. Moreover, the solution can be recognized. Even further, it is usually possible to determine whether progress is being made towards the solution.
Thus, when life is viewed as a series of puzzles, it is natural to program representation systems to try to get things into shape for the puzzle solving race which all intelligence is believed to be.
Puzzle situations are much like traditional board games or brain-teasers. It is not surprising that early researchers worked on board games (chess, checkers) as a paradigm of intelligence. Others worked on brain-teasers. These situations are amenable to formal methods since a concise statement of the relevant facts is easy to make in a formal language.
The problem is that these types of problems are 'good' to work on because they can be simply stated with only the essential facts represented and an answer can be recognized, both by machine and by human observers.
In real situations, on the other hand, it is often difficult or impossible to formulate a problem in terms of a single goal or a small set of simple goals. Moreover, even in those cases where such a formulation is possible, there may not be measures of progress that ensure a solution to the problem as stated. Systems based on the measurement of progress towards a single goal, then, are sometimes at a loss in real situations.
This is the most important section in the presentation since if anyone has a quarrel with what I am doing, that quarrel will almost always resolve itself into one about the basic point of the research. Many papers and theses on AI talk about the technical details at length and leave the reader to speculate about where on the philosophical spectrum the researcher lies.
The history of AI is often a sequence of bold attempts with moderate successes followed by a severe retrenching. In the early days there was a flurry of activity based on largely ad hoc techniques with the faith that pure raw power would be able to achieve interesting results. This was predicated, as noted earlier, on the attitude that complex behavior resulted from a simple system in a complex environment. Some examples are Evans' analogy program, Guzman's scene analyzer, and the entire mechanical translation project. Though these programs were complex and performed well in their domains, they generally embodied a simple theory of the situation attacked. In the analogy program there was the hope that simple descriptions and simple differences between such descriptions would lead to analogical reasoning; in Guzman's program, the hope was that a simple set of descriptors - labels on lines in line drawings - and a syntactic operation on those labels and lines would be able to interpret many scenes; and in the mechanical translation work, it was hoped that simple transliteration techniques and syntactic analysis would result in an expert translation system.
There were some successes at this time, but there was a barrier that soon loomed ahead of the researchers. The barrier was that the simplicity assumption was challenged at the point of knowledge or semantics. The apparent problem was that the programs did not have good knowledge about the domain that was being explored. The early programs relied on formal actions in a syntactic world. Ad hoc syntactic-based theories became unrealistic for larger problems.
This realization came during the late sixties or early seventies and signalled the beginning of the knowledge-based or what I call the epistemological era. What really was happening was that the term 'artificial intelligence' was taken seriously, and the line of research became: produce an intelligent program. 'Intelligence' was possibly intended by some to refer to the difference between men and animals, and the research of these people was therefore not particularly oriented towards higher intellect. But for most, Simon and McCarthy for example, producing an 'intelligent artifact' meant producing machines that would perform well in intellectual skills - formal reasoning, puzzle solving, games (like chess), and the like. The emphasis of this line of research was on the intelligence part of artificial intelligence.
Note: By epistemological era I want to emphasize that I mean that period when the knowledge that our programs had about the world was viewed as being the critical problem. We started talking about knowledge-based systems, and the word 'semantics' became crucial to the discussions of the day.
The research of this era focussed on inventing representations of knowledge and writing programs that would operate on these representations to solve problems in, generally, toy worlds.
A toy world is not necessarily a world with toys in it, such as the simple blocks world, but a world in which the scope of the domain is limited in strict ways. So, medical diagnosis as treated by most AI research is a toy world, although it is an important one.
Of course, what most people refer to as 'intelligence' is closely involved with the idea of education or experience, though this is strictly untrue by definition; but some people are lured into confusing 'intelligent' with 'intellectual', though not in a malicious way. Many of the tasks chosen as targets of research require a lot of experience, at least, with the world, and this experience is simply 'knowledge'.
Knowledge takes a long time to acquire for people, and parents are aware that it takes many weeks before any real type of awareness and knowledge of the world appears in their children. Infants spend many hours attempting to do what adults consider trivial, and what researchers spend years trying to get primitive manipulators and cameras to do. That the knowledge barrier was approached is not surprising given the difference in processing power between humans and computers, and given the difference in time that is put into each 'programming' task. A human brain, viewed as simply a processing machine, is several orders of magnitude more powerful than a PDP-10 [Moravec 1977], and 'evolution' coupled with months of day and night work by infants represents very much more 'programming' effort than the paltry man-years put in by programmers in research centers.
A great deal of progress towards producing programs that perform at even expert levels in some intellectual skills (for instance, medical diagnosis) has been made, yet there is still something wrong with the situation, and this problem is roughly centered around tolerance, creativity, judgment, adaptability, brittleness, common sense, etc. Programs that behave in the rarefied atmosphere of intellect ignore the underpinnings of creativity, exploratory behavior, and the ability to think about one's own abilities, which form the foundation of intelligence.
We are still in the midst of this line of research, which is epistemology-based or knowledge-based, and much time is spent struggling to gather knowledge about parts of the world that are interesting and relevant to a task to be performed. But there is a subtle type of restriction that can be observed in such systems.
Consider an advanced knowledge-based system that reasons in some (even technical) domain. The research of Feigenbaum et al [Feigenbaum 1980], [Feigenbaum 1971], [Buchanan 1969] is a good example, and many of the programs of that research group are highly impressive from a performance point of view. In many cases a great deal of time is spent gathering reasoning protocols and detailed knowledge of the domain. Typical results and reasoning chains are pieced together, and a good representation of those chains and that knowledge is devised. Then an inference engine is produced that, given some problem, produces an expert result of some sort.
The danger here is that the structure of the knowledge might very well be like the more sophisticated backpacking tents available today. These tents have an elaborate structure, but to compensate for the general lack of problem-solving ability of some backpackers, the mechanisms are cleverly put together so that even if the tent is thrown to the ground, it will nearly assemble itself. Thus, in the case of expert systems, the success is really a product of the clever nature of the representation of the knowledge in the system, which will produce a good solution from a kick by the user.
But is the system too sensitive to the representation? The answer lies in the lack of extensibility of some of these systems, where the limitation lies not in the difficulty of adding new knowledge of the same type as is already used in such a system, but in the difficulty of telling the system things in a different representation.
We can view the entire process as one of island hopping in the familiar analogy to planning: that is, many systems attempt to plot a course from one island to another by a sequence of intermediate islands which are accessible from each other. In the epistemological system, the choice of islands is radically limited by the type of knowledge that the hops from island to island represent.
One other way to think about the island analogy is by considering the types of interfaces that occur between islands. In any sort of modular system there are objects and there is an interface between these objects. This interface represents the communication that goes on between the two modules or islands. In many systems that use an island-hopping style of planning, the conditions that exist after an island is processed, along with the requirements of the next island, exactly match the abilities of the system. The planning problem, then, is difficult because this exactness of interface must be established at each step rather than left open.
NOTE: Similarly, in the case of modular programming, the information that is passed is forced through a narrow bottleneck - normal parameter passing protocols - which are derived from mathematical notations and register and/or stack requirements in the physical computer. This bottleneck has the effect of enforcing a strict matchup between modules, thus making the system difficult to program.
We find that the epistemological system has as a motive engine a program that can match these nearly exact objects. This is a problem because, though there is a wide variety of knowledge represented in these systems (so-called knowledge-based systems), there is little flexibility in the system to apply poorly matched knowledge to a situation.
In an ontological system we find that the interfaces have only a resemblance to one another, and the job of matching them is much harder and may involve some risks. The structural coupling of the system is reduced and powerful (and unguarantee-able) procedures must be employed.
NOTE: Very few of these systems exist: I claim Yh is one, and FOL [Weyhrauch 1978], [Weyhrauch 1974], [Filman 1976] is another. By ontology I mean the quality of existence or being; in particular, I mean being able to interact with the world, to apply imprecise methods to everyday problems, and otherwise to exist in a flexible manner.
An interesting analogy can be made that makes the traditional task of AI seem somewhat bizarre. It is often the case that programs are written with the intention of being 'expert systems', which means systems able to perform at expert human levels. Now, no one expects that these systems are expert in any domain other than that for which they were designed. Often very intricate deductions are made with carefully chosen methods; the power of these systems comes from the highly distilled nature of the problems they actually work on, and the structural coupling within the domain is sufficient to see them through many stormy occasions. But these systems lack any knowledge outside their areas of expertise, and, in fact, lack even simple common sense. They are often unable to communicate very well even about their areas of expertise. In short, they are best compared to idiot savants, who can do some things surprisingly well, but little else. Think of trying to bring up a child to only be able to diagnose orthopedic gait problems from the trunk down, and NOTHING ELSE.
One would think that there is some overall 'problem' that needs to be solved in order to admit machines to the ranks of the cognizant. This problem is the ontological problem, as opposed to the epistemological problem, which has been extensively attacked in the past. Challenging the ontological problem means facing squarely the problems of creativity, uncertainty, hedging, influencing, and, not the least, being.
My own feeling is that there is an underlying sort of behavior and organization for an 'intelligent' system such that the intellectual skills we hope to give to such a system are naturally imposed or constructed on top of this underpinning. That is, developmentally, the ability to interact with a flexible world in a flexible way precedes the ability to interact with an inflexible world, which is one way to characterize intellectual skills.
So we again begin to worry about the underlying processes that the system can experience. Early researchers worried about this too, but not as a response to the knowledge problem.
The philosophical next step taken in this overall research program is to wonder about the wisdom of accentuating the 'intelligence' in artificial intelligence. So, instead of thinking about how to program a computer to do intelligent things, we think about how to program the computer to behave in such a way that we are led to believe it has a mind.
This distinction is reflected in the terminology mindful behavior. Whereas before we talked about the knowledge of the system, we talk now about the mechanisms of 'being', and we find right away that a lack of terms for talking about this aspect of existence makes us sound like wide-eyed madmen. Suffice it to say that the transition from epistemology to ontology is being attempted.
The main thrust, then, of the research behind these opinions was to explore ways of building systems (very large systems) that are capable of exhibiting mindlike behavior and of accepting different types of experience in order to change.
A mindfully intelligent object, whether human, animal, or artifact, is one that can endure many situations and even thrive within some of them. Don't forget, even horses are more intelligent on the whole than any program yet devised. A horse can cleverly survive in a real world, yet even the most sophisticated program of any kind only survives for years at most in a very restricted situation before intervention is necessary. One class of such programs is operating systems. But even they interact in a small domain or culture of programmers and they can respond well to only a small number of relatively predictable events. Sooner or later some little detail goes wrong and the program's limited ability to make adjustments forces some expert programmer to intercede. Horses do much better.
Take the example of an intelligent natural language generation system. Such a system would, ideally, be able to take anything that could be represented within the system and produce an utterance that is related to it in a strong way. The utterance would reflect a good attempt by the system to express the represented situation. There are several ways to achieve this goal, but all of them depend on overthrowing any thoughts of producing the 'correct' utterance for the represented situation.
Any system has some set of abilities, usually represented in a non-uniform framework. In order to accomplish its designated tasks the system needs to discover the appropriate capabilities and use them. In a typical program (as opposed to a system) its abilities can be discovered with no effort at all: they are implicit in the function calls that are apparent in the program. For example, if a program needs to compute the factorial of some integer, the code, (FACT n), represents the knowledge of who the expert is in this case. On the other hand, though locating the information about abilities is trivial, when new abilities are added to such a program, many places where this information is explicitly represented need to be updated.
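To make the point concrete, here is a minimal Lisp sketch; the names are hypothetical and the code is illustrative, not drawn from Yh. The caller's knowledge of who the expert is lives entirely in the literal call, so a better expert can only be installed by editing every such call site.

    ;; The caller's "knowledge" of who the factorial expert is lives in
    ;; the literal call (FACT n); these definitions are hypothetical.
    (defun fact (n)
      "The resident factorial expert."
      (if (<= n 1)
          1
          (* n (fact (- n 1)))))

    (defun report-factorial (n)
      ;; The expert is named explicitly at the call site.  If a new and
      ;; better expert comes to town, this line - and every line like it
      ;; elsewhere in the program - must be found and edited.
      (format t "~D! = ~D~%" n (fact n)))

    (report-factorial 5)   ; prints "5! = 120"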
NOTE: When I refer to a system I generally mean a large computer program that is able to do more than one specific task. Thus, a program that sorts an array of numbers is not a system, but a text editor is, even though its specification looks so simple. A text editor has a number of operations it can do, and it interacts with a user in order to perform a loosely described task. A sorting program can sort the elements of some data structure when the data structure is properly presented to the program.
Part of the reason that programs do so poorly, at least now, is that the range of things that they can be taught is quite limited. By this I don't mean the quality so much as the quantity. We have expected to be able to take a small domain, like automatic programming, about which, it would seem, almost everything is known, and program up something which will act like a professional in that area, yet be totally ignorant of everything else.
To produce anything more than this we would have to solve and integrate solutions to a great number of problems, including many algorithmic problems in vision, hearing, and other 'low level' tasks where a clever technique with a narrow range of applicability is acceptable. Accomplishing this is necessary in order to couple our machines with the world better than has been done before; and this will mean building input and output devices that perform these activities, and possibly in an algorithmic way.
So for the moment, since we cannot hope to reproduce anything like human breadth of expertise, we must content ourselves with how to produce in a small measure that which is most mindlike in a restricted domain.
Consider a team of specialists. If a problem they are working on is not entirely within any of their specialities, then one would expect that they could solve it after a dialogue. This dialogue would interchange information about strategies, techniques, and knowledge as well as information about what each knows, why it might be important, why some non-obvious course of action might be appropriate. In addition, one would expect bickering, propaganda, lies and cheating to also occur as each strives to further his own viewpoints.
NOTE: One possible scenario for emerging intelligence is to provide a computer with a very large range of abilities and with descriptions of these behaviors so that the program can reason about itself. One strategy for allowing the system to reach a steady state in which its behavior is well attuned to the world is to let it 'evolve' in the Darwinian sense, meaning that individuals in this system must either be selected to survive because of useful traits or to die due to uselessness. While exploring this idea, it may become advantageous for some of the descriptions of individuals to be less than honest about what can be accomplished. However, this is only a speculation at the moment based on the hint that co-operation may not be as effective as conflict.
The first of these categories of information (the co-operative things) appears to be the essence of an intelligent 'system' of individuals. The 'intelligence' of the system looks as though it emerges not so much from the individual, though this is certainly important, but from the interactions it has. One would think that a lot of efficiency is lost in one of these interactions because of the 'natural language' that is used: if a more streamlined language could be used then the intelligence of the system would increase.
First let's think about a standard programming language. In a formal programming language one often sets up a conversation between a team of experts, namely the 'subroutines', 'functions', or whatever they are called. Each member of the team 'knows' about the experts of interest to it: 'knows' in that whenever such an interaction is called for with an expert on some topic, the appropriate expert is contacted. The method of conversation is quite formal and efficient. Let us think of LISP. Here one places the 'important items' in specified locations. These locations represent the semantics of what the program is trying to say. One might think of it as saying: Here is the number that I want you to factor for me. Saying this is accomplished by putting a number in the 'number to factor' position that the factoring expert has set up. The factoring expert is contacted, finds the message, performs whatever it wants, and puts the 'answer' in the 'here's the answer you ordered, sir' slot that everyone has agreed on.
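A small Lisp sketch of this rigid conversation may help; it is illustrative only, and the factoring expert here is hypothetical. The only semantics available is positional: the first argument position is the 'number to factor' slot, and the return value is the agreed-upon answer slot.

    ;; Illustrative only; the factoring expert is hypothetical.  The first
    ;; argument position *is* the 'number to factor' slot, and the return
    ;; value *is* the 'here's the answer you ordered, sir' slot.
    (defun prime-factors (n)
      "The factoring expert: return the prime factors of N as a list."
      (let ((factors '())
            (d 2))
        (loop while (> n 1)
              do (if (zerop (mod n d))
                     (progn (push d factors)
                            (setf n (/ n d)))
                     (incf d)))
        (nreverse factors)))

    ;; The anonymous caller: it puts 360 in the agreed-upon position and
    ;; finds the answer in the agreed-upon place, knowing nothing else
    ;; about whom it is talking to.
    (prime-factors 360)   ; => (2 2 2 3 3 5)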
NOTE: Of course, there are a number of programming languages which are non-standard, and which do not suffer from the same problems that I am complaining about here. Three examples are micro-planner [Hewitt 1972], conniver [Sussman 1972], and prolog [Kowalski 1974]. However, these languages are not used outside of AI applications, and 'standard' programming languages and the style of thinking that goes along with them are being taught nearly universally.
In some sense, the factoring expert doesn't really know who it is talking to since that information is hidden from view. The conversation is between an expert and an anonymous caller!
The entire computation of the large program consists of conversations of this type, since each individual subroutine is nothing more than a sequence of these conversations with some ordering performed internally (control structure).
Now, if some new expert comes on the scene who claims to factor better and faster than the old expert, how would the user of the factoring expert get the information that there is a new subroutine in town? Well, usually the answer is that the user knows the name of the expert or else its location. We could try to just pull a fast one on the identity of the experts, leaving their names and/or locations the same or else we could stop the world and rebuild it.
In living systems and societies we do not usually change names or identities that often. And when new information comes along it is spread because the language we use is rich enough to accommodate such things. Imagine a programming language that could have a richer conversation along these lines. It is not that the human in a society has a much richer set of methods for finding out what he needs to know, but that the methods of communicating what each person knows are richer. A human who wants to find out how to do something simply looks it up or looks in some standard places for who might know. The difference is that the human can recognize in more different ways that someone else is claiming to know something of interest. Conversely, a newcomer can easily advertise his or her abilities and knowledge with the confidence that it will be generally understood.
Natural languages are, thus, very much unlike artificial ones, which have a strict semantics and usually a rigid syntax. A programming language or system that allowed such conversations to go on could be much easier to program and would be generally more robust, though possibly slower, although one can imagine compilers that would freeze the state of a world if that is what is needed. Alternatively, the structure of programming languages has often been designed to fit the architecture of the underlying computer, and so it might be that there is an architecture that doesn't naturally channel programming languages into traditional molds.
The programming language would then allow routines to discuss abilities, make deals, trade information, and do other interesting things. In fact, one might want to explore what the advantages and disadvantages would be in the creativity (say) of a system that allowed the routines to lie, cheat, steal, etc.
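As a purely speculative sketch of what such a language might provide (none of these names come from Yh or any existing system), imagine experts that advertise descriptions of their abilities and callers that locate an expert by description rather than by name. A newcomer then 'comes on the scene' simply by advertising; no caller needs to be edited.

    ;; A speculative registry: experts advertise what they claim to do,
    ;; and callers ask by description.  Nothing here is from Yh.
    (defvar *experts* '()
      "Each entry is (description-keywords . function).")

    (defun advertise (keywords fn)
      "A newcomer announces the abilities it claims to have."
      (push (cons keywords fn) *experts*))

    (defun find-expert (request)
      "Return the expert whose advertised description best overlaps REQUEST."
      (let ((best nil)
            (best-score 0))
        (dolist (entry *experts* best)
          (let ((score (length (intersection request (car entry)))))
            (when (> score best-score)
              (setf best (cdr entry)
                    best-score score))))))

    ;; Two experts come on the scene and advertise themselves.
    (advertise '(:arithmetic :factorial)
               (lambda (n) (if (<= n 1)
                               1
                               (reduce #'* (loop for i from 2 to n collect i)))))
    (advertise '(:arithmetic :square-root)
               (lambda (n) (sqrt n)))

    ;; The caller never names the expert; it only describes what it wants.
    (funcall (find-expert '(:factorial)) 5)   ; => 120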
The main feature of a system written in this kind of programming language is that it would be robust and able to respond better to more situations than most programs do now. This ability should not be underestimated.
I want to make a distinction between two of the kinds of domains that one can work with in AI research: fluid domains and essential domains. These qualifiers are meant to refer to the richness of these domains, and the implied behavior we expect to find in these domains.
In an essential domain, there are very few objects and/or classes of objects and operations. A problem is given in terms of this domain and must be solved by manipulating objects using the operations available. Generally speaking, there exist no more than the number of operators minimally needed to solve the problem, and usually a clever solution is required to get at the right result.
A typical essential domain is the missionaries and cannibals problem. In this problem there are three missionaries, three cannibals, a boat, a river, and the proviso that the six people need to get across the river, and if the cannibals ever outnumber the missionaries, the result is dinner for the cannibals.
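For concreteness, a minimal sketch of the state-descriptor formulation follows (hypothetical code, not taken from any of the cited systems): a state records who is on which bank, the legality test encodes the dinner proviso, and the goal is immediately recognizable.

    ;; Hypothetical state-descriptor code.  A state records how many
    ;; missionaries and cannibals are on the left bank and where the
    ;; boat is; everything inessential has been pruned away.
    (defstruct state
      (missionaries 3)   ; missionaries on the left bank
      (cannibals 3)      ; cannibals on the left bank
      (boat :left))      ; which bank the boat is on

    (defun bank-safe-p (m c)
      "A bank is safe if no missionaries are there, or they are not outnumbered."
      (or (zerop m) (>= m c)))

    (defun legal-p (s)
      "The dinner proviso, checked on both banks."
      (and (bank-safe-p (state-missionaries s) (state-cannibals s))
           (bank-safe-p (- 3 (state-missionaries s)) (- 3 (state-cannibals s)))))

    (defun goal-p (s)
      "The definite, recognizable solution: everyone is on the right bank."
      (and (zerop (state-missionaries s))
           (zerop (state-cannibals s))
           (eq (state-boat s) :right)))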
Now the problem with this is that it is assumed to be a formal situation, and things that people would really do, such as looking for a bridge or thinking about some people swimming, are not allowed. This is a hard problem for most people to solve, and yet these same people can speak English; AI yearns to solve the former in order to get at the latter, it seems.
Other essential domains resemble puzzles and simple games; often mathematics looks like this.
An important feature of these situations is that it takes great cleverness or intelligence in the classic sense to solve them; they are called essential because everything that is not essential is pruned away and we are left with a distilled situation.
In a fluid domain, there are a large number of objects and/or classes of objects and a large number of applicable operations. Typically these situations are the result of a long and complex chain of events. Generally there are a lot of plausible looking alternatives available, and many courses of action can result in a satisfactory result. Problems posed in this type of domain are open-ended and don't have a very recognizable goal.
A typical fluid domain is natural language generation. In such a domain there are a large number of ways of doing various things, beginning with inventing phraseology out of whole cloth, and progressing towards very idiomatic statements. There are many ways to start the process, many ways to proceed once started, and many different ways to decide that one has completed the task. A feature of such a domain is that judgment is much more important than cleverness, and recognizing a situation as familiar is the crux, and there is little need for a fancy planning mechanism.
Natural language generation can be put into the essential domain mold by making some restrictions on how things are done. For instance, one can limit the output to single sentences. Sometimes people think with this restriction there is less of a problem, or the essentials of the problem come to the fore. I believe this assumption makes the problem much harder than it needs to be, though the difficulties which arise in that 'simplified' case are worth a lot of consideration.
In a full-blown fluid domain, the situation is that we have a large number of building blocks that can be pasted together in various ways. The idea is to choose those blocks which can fit together well. Since there are many blocks, one can almost start off in any direction and find blocks that will curve things the way we want. The real problem is with using judgment to paste the blocks together and to keep the spiralling course of things headed in a good direction.
In an essential domain, though, we still have the problem of pasting blocks together, but we have so few of them that we also have to carefully select these blocks, and we have to locate a strategy for piecing them together before we can even start the pasting.
So now there are two problems where there was only one before. We haven't gained in simplicity, but have taken the problem of intensely clever planning and turned it into a very large system management problem, for where we once had a dozen ways to do things, we now have hundreds, and in order to use the blocks in a non-degenerate way, we must be able to explore in this space of choices and modify the system so that new paths come to light.
In some ways it is amusing to think about the ways that people have made problems harder than they needed to be in the name of simplification. It is often said that one can easily afford to limit oneself to single sentence output, because with judicious use of punctuation, one can turn any multi-sentence text into a single sentence text. The claim is that there is no loss of generality. But to accompany this lack of loss, there are further simplifications which make the problem a puzzle once more. It is like saying that the goal of the research is to discover the ways of organizing a system that will be an expert at using tools to put together a car, but then limiting the tools to a hammer and a screwdriver for simplicity.
Although programs are often described in standard input-output manner, with the program sitting in the middle massaging the data as it flows through, this is not how I think about the way Yh works, nor is it the way that I believe an AI program should be viewed. Yh is a program that is meant to exist beyond a single invocation of it, and it is meant to have a 'state of mind' in the form of its current set of beliefs (or parameter settings and resident data structures if you wish). This 'state of mind' is modified as the program works, so that running it through its paces after it has run awhile will not usually produce the same behavior it produced initially. In this sense I view the process of generation as putting a disturbance in Yh which causes it to behave in certain ways, a side effect of which is the production of text. The disturbance can be thought of as producing a number of requests for activities that need to be undertaken; as best the system knows how, it will attempt to perform these activities until the desire or need for them subsides. These desires and needs are represented in an internal description language; this language and its uses will be described later.
At that point the text is spilled and Yh awaits the next perturbation. However, the internal changes that are made may be permanent and reflect its growing set of beliefs about how to best respond to outside influences.
I believe it is often a good idea to relate as part of a piece of research the history of the investigation. Usually results are presented, and the final views in those results are not the ones that guided the process. The situation is similar to that of mathematics in which the final neat proof rarely reflects the crazed ideas and approaches that were actually taken during the search for a solution.
The main idea behind this research occurred to me in the summer of 1977, while thinking about minds and Cartesian dualism. That main idea was metaphor. A metaphor is a process of considering one thing to be like another for the purposes of understanding, manipulating, and describing that thing.
However, there was a step before the strict metaphor step, which I called the 'meta' step. This had to do directly with understanding the mechanism behind the ability of the mind to introspect while still being a mechanical object - which, after all, is the fundamental assumption behind AI research.
In the book 'The Concept of Mind', Gilbert Ryle [Ryle 1949] argues that to posit a machine-like mechanism behind the mind means that there can be no privileged position from which the individual introspects his own thoughts. If there were privileged positions - the Ghost in the Machine - which could observe the activities of some part of the mind (introspection), he argues, then there would be an act of observing, and an act of observing that observing, and so on. If this chain is ever broken, then, he argues, there are some mental processes not accessible to consciousness, which means there is some fundamentally mysterious 'privileged' position; this corresponds to the ghost whose actions are responsible for mental behavior but whose activities are beyond observation. Pushing the explanation of mental events to this ghost is not an explanation, but a delaying tactic, in his view. This infinite regression is not possible in a finite organism, so it cannot be the case that the mind is organized that way.
Therefore, unless there is to be a mind separate from the body, introspection is no different from ordinary observation in a literal sense. So we know our own states in exactly the same way that other people do, but since each individual is able to see a larger percentage of the clues about himself - yawning, which indicates sleepiness, for instance - each person is able to seem as though he were in a privileged position; this is an artifact of more observations and not of better or fundamentally different ones.
In other words, introspection is nothing more than ordinary observation; the difference between 'internal' and 'external' observations is that some observers are in a position to see more of them than others. Hence, since I am with myself more than you are, I know myself better due to more observations.
The key moment was when I decided that the mind might have a privileged position AND must NOT have introspection as a special quality of observation. That is, the type of observations in so-called 'introspection' are exactly like ordinary observation, but the objects of observation are unavailable to other observers.
The two views, so carefully shown by some philosophers to be inconsistent, were thus assumed to be consistent, and consequences were derived.
So the picture I had was one of parts of the brain having as objects of observation other parts, and that the brain was stratified in observer-participant layers. This observer-participant pairing I mistakenly (maybe) called the meta-object pairing in the traditional mathematical jargon.
If the brain is a uniform miasma of neurons and connections (in fact, the entire nervous system is uniform), then it is plain 'the brain' observes the world, but it is not plain how the brain observes itself. Think of a program that has some data base of stored facts; each object in this data base can not properly refer to other objects unless that data base is non-uniform or organized in some non-obvious way. That is, in such a data base, there are things which point to other things and things that are pointed to, hence a hierarchy, a heterarchy, or at least some structured organization.
This, of course, does not address any implementation issues, since the memory in a computer is entirely uniform, and the data base ultimately resides there. However, the organization of the memory is different from its structure: the structure is the physical makeup of the system, while the organization is the intended pattern of interaction of objects residing in or emergent from that structure. In many ways, this is simply stating once more that representations require several participants: the physical object containing the representation, the real world, and an observer that makes the distinction. That there is a representation in the computer when we deal with it is mostly a product of the observer (the programmer) and not of the computer or of the program. Our interaction with that program convinces us of the representation, and nothing else.
So, I posited a layering of the brain or mind where each layer is viewed as an observer of the one below. The last layer is an observing layer too, but it observes the world, given some vague notion of observation. With each layer having as its object of observation and discussion some other layer, we have another problem to face, the very problem that haunted Ryle: there are not an infinite number of layers. Therefore we eventually must admit that there is at least one purely behavioral layer and one unobserved layer. Just as there are observers, there must be participants, and in the discussion of observer-participant relationships, the participant is often lost in the noise.
At this point we have shown how a mind could self-observe by imposing some organization on the mind in terms of these layers of activity. Interesting and introspective, even self-reflexive, behavior can be seen to be the result of a relationship no more mysterious than the one between the mind and the world, and this latter, well-known relationship is as hidden or privileged with respect to the outside world as is the former, depending on circumstances. The imposition of consistency on an inconsistent set of assumptions is achieved.
The problem, now, becomes one of naming or jargon: what should I call the observer/participant relationship, given that it is somewhat vague? There already is a prefix that people have been using for relationships like the one I had in mind: meta. Thus there is the meta-level of some base-level.
The prefix, meta, on the other hand, has a somewhat different meaning, especially as used by mathematical logicians. This meaning tends to imply that the base-level, as viewed by the meta-level, consists mainly of syntactic objects only. A standard example is that a base-level object is 'house' and meta-level statement or observation about this object is that it has 5 letters. Now, this tends to give the impression that the only types of statements possible about the base-level objects are statements about their structure.
The difference between the impression above and what I wanted to do was that instead of thinking about 'house' as a linguistic/syntactic entity I wanted to think of it as a semantic entity primarily, so that rather than just talking about the word 'house' as a clump of letters, I could talk about it as a concept of an object which had doors and rooms, viewed as a physical object, and as a concept that could enter into communicative activity, viewed as a word.
In short, the meta relationship can be more fully described as observer/participant. The meta-object includes information such as the description of the object (its function and its structure), descriptions of the slots, rankings of the importance of the slots, pointers to alternative descriptions of the unit, and how to 'invoke' the unit.
The participant object is a behavioral object in that it acts while the observer object comments.
Given the idea of a stratification of sorts, what is to be done with it? Where would the goals or desires of the system be kept, along with its current beliefs about the situation? How would the system get started and how would it decide what to do?
When faced with the problem of making an AI program have certain characteristics, it has been traditionally the case that objects with the names of those characteristics are created and operations that mimic some facets of the behavior of those characteristics are developed. In Yh the desires and beliefs are contained in a situation description in an internal description language.
The basic outline of the system is given in the figure on the next page. Organized as a collection of individuals with descriptions of these individuals existing in meta-objects, the main question concerns how these individuals are selected to act - the question of sequencing. In outline, there is an agenda-like structure to enforce a strict order on some events. Additionally there is a reactive component, which is the implementation of the participant portion of the system. In this component there is a description which represents the current attention of the system, and a matching mechanism enables the system to select individuals to act as a reaction to that description. I also call this the behavioral system.
In this matching scheme, participant objects are selected on the basis of the descriptions of them provided by the observer objects. These objects will henceforth be called units after KRL.
Invocation of procedure-like objects does not proceed in the manner common to programs in which the interconnection graph is known beforehand (who can call whom, what data to pass and how, etc), but proceeds by determining the interconnection graph on the fly, and by discovering information-passing protocols as they are needed.
The behavioral system is based on a hybrid matching technique that uses a simple locate-by-description mechanism as an indexing scheme. Hybrid matching is matching descriptions made up of descriptors and associated numeric measures resulting in a match that pairs descriptors and has a final measure of strength for the match.
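The following Lisp sketch suggests what hybrid matching might look like in its simplest form; it is illustrative, not the actual Yh matcher, and the descriptors and weights are invented for the example. Note that a match never fails outright; it only comes back weaker or stronger, which is what allows poorly matched knowledge to be applied at some risk.

    ;; Illustrative only, not the actual Yh matcher.  A description is a
    ;; list of (descriptor . weight) pairs; a match pairs the shared
    ;; descriptors and combines their weights into a single strength, so
    ;; no match ever fails outright - it is only weaker or stronger.
    (defun match-strength (situation candidate)
      "Score CANDIDATE's description against the SITUATION description."
      (let ((strength 0))
        (dolist (pair candidate strength)
          (let ((found (assoc (car pair) situation)))
            (when found
              (incf strength (* (cdr pair) (cdr found))))))))

    (defun best-reactor (situation candidates)
      "Pick the unit whose description matches the situation most strongly."
      (let ((best nil)
            (best-score -1))
        (dolist (c candidates best)
          (let ((score (match-strength situation (cdr c))))
            (when (> score best-score)
              (setf best (car c)
                    best-score score))))))

    ;; Invented descriptions: the situation calls for a noun phrase.
    (best-reactor '((:noun-phrase . 0.9) (:brevity . 0.4))
                  '((np-builder   . ((:noun-phrase . 1.0) (:brevity . 0.2)))
                    (clause-maker . ((:clause . 1.0) (:noun-phrase . 0.3)))))
    ;; => NP-BUILDER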
The problem is to think of good ways to invoke the units, if they have a procedural component, much in the flavor of languages like Planner, where one did not need to know part of the interconnection graph for the system. In a way, this is replacing pointer methodology with name methodology.
This scheme uses a general description of the state of the knowledge of the system and a method of comparing the descriptions of possible reactors to that description. This process is viewed both as a mini-planning process and as a matching process.
The system picks the object with the best match. Hybrid matching is also considered as a sequencing method in some applications, so there is a multi-tiered agenda that can guide and examine the state of the general description in order to judge progress and make up better plans. In fact, a plan can be viewed as a sequence of these descriptions which is then 'reacted to' in order to obtain interesting behavior. In other words, unlike most plans which are of the form 'do x1, then x2...' or even 'do x1 and if c1 then ...', a plan in this system is a hierarchy of strongly directed clusters of loosely clustered activity. So, the plan might look like: '<general description_1> then <general description_2> ...' where each <general description_i> contains the remnants of <general description_(i-1)>, those parts that could not be satisfied. One way of paraphrasing this plan is 'thrash around as best you can, using pattern matching on full descriptions in order to achieve this situation, then add to that situation this new set of goals and conditions and do that.' The hope was that in this way, some of the detailed complexity of a planner could be buried in a sophisticated pattern matcher, and the actual planning could be simplified.
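A toy sketch of this style of plan, again illustrative rather than taken from Yh, shows the one mechanism that matters here: the unsatisfied remnants of each description are carried forward into the next one.

    ;; Illustrative only.  Each step of the plan merges the unsatisfied
    ;; remnants of the previous description with a new set of goals, and
    ;; the behavioral system reacts to whatever the merged description asks.
    (defun react-to (description)
      "Pretend to satisfy what we can; return the unsatisfied remnants.
    Purely for illustration, goals tagged :HARD are the ones left over."
      (remove-if-not (lambda (goal) (member :hard goal)) description))

    (defun run-plan (plan)
      "PLAN is a list of descriptions; remnants are carried forward at each step."
      (let ((remnants '()))
        (dolist (description plan remnants)
          (setf remnants (react-to (append remnants description))))))

    ;; A two-step plan: the :HARD goal survives step one and is retried
    ;; alongside the goals of step two.
    (run-plan '(((:express :topic) (:stay-brief :hard))
                ((:add-example))))
    ;; => ((:STAY-BRIEF :HARD))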
The generation system itself is organized into groups of units with specific abilities. There is knowledge about how to express facts about programs and data structures, and how to make noun phrases, verb phrases, relative clauses, etc; there are observers that watch the initial generation and propose transformations and other actions that modify the text; and there are transformation and text-representation specialists. Finally there are scheduling experts that form and modify the agenda and situation description.
In short, there is a large number of experts in the system which must be uniformly accessed.
I put a lot of faith and responsibility in the hybrid pattern matcher I talked about a few sections ago. I hope you will soon find out why I felt that this faith was justified when I describe the actual matching process that is done. There are many ways to influence the match, and there is never a truly failing match, only ones with stronger or weaker measures. Measuring the strength of a match is referred to as partial matching. In this system, the strength of the match is actually computed, and I think that in doing so, I have built a pattern matcher that approaches some simple planning mechanisms in its local hill climbing ability, and one which makes the programming of a very large system easier than most people ever thought possible.
Yh is organized as a collection of objects called either 'individuals' or 'units'. They are called individuals because of their communication abilities and their autonomy, and units because of their packaging nature. These objects are able to communicate with each other in moderately interesting ways and are independent of each other in that the structure of each other's contents is generally not available. 'Packaging' means that these objects are considered to be the smallest grain size relevant to the overall organization of the system.
The basic model of cognition used is called the stratification model. It is called stratification because the model is organized in different layers or levels of activity. These levels are composed of units, which are individuals with certain abilities. Most individuals or units have an associated meta-unit, which is a semantic observer of the unit. By observer is meant a unit whose object of expertise is the semantic content of the base-unit. To say that the meta-unit observes in the traditional sense is imprecise because there is no concurrent way for one object in a system to 'see' activity.
NOTE: The way that observation takes place is by communication - message passing - or by watching the description of the current situation. The former is supported by primitives in the underlying system and is a description-based message passing mechanism. The latter is based on a description maintained by the system of the current goals and facts of interest to the system. By observing the changes in this description, parts of Yh can monitor other parts. This doesn't mean that the effects of a unit cannot be observed in a good sense.
Though possessing an unimpressive name, shallow stratifications form the core of the system, having a more immediate impact on the system as a whole than deep stratifications do.
A shallow stratification is a tower of unit, meta-unit, meta-meta-unit, ... units which share the base-unit, meta-unit relationship all the way up in the obvious manner. This type of arrangement provides a closer coupling between units than does the deep stratification (see next section), which is an amorphous relationship.
As pointed out above, there is a close coupling of units which are related by the 'meta' relationship. In fact, one may consider that a set of units related this way form one component or tower, with responsibilities divided among them. Deep stratifications are sets of shallow stratifications, organized in a very loose way with the common point of being observers of activity of lower stratifications. Thus, the organization of the system is layered according to these deep stratifications. In Yh, these are roughly organized along the lines of traditional linguistic entities in the standard hierarchical ways - words, phrases, clauses, sentences. Higher level stratifications are for adjectival phrases, relative clauses, and other modifiers of the original levels.
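Before going on, it may help to sketch the two arrangements in present-day Python, purely as an illustration; the class and field names here are inventions of mine and are not part of Yh.

class Unit:
    def __init__(self, name, slots=None, meta=None):
        self.name = name
        self.slots = slots or {}
        self.meta = meta                 # the meta-unit observing this unit, if any

def shallow_stratification(base):
    # the tower base-unit, meta-unit, meta-meta-unit, ...
    tower = [base]
    while tower[-1].meta is not None:
        tower.append(tower[-1].meta)
    return tower

class DeepStratification:
    def __init__(self, name, towers, observes=None):
        self.name = name                 # e.g. 'words', 'phrases', 'transformations'
        self.towers = towers             # a loose collection of shallow stratifications
        self.observes = observes         # the lower deep stratification it watches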
More specifically, there are units that generate noun phrases, and there are units that observe this generation. These two groups form two stratifications, with the latter being a higher level stratification than the former.
The reason for this type of breakdown is that some sophisticated linguistic behavior is the product of observation of a simple-minded generator giving rise to modifications. So, when a phrase is generated that is ambiguous, an observation is made of this, and a higher, deep stratification level modifies the ambiguity.
Therefore, there are two types of observation that take place in Yh: the observation and description of units by meta-units, forming shallow stratifications, and the observation of groups of units by other groups. In short, then, deep stratifications are the mechanism of observation by one part of the system of another.
In Yh an example of a shallow stratification is the unit which represents a word (say a noun) and its meta-unit which contains the description of that word (in the system-wide description language to be discussed later) and the methods for interfacing that word with the rest of the text representation. So, for a simple noun, the base-unit contains the various forms of the word and little else. The meta-unit contains the description mentioned along with pointers to synonymous words. The interface in the meta-unit is able to place the word in the representation given various specifications of the location for insertion.
The lexicon (set of known words and phrases) is a deep stratification that is the object of observation of most of the rest of the linguistic part of Yh. The transformations known to Yh form another deep stratification that observes the first pass of generation. The transformation system observes the initial process of expressing and proposes transformations. This is done by observation rather than by embedding this information in the initial pass because one goal of Yh is the separation of concerns: the transformation stratification must be able to be improved and added to without having to alter the first pass.
Units are data structures in the KRL, Frame [Minsky 1975], and FRL [Roberts 1977] style in that there are slots and fillers. These slots contain data (fillers) as well as descriptions and procedures of a standardized nature. Each deep stratification has a set of locally meaningful slots and filler semantics, as well as defaults within the stratification. However, in terms of the implementation, the slots are whatever the user of the system wants, given that the small set of system-wide slots contains the right sorts of fillers. In a deep stratification that represents some facts or objects in the world, there may be slots that convey the standard inheritance that KRL, for instance, does. In deep stratifications that are internal (e.g., ones that represent knowledge Yh has about itself) inheritance may be absent, though there is local consistency of slots within each stratification.
Not only does the meta-unit contain some description of the unit, but it performs certain prescribed functions for it. Namely, it 'advertises' the object unit's purpose in two different ways. The first way is in the form of simple patterns; the second is in the form of a description language based, in part, on this pattern language. These descriptions are used to match desires against abilities and are, hence, a method for the system to reason about its own abilities.
The meta-unit also describes the contents of some of the slots in the object unit. This description is not a description of the syntactic qualities of the slot, but of the semantic usage of it. Since meta-units are still units, they have slots and fillers, with the difference being that, mainly, slots of meta-units are filled with descriptions rather than data. A description, as used here, refers to an object in the system description language; as such, it is strictly data, but it is data with a standard set of uses. Namely, through the descriptive system, Yh is able to reason about itself and use units from stratifications with differing standard slots. Sometimes the slots of the meta-unit describe qualities of the object unit as a whole, and other times the slots describe qualities of the slots of the base-unit with the same name. If a meta-unit and a base-unit contain slots with the same names, it is assumed that the meta-slot describes the slot; if the name of a slot in the meta-unit does not have a correspondingly named one in the base-unit, it is assumed that the meta-slot describes the base-unit as a whole.
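The convention in the last two sentences can be stated compactly; the following sketch (Python, with a dictionary standing in for a unit's slots, both my own choices) is only meant to restate it.

def meta_slot_subject(base_slots, meta_slot_name):
    # base_slots: dict of slot name -> filler for the base-unit
    if meta_slot_name in base_slots:
        return 'describes the base-unit slot ' + meta_slot_name
    return 'describes the base-unit as a whole'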
The distinction made is that the meta-object is in the relation of an observer or commentator rather than a meta-object in the standard mathematical terminology. The correspondence between a unit (and what it represents) and reality (for example, the existence of that object in the real world) is explicit in the meta-unit and never implicit in the unit. By this is meant that the closed-world assumption is not made: if some fact is absent (where a fact is represented in some standard way via units) then it is NOT assumed false. Furthermore, the fact that some unit representing something is present MAY NOT mean it is true. The explicit 'word' of the meta-unit can make a 'fact' true or false.
The following units are from the input of the Dutch National Flag program description, which is the major generation example in [Gabriel 1980]:
{dta-str1
  level 1
  meta meta-dta-str1
  markers (dta-str2 dta-str3 dta-str4)
  type array
  element-type (one-of (R B W))
  dimension 1
  length N
  first-element 0
  last-element (1- N)
  name flag1
  base 0-based}

{meta-dta-str1
  level 2
  beta dta-str1
  element-type (((an element-type) . 1000))
  meta meta-2-dta-str1
  data-structure-for program1
  descriptors ((array . 1000))
  represents ((flag1 . 1000))}

{meta-2-dta-str1
  level 3
  beta meta-dta-str1
  element-type (((R represents RED) . 900)
                ((B represents BLUE) . 900)
                ((W represents WHITE) . 900))
  rankings ((type . 500) (name . 550) (element-type . 400)
            (dimension . 525) (markers . 425) (length . 950)
            (first-element . 300) (last-element . 300) (base . 525))}
These units represent an array, which is used to represent the flag in the program. The name of the base unit is DTA-STR1; it is a zero-based, one-dimensional array of length N, with three array markers into the array and with elements selected from the set {R W B}. An array marker is simply an index into the array which is used as a place keeper.
The meta-unit is META-DTA-STR1, which states that DTA-STR1 is an array, with maximum confidence. Two user-defined slots indicate that this is a data structure that is used by PROGRAM1, and that the array represents FLAG1, which is the flag in the problem. The ELEMENT-TYPE slot just says that the ELEMENT-TYPE slot in the base unit is an element-type.
In the meta-meta-unit the rankings indicate the importance of the slots. The range for the measures is -1000 <= m <= 1000. The ELEMENT-TYPE entry states what the various element values represent, which is a statement both about the entry and about its semantics.
The RANKINGS slot, in this case, is used to order certain modifiers in the generated text about this object. The view is that the slots in a unit define, to some extent, what is interesting about that object, and the higher the ranking, the more important that aspect is. Therefore, when being used as a source of adjectives, for instance, the highest ranked slot is closest to the noun in question.
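As an illustration only (Python again; the function is mine, not how Yh is actually written), the ordering rule just described is nothing more than a sort on the rankings.

def order_adjectives(adjectives, rankings):
    # adjectives: dict mapping slot name -> adjective string
    # rankings:   dict mapping slot name -> importance measure (as in META-2-DTA-STR1)
    ordered = sorted(adjectives, key=lambda slot: rankings.get(slot, 0))
    return [adjectives[slot] for slot in ordered]   # highest-ranked ends up last,
                                                    # i.e. closest to the noun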
In Yh there are two distinct, but related, ways that correspondents to objects (physical and conceptual objects) are used and/or referred to. One is with representations and the other with descriptions. The representation scheme that Yh uses is based on units with slots and fillers, organized into meta-unit/base-unit towers with certain properties, able to support various types of inheritance, both implicitly and explicitly, and so on.
The objects in Yh that are instances of the representation - units - correspond to objects in the world or other objects in Yh. In building these objects it is hoped that there are operations in the system such that applying these operations on the objects corresponds to making inferences or observations about the world, and, in particular, about the objects in the world the representations correspond to.
NOTE: I will use the word representations in the discussion in this section to refer to the objects built up with the tools of the representation system. So if there are some actual units in a system, then they are the representations. What they correspond to in the real world (or in the world of the system, if units correspond to those sorts of things) will be called the objects.
There is a second kind of thing that can be used alongside a representation system, which is a descriptive system. Here there are descriptions, which are very much like linguistic objects to the system. A description is a set of descriptors which can be reasoned about, compared, constructed, and taken apart. A description is an object which presents the features of a representation and their relative importance. It is a monolithic structure in order to be easy to manipulate by a matcher.
The representations in any system constitute the world view of the system. 'Thinking' about the world means manipulating these representations and nothing more; creating new representations is either expanding one's world view or becoming demented, depending on the reliability of the new representations in handling the world.
On the other hand, one can create and manipulate descriptions freely because they constitute the explicit beliefs about the nature of things. A description is a way of noting to the system that some belief is held, or some facets of a representation are being considered.
The description is used in Yh as a linguistic entity. Creating and manipulating descriptions is simply ruminating to itself much in the manner that people sometimes 'talk' to themselves. By allowing the free use of descriptions without requiring the system to make commitments about its long-term beliefs and world view, a great deal of freedom is possible.
The different uses to which the two are put determine some features of their structure: the representation is used for reasoning and thinking about the world, doing information retrievals, determining defaults when new objects are being created, and other structured activities. The description is used for passing around beliefs, matching, partial matching, changing relative importances, ignoring parts of a description, speculating about descriptions of things, having other influences alter the way the description is viewed, and other unstructured activities. Therefore, the representation is structured to accommodate structured actions, while the description is unstructured to facilitate unstructured actions.
For example, when a program is being described by Yh this is a structured part of the process in that the parts of the program that need to be discussed are explicitly there and in some order that is specified within the unit that represents the program. This structured representation results in the structured overall plan for discussing the program - talk about: the first array, the second array, the list, the main program, the subroutine, the first macro, and then the last macro. This is a representation driven process because the structure in the representation maps straightforwardly into the structure of the actions of the system (making the plan).
When the generation process moves to word and phrase choice, though, the safe ground of a representation is left behind and the choice of action is left up to the descriptive system, which finds units (representations) that are appropriate within the abilities of the system to model the situation. The decision of what to do, at the point where the system can only do its best to achieve some description, is a judgment about appropriateness.
So, Yh manipulates descriptions in a description language. When a generation task is presented to it, that task is in the form of some representations about the objects that text will be generated about - the program - and a description of the generation task. Yh responds to the description and not the representations. Adding the representations informs Yh about a new world view - now there are new objects. If the request is in the form of a representation, then that would mean that the request is something Yh knows about, much as it knows about the new program, but that will not cause it to want to talk about the program; the description of the request will.
Another way of seeing the difference between descriptions and representations comes from viewing each representation in the system as an individual with special abilities. For instance, an individual may stand for some object in the world or it may be able to perform certain activities within the system. In Yh both of these functions are realized: there are units (representations) that are the meaning of a generic program, an input, a transformation, a word, and there are units that perform transformations, move things in and out of the agenda, decide what form a sentence will have.
A representation is then appropriate in a given situation or it is inappropriate. For instance, a unit representing a chair is appropriate when a chair is being recognized or a unit to insert a specific word is appropriate when a word with that given definition is needed.
Specifying when a representation is appropriate is the role of a description. So, associated with most representations is a description of when that representation is useful. This description is then used for matching with a situation to determine the degree of appropriateness.
One of the reasons for a descriptive system is that the choice of action may depend on longer range goals that need to be expressed. That is, there may be some background condition (such as not using words with certain connotations) that influences the activities over a long period of time, but not in a dramatic way. This is accomplished through the descriptive system while decoupling it from the representation system.
Finally, one may want to decouple the activities described above because there is a desire to make each unit diamond-like, representing a narrow but precise amount of information, and any stretching of that amount should not tamper with the structure of that unit.
Associated with units is a certain stiffness, in that a unit is a solid point or individual within the entire system. Through linking of descriptions with units, the stiffness is reduced because the decision of appropriateness is made through a matching operation that is decoupled from the representation. If this matching process allows one to stretch the applicability of a representation to a situation then there is less need to anticipate every eventuality with specific units: other units can make do in the unexpected situation.
There is a global description of the current state of beliefs and desires, and there are descriptions attached to many representations. Many of the things that Yh does involve changing its behavior based on the description of what is important at the moment.
The central idea is that the global description is a set of beliefs, among other things, and the tide of belief can ebb and flow when the reactive component is allowed to press forward. Once something is in a representation it is fixed, so that any such ideas of changing importance over a short period of time must be expressed through the descriptive system.
NOTE: The component that lets the descriptive system control sequencing.
It is interesting to note that, empirically, the structured planning and control of the system is invariably the result of representation driven activities, the representation being structured, and hence the action; similarly, the unstructured planning is the result of the description driven activities, descriptions being unstructured.
In a system using representations and descriptions one expects to see a continuum from structured to unstructured activity. In a natural language generation system the structured activity is planning the contents of each paragraph and inserting the correct components of a selected sentence type, while an unstructured activity is word choice or even sentence type choice.
It often seems that in everyday life planning is rarely more than a matter of deciding what the best thing to do is without making an explicit plan. By this it is at least meant that there are many partial or whole plans that sit around waiting to be used with elaborate enabling descriptions of them (here a mechanistic terminology is being imposed on a non-mechanistic example) which are triggered by some deliberation on the matching of these descriptions to the current situation. This is essentially the view taken by Schank [Schank 1977] and many others. This is contrasted with the view that these plans are constructed on the fly each time.
Though building plans is a worthwhile and necessary type of activity, it is an activity like any other that must occur in some framework, which includes the ability to make the kinds of decisions about what to do next outlined above.
Maybe it is helpful to think about the difference between natural language generation as a problem for AI and 'formal' problem solving. In the latter case, problems are presented like those found in the Sunday newspapers or in puzzle books where there is a situation, some rules, and a goal in mind. An 'intelligent' person figures out how to apply the rules in order to obtain the goal. In the former case, one has something to communicate, some methods for accomplishing the 'goal' of expressing that something, and one does that until one is satisfied. Of course, there is a natural enough sounding comparison or analogy between these tasks. But the major difference is that there are, usually, a huge number of ways to accomplish the goal of expressing the situation, and that a large number of them - nearly all? - can work. Furthermore, there are a number of other factors which can contribute to how that process is carried out, and therefore, to what the final result will be. Thus the problem becomes one of how to decide what to do next.
It becomes apparent that what is needed is a mechanism for looking at the current situation as it is 'known' to the program and to be able to assess the best course of action from that examination.
This is called the reactive component of Yh. This is what happens on the large islands that the planning process provides. These islands are islands of description of some desired state.
The main thrust of these description entries is to have the system, that is, its individuals, interact with each other and with these descriptions of some of the things it knows in such a way as to produce some interesting behavior, both external, observable behavior, and internal, trace behavior.
Roughly speaking, there are several ways to view the activity of an interesting object, especially artificial objects with purposeful behavior. The first is to think of the system as waiting for some set of goals to satisfy (work on), at which point powerful techniques are brought to bear so that the capabilities of the system bring about the completion of the goals. A second way is that the system simply exists and is always acting. Sometimes this activity is quiescent (that is, the activity is uninteresting at the moment), and sometimes it is perturbed by something which causes a change in behavior which is once more interesting. In a language generation system, the system is perturbed by planting a desire to express some situation, which will be expressed to some satisfaction. A better example is a timesharing or other basically interrupt driven process.
NOTE: Many of these ideas originate from the work of Maturana and his colleagues [Maturana 1970], [Varela 1979].
These two views, of course, are not incompatible, but the latter has at least one worthwhile implication that is more apparent from the outset - namely, that it is nowhere mentioned that the quiescent state, if it exists, is the same as all other quiescent states. The state of the system is a product or natural result of its past, and one should not expect that this function of the past takes on the same value very often. So a perturbation results in a different quiescent state, and it is expected that if a similar request were made of the system, it might not respond the same way, just as people are expected to change as more experiences are gained. This is to be distinguished from learning, in which by some external measure progress is made or better responses are given. Here there is no learning, but success is measured by the fact that the system stops reacting in some amount of time and survives in a new state to react on other occasions.
The philosophy of this system is that it exists and reacts to influences brought to bear both by itself and by the 'external world' in the guise of externally introduced entries on the above-mentioned list of descriptions. This list is called the 'system-goals', somewhat of a misnomer in that it contains descriptions of the state of things, counter-goals (or anti-goals) as well as the things normally thought of as goals.
In having an object, in this case an artifact, act in an interesting, though possibly not intelligent way, there seems to be a need for a number of mechanisms that interact in a complex manner. It doesn't seem that one should be able to find a half dozen mechanisms that have the richness to exhibit the kinds of things one would expect to see in a full natural language generation system. Certainly a human mind is more complex than an operating system, and perhaps one should not be able to see any sort of truly appropriate behavior in anything less complex. So far it has been hinted that there are several mechanisms present in Yh: units in a meta-style relationship, communication used both as traditional communication and as the basis for self observation, levels of activity to monitor behavior, and finally what is called 'influence-based matching'. This last item has also been called 'goal choice' or 'planning' herein, and now it will also be called searching in order to throw the full weight of its aspects onto the reader.
To some extent, behaving in an interesting way is no more than reacting to a situation without much thought or standard planning. Presented with a situation or with a what-is, and given a set of possibilities, one is selected which is appropriate to the what-is. As with anything else the game is to formalize the what-is, the potentialities, and turn the selection into an algorithm.
NOTE: My view is that in day to day activities a person acts routinely, as if encountered situations are familiar enough that standard plans with some slight modifications can be applied, rather than elaborate new plans constructed. Speaking is such an activity in that people can generally ramble on without really stopping to think for tremendous amounts of time. There are many things that people do that argue for recognition as the main mental process rather than hard core thinking. This is what I mean by reaction, recognizing a familiar situation and applying a related plan.
Meta-units contain entries labeled: PURPOSE, GOALS, PRECONDITIONS, CONSTRAINTS, PREFERENCES, ADDED-GOALS, SOFT-CONSTRAINTS, PAIRING-FUNCTION, INFLUENCES, and COUNTERGOALS. These entries are used by the goal choice process to 'match' a unit's advertised behavior against the needs of the system as measured by the system goals. This matching process is quite complex, and reflects, in some ways, the fact that the architecture of digital computers may not be ideal for AI, though probably components of it are. However, if the converse is taken to be true, namely that only those things which are relatively simple and efficient to do on current digital equipment are relevant to AI, then perhaps the problem with progress is that the natural complexity giving rise to interesting behavior will be left out of these systems.
The system-goals 'workspace', which is a unit like any other, but one whose primary purpose is to store information as a kind of description sink/source, contains a list of entries and a list of 'influences', described below. These individual entries are then considered together as a description of the relevant parts of the situation for formulating what to do next, though it may not be the entire description in absolute terms.
The game is then to find the unit which best 'matches' this situation in some sense of the word. To accomplish this, it is necessary to bring the description of the unit into contact with the system-goals so that the points of comparison can be made; in the end, it is desirable to obtain a measure of the strength of the comparison and thereby choose the best one. Entries in the system-goals unit are paired with entries in the unit under consideration - in particular, with entries in the GOALS slot. Since later it is important to know the points of comparison that led to a rating, a record will be kept of the pairings of items. So the interesting parts of this method are to produce a pairing and then to measure the contribution to the strength of match by each pair in the pairing.
In other words, this is very much like a plausible move evaluator for a one move look-ahead tree search algorithm: there are a number of units that can be invoked next, based on having a description that resembles the desired state. However, some resemblances are better than others, so a numeric measure of the resemblance is calculated.
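In rough present-day Python (an illustration only; 'match' is an assumed black box and the names are my inventions), the one-move look-ahead selection amounts to this:

def choose_unit(candidates, situation, match):
    # candidates: list of (unit, advertised_description) pairs
    # match(description, situation) -> (pairing, strength); there is never a
    # truly failing match, only weaker ones
    best = None
    for unit, description in candidates:
        pairing, strength = match(description, situation)
        if best is None or strength > best[2]:
            best = (unit, pairing, strength)        # keep the record of pairings
    return best                                     # (unit, pairing, strength)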
In any case, let us turn to the method by which an individual is selected to operate on the current situation.
Initially there is a set of descriptions which describes the current beliefs the system has. The important feature of this description is that it is a combination (unordered) of patterns which tell something of the situation. For example, a pattern might be:
(known-at red table)
which states that there is something red at the table. Of course, this syntactic object does not 'state' that, but rather this is the form which causes the system to behave in such a way that one could say that its belief is that it knows that something red is at the table.
The existence of the pattern in the set of patterns representing the description cannot and should not be taken as an absolute avowal of a belief in that object. Recall that in the unit system talked about earlier, one can have a unit that says something while its meta-unit can deny it. Similarly the description of the situation has the same quality without recourse to a lot of meta-type overhead.
Thus, there is a measure of the faith in the description; this measure is, again, strictly meaningless without the behavioral impact it has on the system. Think of this measure as affecting the strength of the belief, the attention that gets paid to it, or the importance it has in the description of the situation. This idea is only the next reasonable step beyond the all or nothing style of description used in some other AI systems. The claim is that there are some interesting behaviors associated with this scheme.
NOTE: Which of these interpretations applies is an event in an observer of the behavior of the system and is not properly part of the system itself explicitly, at the level of participation. The use of the measures will vary according to the motives of the controlling processes.
It is also important to represent in a uniform way the desires, goals, and predilections that the system has. Since a desire can become a 'fact', a second measure of this desire is added to the items that constitute the situation description. So far, then, as part of each item there is a pattern consisting of a formal description of the item, a measure of the interest that the system has in that item being or becoming a fact, and a measure of the extent of belief the system has about the item.
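Concretely, an item in the situation description might be rendered as follows; this is a sketch in Python, the field names and the numbers are invented for illustration, and the measures are meant to lie in the same -1000 to 1000 range as the rankings earlier.

from collections import namedtuple

# One entry in the situation description: a pattern, a measure of the desire
# that it be or become a fact, and a measure of belief that it already is one.
SituationItem = namedtuple('SituationItem', ['pattern', 'desire', 'belief'])

# e.g. the earlier pattern, believed fairly strongly and not currently a goal:
known_red = SituationItem(pattern=('known-at', 'red', 'table'), desire=0, belief=800)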
An example of why the measure is associated with the description comes from considering the matching of two descriptions, one of a paradigmatic object and the other of an instance. The important descriptive elements of the object are given a higher weight, while the unimportant details, which are useful for distinguishing the actual individual described, have a lower weight and are only considered once the important attributes have been accounted for.
So when one is trying to tell if an object is a table, the important table-features will dominate the match, while once it is ascertained that there is a table, the lesser weighted features, like color, dominate. Finer distinctions rise to the fore once the larger, measurably dominant ones are equal.
The problem that this approach solves is that of stating in a uniform way the fact that some features are to be considered above others in certain cases. If all features are given a yes-no status, then one can end up in the bind of making decisions about the identity of objects based mainly on their color.
Of course, an all or nothing scheme can be made to work by making buckets of importance in the descriptions, so that there is a discrete gradation of the importance. The possible advantage of a numeric system would be 'continuity' and the ability to also obtain easily a measure of the strength of a match.
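A toy version of such a weighted match, in Python and with invented features and weights, shows the effect: important features dominate, and the lesser ones only break near-ties.

def match_strength(weighted_description, candidate):
    # weighted_description: dict of feature -> (value, weight)
    # candidate:            dict of feature -> value
    strength = 0
    for feature, (value, weight) in weighted_description.items():
        if candidate.get(feature) == value:
            strength += weight
    return strength

table = {'flat-top': (True, 900), 'legs': (4, 700), 'color': ('brown', 50)}
# a red table still matches strongly (1600); color only decides among near-ties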
Up to this point there is a place for descriptions of the desires and beliefs the system has. In addition there are descriptions - advertisements - associated with many units; a reasonable thing to do is to try to match one of these descriptions against the current situation description. Since the situation description can contain things that are not of interest, and since one does not want to get a strict yes or no answer (in all cases), this matching must take into account the measures just talked about. The descriptions attached to units contain just an absolute measure of importance, unlike the ones in the situation description, since the former measure only strength, while the latter also measure degree of belief and degree of desirability.
If, for some units, there are descriptions of what each of those units can do in certain cases, and if there is a description of the current situation, one might want to find the unit whose action description matches best the situation description. There are two reasons to want to make such a match, and therefore two distinct types of matching. The first type matches the advertised purpose of a unit against the situation description viewed as a statement of the desired situation in order to decide which unit to activate; the second type matches the identity description of the unit viewed as a description of the object further specified by the unit against the situation description viewed as a description of an object.
Since the pattern matcher produces a pairing of descriptors as well as a numeric measure of the strength of the match, it is possible to allow various conditions to change the strength of the match without necessarily changing the validity of pairings. The mechanisms for doing this are embodied in the details of the matching algorithm [Gabriel 1980] and in the slots named INFLUENCES, SOFT-CONSTRAINTS, PREFERENCES, and COUNTERGOALS.
The matcher works with a measure of strength of a match. A SOFT-CONSTRAINT is a predicate and a measure such that if the constraint is true, the measure is added to the final value for the match. The measure associated with the predicate can be a function that returns an integer.
The COUNTERGOALS slot contains a list of patterns, such that if any of these patterns match any of the objects paired with items in the GOALS slot (the main descriptors associated with a unit), the measure is added as above. This is intended to be an exception to the rule: if, in a simple blocks world application where blocks are moved around on other blocks and on a table, a unit will readily put a block on another block but should not put one on some specific block, a countergoal can be used to reduce the measure of the match in that case.
The PREFERENCES slot is a list of the form (...(< name > < predicate > < amount >)...) such that for each entry, if the predicate is true then < amount > is added to the measure. Additionally, there is a queue of unit names and preference names along with factors and thresholds such that at regular intervals the factors are applied appropriately to the preference in the unit named, and when the threshold is passed, the preference is deleted. Thus, there is a decay mechanism associated with the preference slot, which is not available for the soft-constraint slot. Additionally, countergoals and influences (see below) can be decayed.
The INFLUENCES slot is of the form (...(< pattern > . < amount >)...) and is like COUNTERGOALS except that the pattern is matched against various things in the system description, adding < amount > to the measure when a match occurs.
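Taken together, the four slots amount to a set of adjustments applied to the raw strength of a match. The following is a condensed sketch in Python of that bookkeeping; the names, the dictionary layout, and the trivial 'matches' stand-in are mine, and preference decay is omitted.

def matches(pattern, item):
    # stand-in for the real influence-based matcher: literal equality only
    return pattern == item

def adjusted_strength(raw, situation, pairing, unit):
    # situation: list of description entries; pairing: objects paired with the
    # unit's GOALS; unit: dict holding the slots described above
    total = raw
    for predicate, amount in unit.get('soft-constraints', []):
        if predicate(situation):
            total += amount              # in Yh the amount may itself be computed
    for pattern, amount in unit.get('countergoals', []):
        if any(matches(pattern, paired) for paired in pairing):
            total += amount              # usually a negative, exceptional adjustment
    for name, predicate, amount in unit.get('preferences', []):
        if predicate(situation):
            total += amount              # decay of preferences is omitted here
    for pattern, amount in unit.get('influences', []):
        if any(matches(pattern, entry) for entry in situation):
            total += amount
    return total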
When control is within the grasp of the pattern matcher (rather than under the control of standard methods) the programmer is able to influence matches, and hence behavior, rather than demand that behavior. Of course, this is useful only when absolute control is not necessary.
At this point it is possible to understand broadly the type of overall organization of Yh. Namely, there are points, called units, of diamond-like hardness and clarity. These units represent objects or contain potential behavior in certain situations. The applicability of these units, though, is under the control of a matcher which stretches the pre-conditions for the use of each unit to a fluid environment. I think of this organization as like a large group of mechanical devices with input ports made of very pliable rubber that are then able to be interfaced to many other devices fairly easily, although the results may not be exactly optimal.
Such a technique for interfacing devices cannot be tolerated unless there is an overriding observational facility for watching over the participants, and this is exactly what the stratifications provide.
Of course, it should be fairly obvious at this point that there cannot be any small examples, because to influence the behavior of a large system with many possible distinctions, there must be that hard core of many possible distinctions. However, it is possible to give part of an example, in this case one which involves word choice.
The word choice problem is probably the ultimate problem for language generation: first, words are the first large linguistic objects learned by individuals; second, many words have many synonyms or near synonyms; third, words are in some sort of network-like structure, since words remind people of other words, a relationship similar to synonymity; and fourth, sentence schemas cannot be used when the underlying support of words is missing. Thus, any generation system must be able to plan generations in the presence of frequent failures.
NOTE: As opposed to small objects such as intonations, etc.
Natural language generation is also a unique domain for AI research in that its richness is a source of relief in solving the problems encountered as well as a source of problems. Repetition of a phrase can be taken as an indication of relatedness of referent when that may not be the case. In other words, if I use similar expressions to refer to different things, there is a tendency to think that the two different things are related, if not the same.
As expected, I will use an unusual example from literature to demonstrate this point, one which will hint at the subtlety that exists in real writing, and, hence, creativity. The example is from Moby Dick by Herman Melville [Melville 1851]. In that book there is some evidence that Melville intends to identify in some ways Ahab with the Whale, some saying that he is making an analogy to Christ and God. To support this is the fact that several of the descriptions of each are similar. For instance, in Chapter 41, Melville writes of the Whale:
For, it was not so much his uncommon bulk that so much distinguished him from other sperm whales, but, as was elsewhere thrown out - a peculiar snow-white wrinkled forehead, and a high, pyramidical white hump.
and in Chapter 44, he writes of Ahab:
...the heavy pewter lamp...for ever threw shifting gleams and shadows of lines upon his wrinkled brow, till it almost seemed that while he himself was marking out lines and courses on the wrinkled charts, some invisible pencil was also tracing lines and courses upon the deeply marked chart of his forehead.
Thus, though he never states outright that Ahab and the Whale are related except as hunter and hunted, or as obsessed and obsession, the fact of two similar descriptions of the two brings them close, and perhaps suggests a closer, familial or even identical relationship.
In generation there is a great sense of being able to kill two birds with one stone, or even more birds with a little bigger one. For example, suppose you want to talk about the color of a young girl's eyes, and the girl is 3 years old; and suppose that you don't want to specifically refer to the age, but to hint at the youth - that is, in the jargon built up in the rest of the thesis, there are influences to specify youth but no direct request. Then you might use a phrase like ...girl's baby blue eyes.... The baby blue part refers to a color, which is a little lighter than regular blue, but the possible lie associated with the exact color may be compensated for by the nice tendency the reader has to think of the girl as younger rather than older. This is a matter of judgment, not of iron-clad deduction or reasoning.
Creativity involves the ability to make choices on the basis of current and past tendencies, but it requires a large number of choices, and it requires that the distinctions available be subtle enough to capture the nuances desired. Creativity is not being able to solve a puzzle cleverly, necessarily, nor is it coming up with a complex, involved plan of attack, but it is using techniques available in an unexpected, new way, or putting things in unprecedented order.
There is a distinction between doing something and noticing that you are doing it, and some actions can only be based on the noticing. In Yh there are units that produce noun phrases, and there are units that notice that the same noun phrase appears more than once. Finally, other actions are taken on the basis of the observation.
The process of describing an object is the proper domain for an observer, for only in the context of an observer can the distinctions necessary to the description be made clear. When I describe a noun phrase I can choose to describe the letters in it, the words in it, the reasons I chose those words, why I said the noun phrase in that sentence, or why I said that noun phrase (to get you to believe something, for instance).
Since a description is within a context, meaningless outside, it then makes sense to separate the description from the object, and thus the context can be placed outside the object as well. This is not to say that any given implementation of an observational item needs to be physically (or implementationally) separate from the object, but any available distinguishing entities in the system (and in our minds too) should be able to distinguish the participant from the observer, and hence the description from the described.
Descriptions can act as stand-ins for the actual participants in making judgments in a domain. More precisely, if there is some description of an object that purports to describe the activities of a unit, we can manipulate that description and pretend that the actions have taken place in order to get some speculation on what other actions can then be taken.
With a rich descriptive language, and a matching language that does not mind ignoring things when necessary, there is no reason to avoid full descriptions of every facet of an object. We can use the measures on descriptors to put things in their place, but to leave possible distinctions out is to rob the system of potential power.
With descriptions as part of the observing entities we can also attach many different observers to any given object and thus partition the views of an object without unwanted interference. So, for words with several meanings, we can have a different description for that word, each of which can be manipulated by the system, but with limitations on the interactions between them, and we can thus gain additional context as part of the bargain.
For instance, consider the word 'bug', which means either an insect or a problem in a program. In the notation given earlier we have two choices on how to represent this curious fact. One is:
D1 ((< insect > . n1) (< problem > . n2))
in which n1 and n2 are of comparable measures since they are about equally adequate expressions for the concepts they represent; that is, bug is as good a way to say insect as it is to say problem with a program. Another way is to have the multiple descriptions
D2 ((< insect > . n1) (< problem > . n3))
D3 ((< problem > . n2) (< insect > . n3))
where n3 is some smaller measure.
In the first case, when there is interference between the concepts, such as when we are talking about programs about insects (in either pun-ish interpretation), the first representation may be too sensitive about the crossover: the strength of the match is too high because of this presence of the other interpretation in the picture. In the latter case, multiple descriptions are in better control, the interference strength there being decoupled from the normal, expressive strength.
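The arithmetic behind that difference can be shown with made-up numbers; this is a Python toy of my own, and the measures n1, n2, n3 are placeholders rather than values taken from Yh.

# made-up measures: n1, n2 are the normal expressive strengths, n3 the small
# crossover strength
n1, n2, n3 = 800, 800, 150

single   = {'insect': n1, 'problem': n2}           # the single description D1
multiple = [{'insect': n1, 'problem': n3},         # D2, the insect sense
            {'problem': n2, 'insect': n3}]         # D3, the program-problem sense

def strength(description, wanted_descriptors):
    return sum(description.get(w, 0) for w in wanted_descriptors)

wanted = ['problem', 'insect']                     # a problem in a program about insects
print(strength(single, wanted))                    # 1600: too eager to pun
print(max(strength(d, wanted) for d in multiple))  # 950: the interference is damped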
And the problem with puns is that in a generation system done right, in which reminding is an important part of the behavior of the system, you are never going to be able to avoid puns, and you may even have to go to great lengths to be rid of them, such as using multiple observers. If there are influences that are added due to a certain choice of words, then the results of letting these influences help choose further words can be a problem. So, if there is a program about insects that has a problem, then the use of the word 'bug' for this problem may be intolerable. On the other hand, if I am describing a love relationship which is also a physical relationship, I might use the word 'passion' in place of 'love', which is entirely acceptable, and both arise from the same mechanisms.
A representation is some objects and operations on those objects which allow an internal manipulation to reflect a perceived external manipulation. So a representation of an engine should allow a program, or whoever, to look at its parts the way we can look at the parts of a real engine. And postulating putting a part in a certain place in the representation requires that the part can be put in that place in reality as well. But to have such a faithful representation means that there is an observer who can observe that the faithfulness is exhibited. Without an observer of this correspondence the question is truly meaningless. This means that there is no objective measure of the appropriateness of a representation; faithfulness is a judgment in the life of an observer, not in the life of a representation.
More importantly for practical applications, a representation maintains distinctions. A distinction is a partitioning of the world (whether internal or external) into things that can be distinguished. A representation that allows few distinctions is worthless. But, a distinction is not worthwhile either unless the basis of distinction is available as well. For if all we can say about two things that are distinct is that they are distinct, then the distinction does not help us decide in which case one distinction is appropriate and the other inappropriate.
In the simple blocks world, it is because table and blocka, for example, are not usefully distinct that a planning system has difficulty understanding that they are to be used differently. Since the representation scheme that underlies this can always distinguish using LISP EQ, it can always make the distinction. The distinction of interest in this case is that of differing use. A table cannot be placed on a block, and every structure of blocks has the table at its base. These facts trim the search space and resolve some anomalous situations. These distinctions are required to be made explicit in the representation and not only in the operations on that representation.
For suppose that the move operation knows that the table must be at the bottom of a tower (by knowing that it cannot be moved onto anything); this does not help the planning system know that if you want a tower of A on B on C, then C is on the table. This is the distinction that needs to be made to allow sometimes unintuitive choices while planning in this domain.
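One way to carry that distinction in the representation itself rather than in the move operation is sketched below; this is not the author's representation, just an illustration in Python with invented property names.

blocks_world = {
    'table':  {'can-support': True, 'can-be-placed': False},
    'blocka': {'can-support': True, 'can-be-placed': True},
    'blockb': {'can-support': True, 'can-be-placed': True},
}

def possible_tower_bases(world):
    # readable directly from the representation: only things that cannot
    # themselves be placed can sit at the base of a tower
    return [name for name, props in world.items() if not props['can-be-placed']]

print(possible_tower_bases(blocks_world))   # ['table']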
The main point of the descriptive mechanism is that it allows one to specify distinctions in a monolithic manner in order to make them more easily available. By putting measures on the unordered set of descriptors one is able to distinguish on the basis of those measures, or ignore their influence altogether.
In hierarchical or tangled hierarchical representation schemes, such as KRL or semantic nets, these distinctions are made available through search chains of, for instance, isa links. For example, one can represent the blocks and table world in such a system where the isa links on a block and the table eventually lead to prototypes which state the distinction. And one can obtain the fine dusting of distinction in the structure of the web that exists around the objects.
So, in the measure of a descriptor we can observe the relative importance directly, without needing to untangle webs and, hence, without needing resource limited searches whose effort diminishes in proportion to the importance.
I find, too, that the description pattern matcher explained here captures nicely some of the things we want to get out of the resource-limited matchers, in which there is some sense of partial matches or undefined results. In those matchers, as in the KRL matcher, when we match two things we end up with a set of corresponding features that substantiate the match and some notion of how strong the match is based on the amount of resources dumped into the effort. In this matcher, we end up with a set of possible pairings of descriptors and the total strength of the match, which can then be thresholded or whatever to produce the behavior you want.
Further, in the framework of Yh there is a mechanism for saying 'match these two things, and I guess I'm sort of willing to let these strange things be true to some extent during that match'. This is the influence and soft-constraint mechanism.
In short, this descriptive framework and matching procedure can form a common ground for expressing many nuances and temporary tendencies without the need to bury these things in a static, difficult to work with, and thus more permanent, nest-like structure.
I find that it is nice to be able to match two descriptions and have something other than T or NIL come back, and this means that I can expect to compare apples and oranges to some profitable conclusion.
On the other hand, many people balk at the idea of a non-symbolic methodology for accomplishing these ends, though they find the idea of resource expenditure profitable.
Here I want to make a distinction between intelligence on one hand and judgment on the other. Judgment is not a matter of right and wrong per se, but intelligence often is such a matter.
In judgment, we make a choice based on what we think is the best or most appropriate thing to do or believe, and if there is a choice, then there is a ranking of the choices as evidenced by the result of the judgment. How different a thing than intelligence where the right decision is often a unique item.
NOTE: This is not to say that there isn't some shading of intelligence to judgment, just that there is some dimension between the two.
The measures associated with the descriptors in the situation description do not correspond to probabilities, since the latter have a firm connection with truth and fact, in that a probability, p, means the likelihood that some thing is true. In this system, I mean by a measure, p, the amount of commitment to a fact with respect to other possibilities. That is, one can state for any two entries which one the system has more commitment to, but these measures are subjective to Yh!
The situation description that is used can easily represent the fact that an object is believed to be at point x and at point y, where x ≠ y. Of course the measures of the two facts may not be equal, and they may not be absolute, but nevertheless there is a good sense in which the data base reflects an inconsistent fact.
To many systems and to many people this is intolerable: it means that we can derive anything! However, this can only be a problem in systems where it is a priori true that this is a problem. In my system, there is some numeric measure of a fact beyond which it is willing to commit itself to action on the validity of that fact. In the Blind Robot problem, the fact of contradictory statements in the data base is used to switch the tide of belief from some objects being at some locations to some other locations. And since there is no observational facility to determine which facts are true, manipulating these contradictory facts is a good method for dealing with the situation.
The rationality of this system of representation can be seen readily when you forget about facts and absolute systems and think about generation as the paradigmatic problem. Here you have to worry about whether you have been able to express something adequately. So, in the first reference you might be able to express some fraction of the full fact, measured subjectively. But it takes a second reference to complete the thought. So if you wanted to talk about an array, but didn't want to have very long noun phrases, you could say "The zero-based, one-dimensional array...R is initialized to the last element, 27...". Here the first reference didn't say the length of the array, so a future reference to it detailing this fact is made.
The representation and description systems used in Yh are very good for this sort of thing.
The idea of programming a system to demonstrate interesting behavior, such as generating stylistically pleasing English, is a funny sort of thing in that the operative word in programming languages is imperative, even in applicative languages. This means that the exact activities of the program are essentially dictated by the programmer using unambiguous instructions with conditionals. Of course the exact program behavior may depend on outside input through sensors, but basically the normal style of programming is like writing an algorithm.
I hope you have noticed that although certain parts of Yh are like this, such as most of the procedural part of the units, there is a strong sense of leaving control up to a search or pattern matching process, which takes into account the entire current situation. There are several methods of sequencing in Yh: the normal applicative/imperative methods, the agenda, and the situation description.
Of these, the first fairly explicitly determines control without much doubt about what is to happen. The agenda mechanism is a step away from this philosophy in that the things that need to be done are put on a list which has some priorities or other predicates that are used to determine when and if an action is to take place. Thus the unavoidability of what to do next is taken away to the extent that the final course of things can be altered easily by changing the agenda priorities.
For example, suppose that we start out in a traditional program to do some things; except for conditional control based on variables (or whatever) that are set as time goes on, the flow of control is determined at code creation time. In an agenda-based system one part of the program can say to do some things and a later part can countermand that.
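A bare-bones agenda of this sort might be sketched as follows; this is my own minimal Python rendering, and real agendas carry predicates and richer priorities than this.

agenda = []                                  # entries are [priority, action]

def post(action, priority):
    agenda.append([priority, action])

def countermand(action):
    # a later part of the program can remove something posted earlier
    agenda[:] = [entry for entry in agenda if entry[1] is not action]

def run_agenda():
    for priority, action in sorted(agenda, key=lambda entry: -entry[0]):
        action()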
Now, I don't think I need to say that there is no gain in theoretical power from one type of control to the next, but there is a gain in expressive power and a shift in the way we look at programming.
With influence-based programming (my jargon) there is a description of the supposedly relevant facts and goals, which form the core of what is to be done in the near future. In addition, there are influences and soft-constraints which may affect what gets done, depending on the distinctions that the descriptions of the participants provide. The choice of what things get done is up to a matching process which locates the individual who satisfies the most needs the best at the moment.
Programming in this environment is more like saying things such as: These things need to be done, keep these other things in mind, and I'd prefer to see these conditions true. So the flavor is that of suggesting and manipulating the description of the situation to reflect the best idea of what needs to be done.
I feel that normal programming is simply taking one's own general description of what has to be done and finding a sequence of actions, data structures, etc., in the language which accomplishes this. So in assembly language, if I want to get the quotient of one number by another, which is very important, and I also later want the remainder, which is not so important, I eventually find the instruction that does both (IDIV on the pdp10).
In the generation that was shown as an example there was an admonition to not use too many adjectives. This was expressed as an influence to the entire system. I could have programmed it to check at every place for the value of some variable, but instead I can just express that admonition itself along with how strongly I want it to be enforced, and when the descriptions of the units that do contain the distinctions necessary to recognize this admonition are present, everything is taken care of. In some sense, I give advice to Yh about how it should act, and that is how I control its style and its creative capabilities.
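For illustration only, here is roughly how such an admonition could be stated once as an influence and then felt wherever the relevant distinction shows up in a unit's description; the descriptor, the amount, and the function in this Python sketch are all my inventions.

# the admonition, expressed once as an influence rather than as a check
# scattered through the code; the amount is invented
influences = [(('adds', 'adjective'), -300)]

def influenced_strength(raw_strength, unit_descriptors, influences):
    total = raw_strength
    for pattern, amount in influences:
        if pattern in unit_descriptors:      # the unit's description mentions it
            total += amount
    return total

# a unit that would add another adjective now matches a little less well:
print(influenced_strength(700, [('adds', 'adjective')], influences))   # 400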
I have mentioned earlier that the problem with most AI research to date is that there has been too much focus on intelligence and too little on judgment or on anything else that is not associated with doing things that intelligent people do. I am not interested in writing a program that cleverly solves puzzles, figures out how to get to the airport with potatoes in the way, plays chess, makes medical diagnoses, speaks like Oliver Wendell Holmes, or otherwise is better at something than 95% of the population of the world. Of course, programs can do things like inverting 500 by 500 matrices better than most people can, but I certainly don't have any prejudice in favor of machines over people.
My point in writing this program is to make something of an introspectively accurate (to me) model for obtaining the sort of behavior in deliberate writing that I think occurs with normal human writers. I wanted to be able to at least express the things I thought were important and to be able to give advice to the program and have that advice acted upon, in ways that I think are similar to the way people take advice and act.
Catch phrases such as knowledge-based, intelligent, planning, and rule-based have little place in the fundamental process of understanding the mind, and to worry about creating idiot savants seems to be a misdirected goal to me. If AI is taking the idea that the mind is like a computer in some ways, and if the idea of what a computer is like must remain as it has for decades, and if the current antiquated ideas of computing must form the basis of understanding the mind, then I'm afraid I fall outside of the realm of AI for the moment, since I think that we must redefine computation and expand our horizons as failures and boundaries of our abilities are encountered.
Yh does a fair job of generating English about a small class of programs, but it is not a production level program. This small class of programs could be better explained with some ad hoc program written for exactly that purpose. Yh does manage to demonstrate interesting behavior in exploring the courses of action available for expressing the program in question. It is possible to program Yh in ways that correspond more closely to the way that people are given advice, which means that we can begin to converse with these programs - though not in English - in ways that are natural for both, rather than in ways that are like a surgeon (the programmer) operating on a patient (the program), which is the case at present.
Yh represents a methodological shift away from standard kinds of programming techniques and assumptions towards ones which provide an opportunity to interact with our programs more freely. I hope that this thesis will allow people to stop being obsessed with puzzles and correct solutions and to get on with the business of understanding ordinary, everyday minds, which is the first step in creating colleagues rather than slaves.