An earlier, but still highly useful, overview of related work may be found in:
A popular survey of the material covered in What is Thought? (with some new material) may be found here.
We address the problem of reinforcement learning in ultra-complex environments with huge state spaces, where one must learn to exploit the compact structure of the problem domain. The approach we propose is to simulate the evolution of an artificial economy of computer programs. The economy is constructed on two simple principles so as to assign credit to the individual programs for collaborating on problem solutions. We find empirically that, starting from programs that are random computer code, we are able to evolve systems that solve hard problems. In particular, our economy has learned to solve almost all random Blocks World problems with goal stacks 200 blocks high, whereas competing methods solve such problems only up to goal stacks of at most 8 blocks. Our economy has also learned to unscramble about half of a randomly scrambled Rubik's cube, and to solve several of a collection of commercially sold puzzles.
This version of February 2000 updates previous versions of this paper posted here since 1998. New results include the experiments with Rubik's cube and Rush Hour, a survey of Blocks World results from other algorithms, and improvements in the performance of our system that allow it to learn to solve large Blocks World problems with no hand-crafted features.
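To make the credit-assignment idea concrete, here is a minimal sketch, with entirely hypothetical agents, bids, and parameters (none of it is the paper's actual system): agents bid at auction for ownership of the world, each purchase conserves money by paying the previous owner, external reward enters only when an instance is solved, and broke agents are replaced by fresh random ones.

```python
# Hypothetical sketch of auction-based credit assignment in an agent economy.
# All names and numbers are illustrative, not taken from the paper.
import random

class Agent:
    def __init__(self):
        self.wealth = 1.0
        self.skill = random.random()   # stand-in for a random computer program

    def bid(self, state):
        # An agent bids what it estimates ownership of the state is worth to it.
        return min(self.wealth, self.skill)

    def act(self, state):
        return state + 1               # stand-in for executing the agent's code

def auction_step(agents, state, owner):
    # Highest bidder buys the world from the current owner; money is conserved:
    # every dollar the buyer pays is a dollar the seller receives.
    winner = max(agents, key=lambda a: a.bid(state))
    price = winner.bid(state)
    winner.wealth -= price
    if owner is not None:
        owner.wealth += price
    return winner.act(state), winner

def evolve(agents, rounds=1000, reward_state=10):
    state, owner = 0, None
    for _ in range(rounds):
        state, owner = auction_step(agents, state, owner)
        if state >= reward_state:      # external reward enters only at a solution
            owner.wealth += 5.0
            state, owner = 0, None
        agents[:] = [a for a in agents if a.wealth > 0]   # broke agents are removed
        while len(agents) < 10:
            agents.append(Agent())     # and replaced by fresh random agents

evolve([Agent() for _ in range(10)])
```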
This paper describes an economic model called Hayek4, similar to the Hayek3 model described in "Evolution of Cooperative Problem-Solving in an Artificial Economy" directly above, but using Post production systems as the agent language rather than S-expressions. The Post production system language is computationally universal, while the S-expression language was not. Accordingly, Hayek4 succeeds in learning from random examples to solve arbitrary block stacking problems. The learned program essentially consists of about 5 learned rules and some learned control information. Solution of an instance with n blocks in its goal stack requires automatically chaining the rules, in the correct sequence, about 2n deep.
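For intuition only, here is a toy illustration (not the Hayek4 language or its learned rules) of how chained Post-style rewrite rules can solve a task, with the chain depth growing with instance size:

```python
# Toy illustration of chaining Post-style rewrite rules; the actual Hayek4
# rules and agent language are described in the paper, not reproduced here.
rules = [
    ("ba", "ab"),   # hypothetical rule: swap one out-of-order adjacent pair
]

def apply_once(s, rules):
    # Apply the first rule whose left-hand side occurs in the string.
    for lhs, rhs in rules:
        i = s.find(lhs)
        if i != -1:
            return s[:i] + rhs + s[i + len(lhs):]
    return None

def chain(s, rules, goal):
    # Repeatedly apply rules until the goal string is reached.
    depth = 0
    while s != goal:
        s = apply_once(s, rules)
        if s is None:
            return None
        depth += 1
    return depth

print(chain("bbaa", rules, "aabb"))   # -> 4 rule applications chained in sequence
```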
Our aim is to produce a focused crawler that, given one or more sample pages, will crawl to all similar pages on the web as efficiently as possible. A key problem in achieving this is assigning credit to documents along a crawl path, so that the system can learn which documents lead toward goal documents and can thus search efficiently in a best-first manner. To address this, we construct an artificial economy of autonomous agents. Agents buy and sell web pages: an agent is compensated when another agent buys the current set of uncrawled web pages, when it buys a goal page itself, or when future agents buy a goal page. The economy serves to push money up from goal pages, compensating agents that buy useful pages. Inappropriate agents go broke and new agents are created, so the system evolves agents whose bids accurately estimate the utility of adding pages to the search. The system is found to outperform a Bayesian focused crawler in our experiments.
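A minimal sketch of the resulting best-first search, assuming hypothetical fetch, bid, and is_goal functions (the paper's agents and compensation scheme are not reproduced here):

```python
# Hedged sketch of bid-driven best-first crawling; fetch, bid, and is_goal
# are assumed, user-supplied functions, not the paper's actual economy.
import heapq

def crawl(seed_urls, fetch, bid, is_goal, budget=1000):
    """fetch(url) -> (page, outlinks); bid(url) estimates the utility of url."""
    frontier = [(-bid(u), u) for u in seed_urls]   # max-heap via negated bids
    heapq.heapify(frontier)
    seen, goals = set(seed_urls), []
    while frontier and budget > 0:
        _, url = heapq.heappop(frontier)           # crawl the highest bid first
        page, outlinks = fetch(url)
        budget -= 1
        if is_goal(page):
            goals.append(url)                      # reward would flow back from here
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-bid(link), link))
    return goals
```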
We address the problem of reinforcement learning in ultra-complex environments. Such environments will require a modular approach. The modules must solve subproblems, and must collaborate on solution of the overall problem. However, a collection of rational agents will only collaborate if appropriate structure is imposed. We give a result, analogous to the First Theorem of Welfare Economics, that shows how to impose such structure. That is, we describe how to use economic principles to assign credit and ensure that a collection of rational (but possibly computationally limited) agents will collaborate on reinforcement learning. Conversely, we survey several catastrophic failure modes that can be expected in distributed learning systems, and that have empirically occurred in biological evolution, real economies, and artificial intelligence programs, when such structure is not enforced.
We conjecture that simulated economies can evolve to perform reinforcement learning in complex environments on feasible time scales, starting from a collection of agents which have little knowledge and hence are not rational. We support this with two implementations of learning models based on these principles. The first of these systems has empirically learned to solve Blocks World problems involving arbitrary numbers of blocks. The second has demonstrated meta-learning: it learns better ways of creating new agents, modifying its own learning algorithm to escape from local optima that trap competing approaches. We describe how economic models can naturally address problems at the meta-level, such as meta-learning and meta-computation, that are necessary for high intelligence; discuss the evolutionary origins and nature of biological intelligence; and compare, analyze, contrast, and report experiments on competing techniques including hillclimbing, genetic algorithms, genetic programming, and temporal difference learning.
Machine Learning 35, pp. 155-185 (1999). For a reprint, send request to ebaum@fastmail.fm.
(PostScript file, 9 pages) Extended Abstract, April 4, 1996. In Proc. 13th ICML '96, ed. L. Saitta, Morgan Kaufmann, San Francisco, CA.
A market-based algorithm is presented which autonomously apportions complex tasks to multiple cooperating agents, giving each agent an incentive to improve the performance of the whole system. A specific model, called ``The Hayek Machine,'' is proposed and tested on a simulated Blocks World (BW) planning problem. Hayek learns to solve more complex BW problems than any previous learning algorithm. Given intermediate reward and simple features, it has learned to solve arbitrary BW problems efficiently. The Hayek Machine can also be seen as a model of evolutionary economics.
We analyze the performance of a Genetic Algorithm (GA) we call Culling, and of a variety of other algorithms, on a problem we refer to as the Additive Search Problem (ASP). ASP is closely related to several previously well-studied problems, such as the game of Mastermind and additive fitness functions. We show that the problem of learning the Ising perceptron is reducible to a noisy version of ASP. Culling is near optimal for ASP, highly noise tolerant, and the best known approach in some regimes. Noisy ASP is the first problem we are aware of where a genetic-type algorithm bests all known competitors. Standard GA's, by contrast, perform much more poorly on ASP than hillclimbing and other approaches, even though the Schema Theorem holds for ASP. We generalize ASP to k-ASP to study whether GA's will achieve `implicit parallelism' in a problem with many more schemata. GA's fail to achieve this implicit parallelism, but we describe an algorithm we call Explicitly Parallel Search that succeeds. We compute the optimal culling point for selective breeding, which turns out to be independent of the fitness function and the population distribution. We also analyze a mean-field-theoretic algorithm that performs similarly to Culling on many problems. These results provide an example of a rigorous analysis of GA's and give insight into when and how GA's can beat competing methods.
This is an expanded and improved version of ``On Genetic Algorithms,'' which appeared in COLT '95, Copyright (c) 1995 by ACM, Inc.
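As an illustration of the culling idea, here is a minimal sketch on an additive fitness function; the population size, culling fraction, and breeding operator are made-up parameters, not the paper's experimental setup.

```python
# Illustrative culling/selective-breeding sketch on an additive fitness
# function; all parameters here are hypothetical.
import random

N, POP, CULL = 50, 200, 0.1          # string length, population size, keep-fraction
weights = [random.random() for _ in range(N)]

def fitness(x):
    # Additive fitness: each bit contributes independently to the total.
    return sum(w for w, b in zip(weights, x) if b)

def breed(a, b):
    # Uniform crossover of two parents.
    return [random.choice(pair) for pair in zip(a, b)]

pop = [[random.randint(0, 1) for _ in range(N)] for _ in range(POP)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    elite = pop[: int(CULL * POP)]   # cull: keep only the fittest fraction
    pop = [breed(random.choice(elite), random.choice(elite)) for _ in range(POP)]

print(max(map(fitness, pop)), sum(weights))   # best found vs. the optimum
```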
The point of game tree search is to insulate oneself from errors in the evaluation function. The standard approach is to grow a full-width tree as deep as time allows, and then value the tree as if the leaf evaluations were exact. This has been effective in many games because of the computational efficiency of the alpha-beta algorithm. But as Bayesians, we want to know the best way to use the inexact statistical information provided by the leaf evaluator to choose our next move. We add a model of uncertainty to the standard evaluation function. We describe how to value the tree within our model, and how to efficiently approximate the best tree to search. Our tree growth procedure provably approximates the contribution of each leaf to the probabilistically weighted sum of all ``conspiracies'' of leaf revaluations that might affect move choice, and iteratively expands a set of the most important leaves. Our algorithms run (under reasonable assumptions) in time linear in the size of the final tree, and hence, except for a small constant factor, are as time-efficient as alpha-beta. In a given amount of time our algorithm can thus, in principle, grow a tree several times as deep as alpha-beta along the lines judged most relevant, at the cost of expanding less along lines judged less relevant, so that a small constant factor fewer nodes are searched in total. Our algorithm apportions a greater fraction of its thinking time to moves judged more relevant, and weighs the relevance of each leaf in choosing its move once it is finished searching.
We have tested our approach on a variety of games, including Othello, Kalah, Warri, and others. Our probability independence approximations are seen to be significantly violated, but nonetheless our tree valuation scheme was found to play significantly better than minimax or the Probability Product rule when both competitors search the same tree. Our full search algorithm was found to outplay a highly ranked, directly comparable alpha-beta Othello program even when the alpha-beta program was given sizeable time odds, and also performed well against the three top Othello programs on the Internet Othello Server.
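To illustrate one ingredient of such a scheme, here is a toy computation, under the independence approximation the papers test, of a max node's value distribution from its children's distributions (a full two-player search would negate values between plies; the coarse value scale here is hypothetical).

```python
# Toy propagation of discrete value distributions under the independence
# approximation; an illustration, not the papers' full algorithm.
VALUES = [-1, 0, 1]   # hypothetical coarse evaluation scale

def max_node(child_dists):
    """Distribution of the max of independent children:
    P(max <= v) is the product over children of P(child <= v)."""
    joint_cdf = [1.0] * len(VALUES)
    for dist in child_dists:
        acc, cdf = 0.0, []
        for p in dist:
            acc += p
            cdf.append(acc)
        joint_cdf = [a * b for a, b in zip(joint_cdf, cdf)]
    # Convert the CDF of the max back into a probability distribution.
    return [joint_cdf[0]] + [
        joint_cdf[i] - joint_cdf[i - 1] for i in range(1, len(VALUES))
    ]

# Two children, each uniform over {-1, 0, 1}: their max is biased upward.
print(max_node([[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]))   # ~[0.11, 0.33, 0.56]
```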
In a previous paper, the authors considered game trees as graphical models. Adopting an evaluation function that returned a probability distribution over values likely to be taken at a given position, they described how to build a model of uncertainty and use it for utility-directed growth of the search tree and for deciding on a move after search was completed. In some games, such as chess or Othello, the same position can occur more than once, collapsing the game tree to a Directed Acyclic Graph (DAG). This induces correlations among the distributions at sibling nodes. The present paper discusses some issues that arise in extending our algorithms to a DAG. We give a simply described algorithm for correctly propagating distributions up a game DAG, taking account of dependencies induced by the DAG structure. This algorithm is exponential time in the worst case. We prove that it is #P-complete to correctly propagate distributions up a game DAG. We suggest how our exact propagation algorithm can yield a fast but inexact heuristic.
These are slides describing the very serious effort we made on computer Go from 2000 to 2002. It ultimately failed because we could not find a sufficiently semantically meaningful way to cut out subgraphs, and without such a way the search space was too large. There are, however, some insights that may be valuable, so I am posting the slides.
This is an opinion piece I wrote for Windows Magazine. It appeared in the June 1996 issue.
An algorithm is sketched for using DNA and the tools of molecular biology to form a huge associative memory. This paper appeared in Science, April 28, 1995. For a USnail reprint, send request to ebaum@fastmail.fm.
Recent proposals for DNA-based computing (see, e.g., [Adleman], [Lipton], [Baum]) encode Boolean vector component values with sequences of DNA. It has previously been assumed that sufficiently long random subsequences could be used to encode component values. However, the use of such subsequences will inadvertently result in long complementary subsequences. Complementary subsequences of sufficient length would stick to each other and cause mistakes or delays in computation. We suggest some constraints on the DNA subsequences to be used in encodings, and describe maximal sets of subsequences satisfying these constraints.
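As a small illustration of the kind of constraint involved, here is a sketch (the window length and word set are hypothetical, not the paper's construction) that checks a candidate code-word set for long complementary subsequences, which could hybridize unintentionally:

```python
# Check a set of DNA code words for length-k windows that are reverse
# complements of one another; parameters are illustrative assumptions.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def revcomp(s):
    # Strands anneal antiparallel, so the reverse complement is what sticks.
    return "".join(COMP[c] for c in reversed(s))

def has_long_complement(words, k):
    """True if any length-k window of one word is the reverse complement
    of a length-k window of another (a potential unwanted hybridization)."""
    windows = set()
    for w in words:
        for i in range(len(w) - k + 1):
            windows.add(w[i:i + k])
    return any(revcomp(win) in windows for win in windows)

print(has_long_complement(["ACCA", "TGGT"], 4))   # True: these would stick
print(has_long_complement(["AACC", "AAGG"], 3))   # False at this window length
```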
We show that DNA computers are useful for running algorithms based on dynamic programming. This class of algorithms takes advantage of the large memory capacity of a DNA computer. We present algorithms for solving certain instances of the knapsack problem using a dynamic programming approach. Unlike other algorithms for DNA computers, which are brute force, dynamic programming is the same algorithm one would use to solve (smaller) problems on a conventional computer.
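For reference, here is the conventional-computer counterpart the abstract alludes to: a subset-sum style dynamic program whose table of reachable sums a DNA computer would hold as a vast pool of strands. This is a sketch of the standard algorithm, not the paper's laboratory protocol.

```python
# Standard dynamic program for the subset-sum form of knapsack.
def subset_sums(items, capacity):
    reachable = {0}   # sums achievable using some subset of the items seen so far
    for w in items:
        reachable |= {s + w for s in reachable if s + w <= capacity}
    return reachable

print(max(subset_sums([3, 34, 4, 12, 5, 2], 30)))   # best sum not over 30 -> 26
```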
These are transparencies from a talk given at the Organo Electronic Materials Conference. Unfortunately, the drawings are not included.