Building Network Learning Algorithms from Hebbian Synapses (2010)
N. M. Weinberger, Terrence J. Sejnowski, Gerald Tesauro
introduced several hypotheses about the neural substrate of learning and memory, including the Hebb learning rule, or Hebb synapse. We now have solid physiological evidence, verified in several...
The Hebb Rule for Synaptic Plasticity: Algorithms and (2010)
In Byrne, Terrence J. Sejnowski, Gerald Tesauro
which he introduced several hypotheses about the neural substrate of learning and memory, including the Hebb learning rule or Hebb synapse. At that time very little was known about neural mechanisms...
Monte-Carlo Simulation Balancing (2010)
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an...
Autonomic Multi-Agent Management of Power and Performance in Data Centers (2009)
Rajarshi Das, Gerald Tesauro, Jeffrey O. Kephart, David W. Levine, Charles Lefurgy, Hoi Chan
The rapidly rising cost and environmental impact of energy consumption in data centers have become a multi-billion dollar concern globally. In response, the IT industry is actively engaged in a...
Monte-Carlo Simulation Balancing (2009)
In this paper we introduce the first algorithms for efficiently learning a simulation policy for Monte-Carlo search. Our main idea is to optimise the balance of a simulation policy, so that an...
Online Performance Management Using Hybrid Reinforcement Learning (2008)
We present a new hybrid approach to performance management, combining disparate strengths of Reinforcement Learning (RL) with model-based (e.g. queuing-theoretic) approaches. Our method trains...
Multi-Agent Implementation of Asymmetric Protocol for Bilateral Negotiations (2008)
James E. Hanson, Gerald Tesauro, Jeffrey O. Kephart, Edward C. Snible
We present a practical implementation of a FIPA-compliant multiagent system, written in Java using the ABLE agent platform, in which the agents can negotiate the exchange of multi-attribute goods in...
Jeffrey O. Kephart, Hoi Chan, Rajarshi Das, David W. Levine, Gerald Tesauro, Freeman Rawson, ...
multiple autonomic managers to achieve specified
problems: A performance comparison of different learning algorithms. Technical (2007)
[Werbos, 1974] P. Werbos. Beyond regression: New tools for prediction and analysis in the behavioral sciences. PhD thesis, Harvard University, 1974.
Multi-agent Q-learning and regression trees for automated pricing decisions (2007)
Manu Sridharan, Gerald Tesauro
We study the use of single-agent and multiagent Q-learning to learn seller pricing strategies in three different two-seller models of agent economies, using a simple regression tree approximation...
A hybrid reinforcement learning approach to autonomic resource allocation (2006)
Gerald Tesauro, Nicholas K. Jong, Rajarshi Das, Mohamed N. Bennani
Reinforcement Learning (RL) provides a promising new approach to systems performance management that differs radically from standard queuing-theoretic approaches making use of explicit...
New approaches to optimization and utility elicitation in autonomic computing (2005)
Relu Patrascu, Craig Boutilier, Rajarshi Das, Jeffrey O. Kephart, Gerald Tesauro, William E. Walsh
Autonomic (self-managing) computing systems face the critical problem of resource allocation to different computing elements. Adopting a recent model, we view the problem of provisioning resources as...
Online Resource Allocation Using Decompositional Reinforcement Learning (2005)
Extending Q-learning to general adaptive multi-agent systems (2004)
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a...
Utility functions in autonomic systems (2004)
William E. Walsh, Gerald Tesauro, Jeffrey O. Kephart, Rajarshi Das
Utility functions provide a natural and advantageous framework for achieving self-optimization in distributed autonomic computing systems. We present a distributed architecture, implemented in a...
Extending Q-Learning to General Adaptive (2003)
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a...
Strategic sequential bidding in auctions using dynamic programming (2002)
Gerald Tesauro, Jonathan L. Bredin
We develop a general framework in which real-time Dynamic Programming (DP) can be used to formulate agent bidding strategies in a broad class of auctions characterized by sequential bidding and...
Programming backgammon using self-teaching neural nets (2002)
TD-Gammon is a neural network that is able to teach itself to play backgammon solely by playing against itself and learning from the results. Starting from random initial play, TD-Gammon’s...
High-Performance Bidding Agents for the Continuous Double Auction (2001)
Gerald Tesauro, Rajarshi Das
An increasingly important focus in agent-based electronic commerce is the design of robust heuristic bidding algorithms for a variety of auctions, including the Continuous Double Auction institution...
High-performance bidding agents for the continuous double auction (2001)
We develop two bidding algorithms for real-time Continuous Double Auctions (CDAs) using a variety of market rules that offer what we believe to be the strongest known performance of any published...
Agent-human interactions in the continuous double auction (2001)
Rajarshi Das, James E. Hanson, Jeffrey O. Kephart, Gerald Tesauro
The Continuous Double Auction (CDA) is the dominant market institution for real-world trading of equities, commodities, derivatives, etc. We describe a series of laboratory experiments that, for the...
Pricing in agent economies using multi-agent q-learning (1999)
Gerald Tesauro, Jeffrey O. Kephart
This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive...
Pricing in agent economies using neural networks and multi-agent q-learning (1999)
This paper investigates how adaptive software agents may utilize reinforcement learning algorithms such as Q-learning to make economic decisions such as setting prices in a competitive marketplace....
Using a Neural Net to Instantiate a Deformable Model (1995)
Gerald Tesauro, David Touretzky, Todd Leen, Morgan Kaufmann, Michael D. Revow, ...
Deformable models are an attractive approach to recognizing nonrigid objects which have considerable within class variability. However, there are severe search problems associated with fitting the...
Practical issues in temporal difference learning (1992)
This paper examines whether temporal difference methods for training connectionist networks, such as Sutton's TD(λ) algorithm, can be successfully applied to complex real-world...