Regularized Policy Iteration (2009)
Amir Massoud Farahmand, Csaba Szepesvári, Mohammad Ghavamzadeh, Shie Mannor
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric...
Regularized Fitted Q-iteration: Application to Planning (2009)
Amir Massoud Farahm, Mohammad Ghavamzadeh, Csaba Szepesvári, Shie Mannor
Abstract. We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration...
Finite-Time Bounds for Fitted Value Iteration (2009)
In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) to solve infinite state-space, discounted-reward Markovian decision processes (MDPs)...
Robust PCA for High-Dimensional Data (2009)
Huan Xu, Constantine Caramanis, Shie Mannor
Abstract — We consider the dimensionality-reduction problem for a contaminated data set in a very high dimensional space, i.e., the problem of finding a subspace approximation of observed data,...
Robust Regression and Lasso (2009)
Huan Xu, Constantine Caramanis, Shie Mannor
We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set...
Learning in the Limit with Adversarial Disturbances (2009)
Constantine Caramanis, Shie Mannor
We study distribution-dependent, data-dependent, learning in the limit with adversarial disturbance. We consider an optimization-based approach to learning binary classifiers from data under...
Sparse Algorithms are not Stable: A No-free-lunch Theorem (2009)
Huan Xu, Shie Mannor, Constantine Caramanis
Abstract — We consider two widely used notions in machine learning, namely: sparsity and algorithmic stability. Both notions are deemed desirable in designing algorithms, and are believed to lead...
Huan Xu, Shie Mannor, Constantine Caramanis
We consider two new formulations for classification problems in the spirit of support vector machines based on robust optimization. Our formulations are designed to build in protection to noise and...
Online Learning with Sample Path Constraints (2009)
Shie Mannor, John N. Tsitsiklis, Jia Yuan Yu
We study online learning when the objective of the decision maker is to maximize her long-term average reward subject to certain sample path average constraints. We define the reward-in-hindsight as...
Regularized Policy Iteration (2009)
Farahmand, Amir Massoud, Ghavamzadeh, Mohammad, Szepesvari, Csaba, Mannor, Shie
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric...
Robust Regression and Lasso (2008)
Xu, Huan, Caramanis, Constantine, Mannor, Shie
Lasso, or $\ell^1$ regularized least squares, has been explored extensively for its remarkable sparsity properties. It is shown in this paper that the solution to Lasso, in addition to its sparsity,...
Bayesian Reinforcement Learning with Gaussian Process Temporal Difference Methods (2008)
Engel, Yaki, Mannor, Shie, Meir, Ron
Reinforcement Learning is a class of problems frequently encountered by both biological and artificial agents. An important algorithmic component of many Reinforcement Learning solution methods is...
Reinforcement learning with kernels and Gaussian processes (2008)
Yaakov Engel, Shie Mannor, Ron Meir
Kernel methods have become popular in many sub-fields of machine learning, with the exception of reinforcement learning; they facilitate rich representations, and enable machine learning techniques...
extended abstract Calibrated Forecasts: Efficiency versus Universality (2008)
Gürdal Arslan, Shie Mannor, Jeff S. Shamma
One approach to learning/adaptation in repeated matrix games is to have each player compute some sort of forecast of opponent actions and play a best response to this forecast. Accordingly, the...
An Inequality for Nearly Log-Concave Distributions With Applications to Learning (2008)
Constantine Caramanis, Shie Mannor
Abstract—We prove that given a nearly log-concave distribution, in any partition of the space to two well separated sets, the measure of the points that do not belong to these sets is large. We...
Strategies for prediction under imperfect monitoring (2008)
Gábor Lugosi, Shie Mannor, Gilles Stoltz
3 CNRS and Département de mathématiques et applications, Ecole normale
Reinforcement Learning-based Load Shared Sequential Routing (2008)
Fariba Heidari, Shie Mannor, Lorne G. Mason
Abstract. We consider event dependent routing algorithms for on-line explicit source routing in MPLS networks. The proposed methods are based on load shared sequential routing in which load sharing...
Robustness and Regularization of Support Vector Machines (2008)
Xu, Huan, Caramanis, Constantine, Mannor, Shie
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and...
Shie Mannor, Duncan Simester, Peng Sun, John N. Tsitsiklis
informs ® doi 10.1287/mnsc.1060.0614
The Robustness-Performance Tradeoff in Markov Decision Processes (2008)
Computation of a satisfactory control policy for a Markov decision process when the parameters of the model are not exactly known is a problem encountered in many practical applications. The...
Markov decision processes are an effective tool in modeling decision-making in uncertain dynamic environments. Since the parameters of these models are typically estimated from data, learned from...
Strategies for prediction under imperfect monitoring (2008)
Gábor Lugosi, Shie Mannor, Gilles Stoltz
3 CNRS and Département de mathématiques et applications, Ecole normale
Strategies for prediction under imperfect monitoring (2008)
Gábor Lugosi, Shie Mannor, Gilles Stoltz
We propose simple randomized strategies for sequential decision (or prediction) under imperfect monitoring, that is, when the decision maker (forecaster) does not have access to the past outcomes but...
Approachability in Repeated Games: Computational Aspects and a Stackelberg Variant 1 (2008)
Shie Mannor, John N. Tsitsiklis
We consider a finite two-player zero-sum game with vector-valued rewards. We study the question of whether a given polyhedral set D is “approachable, ” that is, whether Player 1 (the “decision...
Strategies for prediction under imperfect monitoring (2008)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2008)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring. (2008)
Lugosi, Gábor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential decision (or prediction) under imperfect monitoring, that is, when the decision maker (forecaster) does not have access to the past outcomes but...
Error Reducing Sampling in Reinforcement Learning (2008)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
Error Reducing Sampling in Reinforcement Learning (2008)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
Regularized Fitted Q-iteration: Application to Planning (2008)
Farahmand, Amir Massoud, Ghavamzadeh, Mohammad, Mannor, Shie, Szepesvari, Csaba
We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with...
Large Deviations Bounds for Markov Decision Processes Under General Policies (2007)
Shie Mannor, John N. Tsitsiklis
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and...
The Empirical Bayes Envelope and Regret Minimization in Stochastic Games (2007)
Abstract. In repeated matrix games, classical results establish the existence of regret minimizing strategies that secure an average payo as high as the best-response payo (the Bayes Envelope)...
On Universal Compression of Multi-Dimensional Data Arrays Using Self-Similar Curves (2007)
The LZ-based scheme for the lossless compression of multi-dimensional data arrays using self-similar curves is revisited. The asymptotic universality of this compression scheme is established for the...
Shie Mannor, Ron Meir, Shahar Mendelson
Boosting algorithms have been shown to perform well on many realworld problems, although they sometimes tend to overfit in noisy situations. While excellent finite sample bounds are known, it has not...
Reinforcement Learning for Average Reward Zero-Sum Games (2007)
We consider Reinforcement Learning for average reward zerosum stochastic games. We present and analyze two algorithms. The first is based on relative Q-learning and the second on Q-learning for...
Error reducing sampling in reinforcement learning (2007)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gábor, Stoltz, Gilles, Mannor, Shie
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gabor, Mannor, Shie, Stoltz, Gilles
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Strategies for prediction under imperfect monitoring (2007)
Lugosi, Gábor, Stoltz, Gilles, Mannor, Shie
We propose simple randomized strategies for sequential prediction under imperfect monitoring, that is, when the forecaster does not have access to the past outcomes but rather to a feedback signal....
Network Formation: Bilateral Contracting and Myopic Dynamics (2007)
Esteban Arcaute, Ramesh Johari, Shie Mannor
We consider a network formation game where nodes wish to send traffic to each other. Nodes contract bilaterally with each other to form bidirectional communication links; once the network is formed,...
Multi-agent learning for engineers (2007)
As suggested by the title of Shoham, Powers, and Grenager’s position paper [34], the ultimate lens through which the multi-agent learning framework should be assessed is “what is the...
Adaptive timeout policies for fast finegrained power management (2007)
Power management techniques for mobile appliances put the components of the systems into low power states to maximize battery life while minimizing the impact on the perceived performance of the...
The cross-entropy method for power system combinatorial optimization problems (2007)
Ernst, Damien, Glavic, Mevludin, Stan, Guy-Bart, Mannor, Shie, Wehenkel, Louis
2007 Power Tech
Approachability in Repeated Games: Computational Aspects and a Stackelberg Variant 1 (2007)
Shie Mannor, John N. Tsitsiklis
We consider a finite two-player zero-sum game with vector-valued rewards. We study the question of whether a given polyhedral set D is “approachable, ” that is, whether Player 1 (the “decision...
A contract-based model for directed network formation (2006)
Ramesh Johari, Shie Mannor, John N. Tsitsiklis
We consider a network game where the nodes of the network wish to form a graph to route traffic between themselves. We present a model where costs are incurred for routing traffic, as well as for a...
We study a variance reduction technique for Monte Carlo estimation of functionals in Markov chains. The method is based on designing sequential control variates using successive approximations of the...
A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events (2006)
Shalabh Bhatnagar, Vivek S. Borkar, Madhukar Akarapu, Shie Mannor
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation based algorithm for estimating performance measures...
Eyal Even-dar, Shie Mannor, Yishay Mansour, Sridhar Mahadevan
We incorporate statistical confidence intervals in both the multi-armed bandit and the reinforcement learning problems. In the bandit problem we show that given n arms, it suffices to pull the arms a...
Evolutionary function approximation for reinforcement learning (2006)
Shimon Whiteson, Peter Stone, Sven Koenig, Shie Mannor, Georgios Theocharous
Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods...
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov Decision Process (MDP). Our work builds on results by Bertsekas and...
Evolutionary function approximation for reinforcement learning (2006)
Shimon Whiteson, Peter Stone, Sven Koenig, Shie Mannor, Georgios Theocharous
Temporal difference methods are theoretically grounded and empirically effective methods for addressing reinforcement learning problems. In most real-world reinforcement learning tasks, TD methods...
Asymptotics of Efficiency Loss in Competitive Market Mechanisms (2006)
• Problem: allocation of bandwidth in a network. – Feature: heterogeneous demand.
Reinforcement learning with Gaussian processes (2005)
Engel, Yaakov, Mannor, Shie, Meir, Ron
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing...
Efficiency Loss in a Network Resource Allocation Game: The Case of Elastic Supply (2005)
Johari, Ramesh, Mannor, Shie, Tsitsiklis, John N.
We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total rate sent...
Basis function adaptation in temporal difference reinforcement learning (2005)
Ishai Menache, Shie Mannor, Nahum Shimkin
Reinforcement Learning (RL) is an approach for solving complex multistage decision problems that fall under the general framework of Markov Decision Problems (MDPs), with possibly unknown parameters....
Basis function adaptation in temporal difference reinforcement learning (2005)
Ishai Menache, Shie Mannor, Nahum Shimkin
∗ All correspondence should be sent to this author. 1 We examine methods for on-line optimization of the basis function for temporal difference Reinforcement Learning algorithms. We concentrate on...
Reinforcement learning with Gaussian processes (2005)
Yaakov Engel, Shie Mannor, Ron Meir
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing...
Reinforcement learning with Gaussian processes (2005)
Yaakov Engel, Shie Mannor, Ron Meir
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing...
Reinforcement learning with Gaussian processes (2005)
Yaakov Engel, Shie Mannor, Ron Meir
Gaussian Process Temporal Difference (GPTD) learning offers a Bayesian solution to the policy evaluation problem of reinforcement learning. In this paper we extend the GPTD framework by addressing...
Approachability in Repeated Games: Computational Aspects and a Stackelberg Variant 1 (2005)
Shie Mannor, John N. Tsitsiklis
We consider a finite two-player zero-sum game with vector-valued rewards. We study the question of whether a given polyhedral set D is “approachable, ” that is, whether Player 1 (the “decision...
A Tutorial on the Cross-Entropy Method (2004)
De Boer, Pieter-Tjerk, Kroese, D. P., Mannor, Shie, Rubinstein, R. Y.
The cross-entropy approach is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to the...
A Tutorial on the Cross-Entropy Method (2004)
De Boer, Pieter-Tjerk, Kroese, D. P., Mannor, Shie, Rubinstein, R. Y.
The cross-entropy approach is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to the...
The Kernel recursive least squares algorithm (2004)
Engel, Yaakov, Mannor, Shie, Meir, Ron
We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS (KRLS) algorithm performs linear regression in the feature space induced by a Mercer...
Error reducing sampling in reinforcement learning (2004)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
Error reducing sampling in reinforcement learning (2004)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
Error reducing sampling in reinforcement learning (2004)
In reinforcement learning, an agent collects information interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing...
A Tutorial on the Cross-Entropy Method (2004)
De Boer, Pieter-Tjerk, Kroese, D. P., Mannor, Shie, Rubinstein, R. Y.
The cross-entropy approach is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to the...
A Tutorial on the Cross-Entropy Method (2004)
De Boer, Pieter-Tjerk, Kroese, D. P., Mannor, Shie, Rubinstein, R. Y.
The cross-entropy approach is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to the...
The sample complexity of exploration in the multi-armed bandit problem (2004)
Shie Mannor, John N. Tsitsiklis, Kristin Bennett, Nicolò Cesa-bianchi
We consider the multi-armed bandit problem under the PAC (“probably approximately correct”) model. It was shown by Even-Dar et al. (2002) that given n arms, a total of O � (n/ε 2)log(1/δ) �...
Dynamic abstraction in reinforcement learning via clustering (2004)
Shie Mannor, Ishai Menache, Amit Hoze, Uri Klein
We consider a graph theoretic approach for automatic construction of options in a dynamic environment. A map of the environment is generated on-line by the learning agent, representing the...
Kernel recursive least squares (2004)
Yaakov Engel, Shie Mannor, Ron Meir
We present a non-linear kernel-based version of the Recursive Least Squares (RLS) algorithm. Our Kernel-RLS algorithm performs linear regression in the feature space induced by a Mercer kernel, and...
Bias and variance in value function estimation (2004)
Shie Mannor, Peng Sun, John N. Tsitsiklis
We consider the bias and variance of value function estimation that are caused by using an empirical model instead of the true model. We analyze these bias and variance for Markov processes from a...
The Kernel Recursive Least Squares Algorithm (2004)
Yaakov Engel, Shie Mannor, Ron Meir
Copyright c ○ 2003 by the authors. This report may be used, copied and distributed in its entirety for academic purposes. Any copies must include this cover page and notice. Any non-academic use...
A geometric approach to multi-criterion reinforcement learning (2004)
We consider the problem of reinforcement learning in a dynamic environment, where the learning objective is dened in terms of multiple reward functions of the average reward type. The environment is...
Efficiency Loss in a Network Resource Allocation Game: The Case of Elastic Supply (2004)
Ramesh Johari, Shie Mannor, John N. Tsitsiklis
We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total rate sent...
Efficiency Loss in a Network Resource Allocation Game: The Case of Elastic Supply (2004)
Ramesh Johari, Shie Mannor, John N. Tsitsiklis
We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total rate sent...
An Inequality for Nearly Log-concave Distributions with Applications to Learning (2004)
Constantine Caramanis, Shie Mannor
We prove that given a nearly log-concave density, in any partition of the space to two well separated sets, the measure of the points that do not belong to these sets is large. We apply this...
On the Empirical State-Action Frequencies in Markov Decision Processes Under General Policies (2004)
Shie Mannor, John N. Tsitsiklis
We consider the empirical state-action frequencies and the empirical reward in weakly communicating finite-state Markov decision processes under general policies. We define a certain polytope and...
Efficiency loss in a network resource allocation game (2004)
Ramesh Johari, Shie Mannor, John N. Tsitsiklis
Abstract—We consider a resource allocation problem where individual users wish to send data across a network to maximize their utility, and a cost is incurred at each link that depends on the total...
A tutorial on the cross-entropy method (2004)
Dirk P. Kroese, Shie Mannor, Reuven Y. Rubinstein
The cross-entropy (CE) method is a new generic approach to combinatorial and multi-extremal optimization and rare event simulation. The purpose of this tutorial is to give a gentle introduction to...
A Geometric Approach to Multi-Criterion Reinforcement Learning (2004)
Shie Mannor, Nahum Shimkin, Sridhar Mahadevan
We consider the problem of reinforcement learning in a controlled Markov environment with multiple objective functions of the long-term average reward type. The environment is initially unknown, and...
Bayes meets Bellman: The Gaussian process approach to temporal difference learning (2003)
Yaakov Engel, Shie Mannor, Ron Meir
We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We dene a probabilistic generative model for the value function by imposing a Gaussian...
Greedy algorithms for classification - consistency, convergence rates, and adaptivity (2003)
Shie Mannor, Ron Meir, Tong Zhang
Many regression and classification algorithms proposed over the years can described as greedy procedures for the stagewise minimization of an appropriate cost function. Some examples include additive...
Action elimination and stopping conditions for reinforcement learning (2003)
Eyal Even-dar, Shie Mannor, Yishay Mansour
We consider incorporating action elimination procedures in reinforcement learning algorithms. We suggest a framework that is based on learning an upper and a lower estimates of the value function or...
The cross entropy method for fast policy search (2003)
Shie Mannor, Reuven Rubinstein, Yohai Gat
We present a learning framework for Markovian decision processes that is based on optimization in the policy space. Instead of using relatively slow gradient-based optimization algorithms, we use the...
The empirical Bayes envelope and regret minimization in competitive Markov decision processes (2003)
Abstract. This paper proposes an extension of the regret minimizing framework from re-peated matrix games to stochastic game models, under appropriate recurrence conditions. A decision maker (P1) who...
A Contract-Based Model for Directed Network Formation (2003)
Ramesh Johari, Shie Mannor, John N. Tsitsiklis
We consider a network game where the nodes of the network wish to form a graph to route traffic between themselves. We present a model where costs are incurred for routing traffic, as well as for a...
Greedy Algorithms for Classification - Consistency, Convergence Rates, and Adaptivity (2003)
Shie Mannor, Ron Meir, Tong Zhang, Yoram Singer
Many regression and classification algorithms proposed over the years can be described as greedy procedures for the stagewise minimization of an appropriate cost function. Some examples include...
Action elimination and stopping conditions for reinforcement learning (2003)
Eyal Even-dar, Shie Mannor, Yishay Mansour
We consider incorporating action elimination procedures in reinforcement learning algorithms. We suggest a framework that is based on learning an upper and a lower estimates of the value function or...
Shie Mannor, John N. Tsitsiklis
doi 10.1287/moor.1050.0148
Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning (2003)
Yaakov Engel, Shie Mannor, Ron Meir
We present a novel Bayesian approach to the problem of value function estimation in continuous state spaces. We define a probabilistic generative model for the value function by imposing a Gaussian...
Lower Bounds on the Sample Complexity of Exploration in the Multi-Armed Bandit Problem (2003)
Shie Mannor, John N. Tsitsiklis
We consider the Multi-armed bandit problem under the PAC ("probably approximately correct") model. It was shown by Even-Dar et al. [5] that given n arms, it su#ces to play the arms a total...
On-line Learning with Imperfect Monitoring (2003)
We study on-line play of repeated matrix games in which the observations of past actions of the other player and the obtained reward are partial and stochastic. We define the Partial Observation...
Q-Cut - dynamic discovery of sub-goals in reinforcement learning (2002)
Ishai Menache, Shie Mannor, Nahum Shimkin
Abstract. We present the Q-Cut algorithm, a graph theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm. The...
PAC bounds for multi-armed bandit and Markov decision processes (2002)
Eyal Even-dar, Shie Mannor, Yishay Mansour
Abstract. The bandit problem is revisited and considered under the PAC model. Our main contribution in this part is to show that given n arms, it suffices to pull the arms O ( n ɛ2 log 1) times to...
Sparse online greedy support vector regression (2002)
Yaakov Engel, Shie Mannor, Ron Meir
Abstract. We present a novel algorithm for sparse online greedy kernelbased nonlinear regression. This algorithm improves current approaches to kernel-based regression in two aspects. First, it...
PAC bounds for multi-armed bandit and Markov decision processes (2002)
Eyal Even-dar, Shie Mannor, Yishay Mansour
1 Introduction The Multi-armed bandit problem is one of the classical problems in decisiontheory. The problem is very simple to model- there are a number of alternative arms, each with a stochastic...
Sparse online greedy support vector regression (2002)
Yaakov Engel, Shie Mannor, Ron Meir
Abstract. We present a novel algorithm for sparse online greedy kernelbased nonlinear regression. This algorithm improves current approaches to kernel-based regression in two aspects. First, it...
The consistency of greedy algorithms for classification (2002)
Shie Mannor, Ron Meir, Tong Zhang
Abstract. We consider a class of algorithms for classification, which are based on sequential greedy minimization of a convex upper bound on the 0- 1 loss function. A large class of recently popular...
The consistency of greedy algorithms for classification (2002)
Shie Mannor, Ron Meir, Tong Zhang
Abstract. We consider a class of algorithms for classification, which are based on sequential greedy minimization of a convex upper bound on the 0- 1 loss function. A large class of recently popular...
The Steering Approach for Multi-Criteria Reinforcement (2002)
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to...
The Steering Approach for Multi-Criteria Reinforcement (2002)
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to...
Sparse Online Greedy Support Vector Regression (2002)
Yaakov Engel, Shie Mannor, Ron Meir
We present a novel algorithm for sparse online greedy kernelbased nonlinear regression. This algorithm improves current approaches to kernel-based regression in two aspects. First, it operates online...
On the Existence of Linear Weak Learners and Applications to Boosting (2002)
Shie Mannor, Ron Meir, Yoshua Bengio, Dale Schuurmans
We consider the existence of a linear weak learner for boosting algorithms. A weak learner for binary classification problems is required to achieve a weighted empirical error on the training set...
Sparse Online Greedy Support Vector Regression (2002)
Yaakov Engel, Shie Mannor, Ron Meir
We present a novel algorithm for sparse online greedy kernelbased nonlinear regression. This algorithm improves current approaches to kernel-based regression in two aspects. First, it operates online...
The Steering Approach for Multi-Criteria Reinforcement (2002)
We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to...
Q-Cut - Dynamic Discovery of Sub-Goals in Reinforcement Learning (2002)
Ishai Menache, Shie Mannor, Nahum Shimkin
We present the Q-Cut algorithm, a graph theoretic approach for automatic detection of sub-goals in a dynamic environment, which is used for acceleration of the Q-Learning algorithm. The learning...
Adaptive strategies and regret minimization in arbitrarily varying Markov environments (2001)
Abstract. We consider the problem of maximizing the average reward in a controlled Markov environment, which also contains some arbitrarily varying elements. This problem is captured by a two-person...
Geometric bounds for generalization in boosting (2001)
Abstract. We consider geometric conditions on a labeled data set which guarantee that boosting algorithms work well when linear classiers are used as weak learners. We start by providing conditions...
Weak learners and improved rates of convergence in boosting (2001)
The problem constructing weak classifiers for boosting algorithms is studied. We present an algorithm that produces a linear classifier that is guaranteed to achieve an error better than random...
On Finding Good State Aggregation Functions (2001)
We describe a novel algorithm that learns to
Learning embedded maps of Markov processes (2001)
We present HEMP- a novel learning algorithm designed to operate in the domain of Markov processes and Markov decision processes. HEMP learns to perform a Heuristic Embedding of Markov Processes into...
Weak learners and improved rates of convergence in boosting (2001)
The problem of constructing weak classiers for boosting algorithms is studied. We present an algorithm that produces a linear classi er that is guaranteed to achieve an error better than random...
Xilinx Inc.: XPower Tutorial: FPGA Design, XPower (2001)
Shie Mannor, John N. Tsitsiklis
Abstract. We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We...
Geometric bounds for generalization in boosting (2001)
Abstract. We consider geometric conditions on a labeled data set which guarantee that boosting algorithms work well when linear classifiers are used as weak learners. We start by providing conditions...
Xilinx Inc.: XPower Tutorial: FPGA Design, XPower (2001)
Shie Mannor, John N. Tsitsiklis
Abstract. We study online learning where the objective of the decision maker is to maximize her average long-term reward given that some average constraints are satisfied along the sample path. We...
Regret minimization in signal space for repeated matrix games with partial observations (2000)
Abstract. The Bayes Envelope of a repeated matrix game traces the maximal payo rate that a player (say P1) could secure for herself had she known in advance the empirical frequency of P2's...
Regret minimization in signal space for repeated matrix games with partial observations (2000)
Abstract. The Bayes Envelope of a repeated matrix game traces the maximal payoff rate that a player (say P1) could secure for herself had she known in advance the empirical frequency of P2's...
Generalized approachability results for stochastic games with a single communicating state (2000)
Abstract. The theory of approachability concerns repeated games with vector-valued reward. A set in the reward space is called approachable if the decision maker can ensure that his long term average...
Regret minimization in repeated matrix games with variable stage duration
Regret minimization in repeated matrix games has been extensively studied ever since Hannan's seminal paper [Hannan, J., 1957. Approximation to Bayes risk in repeated play. In: Dresher, M., Tucker,...
Approachability in repeated games: Computational aspects and a Stackelberg variant
Mannor, Shie, Tsitsiklis, John N.
We consider a finite two-player zero-sum game with vector-valued rewards. We study the question of whether a given polyhedral set D is "approachable," that is, whether Player 1 (the "decision maker")...