A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events (2006)

Abstract
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm for estimating performance measures associated with a Markov chain conditioned on a rare event has been developed. We extend ideas from this work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales or step-size schedules. On the slowest timescale, a gradient search algorithm for policy updates is used; it is based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates. Deterministic perturbation sequences obtained from appropriate normalized Hadamard matrices are used here.
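
The perturbation idea in the last two sentences of the abstract can be illustrated with a small sketch. It is not the paper's algorithm: there is no Markov chain, no rare-event conditioning, and no three-timescale recursion. It only shows how a one-simulation SPSA-style gradient estimate behaves when the perturbations cycle through rows of a normalized Hadamard matrix. The Sylvester construction, the choice to drop the all-ones first column, the batch averaging over one full cycle, and all function names (`sylvester_hadamard`, `perturbation_cycle`, `one_sim_spsa_gradient`) are assumptions made for illustration.

```python
def sylvester_hadamard(order):
    """Build a Hadamard matrix of power-of-two order (Sylvester construction).

    The result is normalized: its first row and first column are all +1.
    """
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # H_{2k} = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def perturbation_cycle(dim):
    """Return a cycle of +/-1 perturbation vectors of length `dim`.

    Assumption: use the smallest power-of-two order >= dim + 1 and skip the
    all-ones first column, so every remaining column sums to zero.
    """
    order = 1 << dim.bit_length()
    return [row[1:dim + 1] for row in sylvester_hadamard(order)]


def one_sim_spsa_gradient(cost, theta, delta=0.1):
    """Average one-simulation SPSA estimates over one full perturbation cycle.

    Each step uses a single cost evaluation at theta + delta * pert and
    divides it by delta * pert[i] to estimate the i-th partial derivative.
    """
    cycle = perturbation_cycle(len(theta))
    grad = [0.0] * len(theta)
    for pert in cycle:
        value = cost([t + delta * d for t, d in zip(theta, pert)])
        for i, d in enumerate(pert):
            grad[i] += value / (delta * d)
    return [g / len(cycle) for g in grad]


def linear_cost(theta):
    """Toy deterministic cost (hypothetical) with gradient (2.0, -3.0, 0.5)."""
    return 5.0 + 2.0 * theta[0] - 3.0 * theta[1] + 0.5 * theta[2]


estimate = one_sim_spsa_gradient(linear_cost, [1.0, 2.0, 3.0])
```

For this linear toy cost the averaged estimate recovers the gradient up to rounding, because the retained Hadamard columns sum to zero and are mutually orthogonal, so the constant term and the cross terms cancel over a full cycle. For nonlinear or noisy costs the estimate is only approximate, and an online scheme would typically fold each single-evaluation estimate into a step-size recursion rather than average in a batch.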

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.6148
Source http://www.jmlr.org/papers/volume7/bhatnagar06a/bhatnagar06a.pdf
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Keywords Markov decision processes, optimal control conditioned on a rare event, simulation
Type text
Language English
Relation 10.1.1.32.7692, 10.1.1.19.4562, 10.1.1.21.8723, 10.1.1.15.2736, 10.1.1.26.9881, 10.1.1.128.3948, 10.1.1.21.8464, 10.1.1.51.6839, 10.1.1.24.9272, 10.1.1.35.4255, 10.1.1.28.8447, 10.1.1.55.2940, 10.1.1.54.3451, 10.1.1.128.1084, 10.1.1.132.7563, 10.1.1.58.6444