A Simulation-Based Algorithm for Ergodic Control of Markov Chains Conditioned on Rare Events (2006)
Abstract
We study the problem of long-run average cost control of Markov chains conditioned on a rare event. In a related recent work, a simulation-based algorithm was developed for estimating performance measures associated with a Markov chain conditioned on a rare event. We extend ideas from that work and develop an adaptive algorithm for obtaining, online, optimal control policies conditioned on a rare event. Our algorithm uses three timescales, or step-size schedules. On the slowest timescale, policy updates are made by a gradient search algorithm based on one-simulation simultaneous perturbation stochastic approximation (SPSA) type estimates. The perturbation sequences used here are deterministic and are obtained from appropriate normalized Hadamard matrices.
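As a rough illustration of the gradient estimate mentioned in the abstract, the sketch below builds a normalized Hadamard matrix by the Sylvester construction, drops its all-ones first column, and cycles through the rows as deterministic perturbations for a one-simulation SPSA-type estimate. It is not the paper's three-timescale algorithm. The function names, the matrix order (the smallest power of two exceeding the parameter dimension), the perturbation size `delta`, and the deterministic `cost` function standing in for a simulated long-run average cost are all assumptions made for this example.

```python
def normalized_hadamard(order):
    """Sylvester construction of a Hadamard matrix whose first row and
    first column are all +1. `order` must be a power of two."""
    if order < 1 or order & (order - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < order:
        # H_{2k} = [[H_k, H_k], [H_k, -H_k]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def perturbation_cycle(dim):
    """Deterministic +/-1 perturbation vectors of length `dim`.

    Assumption: the order is the smallest power of two greater than `dim`,
    and the all-ones first column is discarded so every remaining column
    sums to zero over the cycle.
    """
    order = 1 << dim.bit_length()
    return [row[1:dim + 1] for row in normalized_hadamard(order)]


def one_sim_gradient(cost, theta, delta=0.1):
    """Average the one-simulation estimate cost(theta + delta*d) / (delta*d_i)
    over one full cycle of Hadamard perturbations d."""
    cycle = perturbation_cycle(len(theta))
    grad = [0.0] * len(theta)
    for d in cycle:
        # A single cost evaluation per perturbation (hence "one-simulation").
        y = cost([t + delta * di for t, di in zip(theta, d)])
        for i, di in enumerate(d):
            grad[i] += y / (delta * di)
    return [g / len(cycle) for g in grad]


def linear_cost(x):
    # Hypothetical stand-in for a simulated cost; its gradient is (2, -3, 0.5).
    return 5.0 + 2.0 * x[0] - 3.0 * x[1] + 0.5 * x[2]


estimate = one_sim_gradient(linear_cost, [1.0, 2.0, 3.0])
```

For a linear cost the bias terms cancel over a full cycle, because the retained columns sum to zero and are mutually orthogonal, so `estimate` matches the true gradient up to floating-point rounding. With a noisy simulated cost, the estimate would instead be fed into a stochastic approximation recursion with decreasing step sizes.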