Why experimentation can be better than perfect guidance (1997)
Abstract
The full version of this paper appeared at ICML-97. Many problems correspond to the classical control task of determining the appropriate control action to take, given some (sequence of) observations. One standard approach to learning these control rules, called behavior cloning, involves watching a perfect operator operate a plant, and then trying to emulate its behavior. In the experimental learning approach, by contrast, the learner first guesses an initial operation-to-action policy and tries it out. If this policy performs sub-optimally, the learner can modify it to produce a new policy, and recur. This paper discusses the relative effectiveness of these two approaches, especially in the presence of perceptual aliasing, showing in particular that the experimental learner can often learn more effectively than the cloning one.
Publication details