Learning Rate Schedules For Faster Stochastic Gradient Search (1992)

Abstract
Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate $\eta$ (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" (STC) learning rate schedules (Darken and Moody, 1990b, 1991) display the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima. However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence.

INTRODUCTION

The stochastic gradient descent algorithm is

$$\Delta W(t) = -\eta \nabla_W f[W(t), X(t)], \qquad (1)$$

where $\eta$ is the learning rate, t is the "time", and X(t) is the independent random exemplar chosen at time t. The purpose of the algorithm is to find a parameter vector W that minim...
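As a rough illustration of update rule (1), the sketch below applies it to the LMS special case, a linear model with squared error. The abstract does not give the form of the STC schedules, so the schedule used here, `eta0 / (1 + t / tau)`, is an assumption chosen only for illustration: it stays near `eta0` while t is much smaller than `tau` and decays roughly as 1/t afterwards. The names `decaying_rate`, `sgd_lms`, and `make_data`, and all parameter values, are hypothetical and do not come from the paper.

```python
import random


def decaying_rate(t, eta0=0.1, tau=100.0):
    """Assumed schedule (not from the abstract): about eta0 for t << tau,
    about eta0 * tau / t for t >> tau."""
    return eta0 / (1.0 + t / tau)


def sgd_lms(samples, steps, rate=decaying_rate, seed=0):
    """Apply Delta W(t) = -eta(t) * grad_W f[W(t), X(t)]
    with f = 0.5 * (W . x - y)^2, i.e. the LMS rule."""
    rng = random.Random(seed)
    w = [0.0] * len(samples[0][0])
    for t in range(steps):
        x, y = rng.choice(samples)  # independent random exemplar X(t)
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        eta = rate(t)
        # Gradient of f with respect to W is err * x.
        w = [wi - eta * err * xi for wi, xi in zip(w, x)]
    return w


def make_data(n=200, seed=1):
    """Synthetic linear data with small Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    true_w = [2.0, -1.0]
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        y = sum(a * b for a, b in zip(true_w, x)) + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


if __name__ == "__main__":
    weights = sgd_lms(make_data(), steps=5000)
    print(weights)  # expected to land near the generating weights [2.0, -1.0]
```

Passing a different function as `rate` (for example a constant) makes it easy to compare schedules on the same data; the comparison itself is left to the reader, since the abstract reports no numbers to reproduce.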

Publication details
Download http://citeseerx.ist.psu.edu/viewdoc/summary?doi=?doi=10.1.1.42.2884
Source ftp://ftp.cis.ohio-state.edu/pub/neuroprose/darken.learning_rates.ps.Z
Publisher IEEE Press
Contributors CiteSeerX
Repository CiteSeerX - Scientific Literature Digital Library and Search Engine (United States)
Type text
Language English
Relation 10.1.1.32.9030, 10.1.1.30.5295, 10.1.1.1.6803, 10.1.1.32.5131, 10.1.1.42.664, 10.1.1.48.8580, 10.1.1.50.8063, 10.1.1.29.3290, 10.1.1.36.965, 10.1.1.84.5979, 10.1.1.120.9222, 10.1.1.56.5358, 10.1.1.31.6066, 10.1.1.128.346, 10.1.1.31.8016, 10.1.1.70.8499, 10.1.1.36.6418, 10.1.1.36.5065, 10.1.1.104.3870, 10.1.1.6.598, 10.1.1.72.8478, 10.1.1.77.6474, 10.1.1.95.3586, 10.1.1.99.5499, 10.1.1.115.9396, 10.1.1.46.7962, 10.1.1.32.9189