www.Autodiff.org - Publication: BP(λ): Online Learning via Synthetic Gradients

BP(λ): Online Learning via Synthetic Gradients

- Article in a journal -

Area
Machine Learning

Author(s)
Joseph Oliver Pemberton, Rui Ponte Costa

Published in
Transactions on Machine Learning Research

Year
2024

Abstract
Training recurrent neural networks typically relies on backpropagation through time (BPTT). BPTT depends on forward and backward passes to be completed, rendering the network locked to these computations before loss gradients are available. Recently, Jaderberg et al. proposed synthetic gradients to alleviate the need for full BPTT. In their implementation synthetic gradients are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients, analogous to the temporal difference (TD) algorithm in Reinforcement Learning (RL). However, as in TD learning, heavy use of bootstrapping can result in bias which leads to poor synthetic gradient estimates. Inspired by the accumulate TD(λ) in RL, we propose a fully online method for learning synthetic gradients which avoids the use of BPTT altogether: accumulate BP(λ). As in accumulate TD(λ), we show analytically that accumulate BP(λ) can control the level of bias by using a mixture of temporal difference errors and recursively defined eligibility traces. We next demonstrate empirically that our model outperforms the original implementation for learning synthetic gradients in a variety of tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work we reflect on accumulate BP(λ) as a principle for learning in biological circuits. In summary, inspired by RL principles we introduce an algorithm capable of bias-free online learning via synthetic gradients.

BibTeX
@ARTICLE{Pemberton2024BOL,
       title = "{BP}($\lambda$): Online Learning via Synthetic Gradients",
       author = "Joseph Oliver Pemberton and Rui Ponte Costa",
       journal = "Transactions on Machine Learning Research",
       issn = "2835-8856",
       year = "2024",
       url = "https://openreview.net/forum?id=3kYgouAfqk",
       abstract = "Training recurrent neural networks typically relies on backpropagation through time
         (BPTT). BPTT depends on forward and backward passes to be completed, rendering the network locked to
         these computations before loss gradients are available. Recently, Jaderberg et al. proposed
         synthetic gradients to alleviate the need for full BPTT. In their implementation synthetic gradients
         are learned through a mixture of backpropagated gradients and bootstrapped synthetic gradients,
         analogous to the temporal difference (TD) algorithm in Reinforcement Learning (RL). However, as in
         TD learning, heavy use of bootstrapping can result in bias which leads to poor synthetic gradient
         estimates. Inspired by the accumulate TD($\lambda$) in RL, we propose a fully online method for
         learning synthetic gradients which avoids the use of BPTT altogether: \emph{accumulate
         BP($\lambda$)}. As in accumulate TD($\lambda$), we show analytically that
         accumulate~BP($\lambda$) can control the level of bias by using a mixture of temporal
         difference errors and recursively defined eligibility traces. We next demonstrate empirically that
         our model outperforms the original implementation for learning synthetic gradients in a variety of
         tasks, and is particularly suited for capturing longer timescales. Finally, building on recent work
         we reflect on accumulate~BP($\lambda$) as a principle for learning in biological circuits. In
         summary, inspired by RL principles we introduce an algorithm capable of bias-free online learning
         via synthetic gradients.",
       ad_area = "Machine Learning"
}
