
Backpropagation with Continuation Callbacks: Foundations for Efficient and Expressive Differentiable Programming

- Part of a collection -
 

Author(s)
Fei Wang, James Decker, Xilun Wu, Grégory Essertel, Tiark Rompf

Published in
Advances in Neural Information Processing Systems 31: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)

Editor(s)
S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett

Year
2018

Publisher
Curran Associates, Inc.

Abstract
Training of deep learning models depends on gradient descent and end-to-end differentiation. Under the slogan of differentiable programming, there is an increasing demand for efficient automatic gradient computation for emerging network architectures that incorporate dynamic control flow, especially in NLP. In this paper we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls, and the backward pass as a corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS). Any program can be converted to this form using standard techniques, and hence, any program can be mechanically converted to compute gradients. Our approach achieves the same flexibility as other reverse-mode automatic differentiation (AD) techniques, but it can be implemented without any auxiliary data structures besides the function call stack, and it can easily be combined with graph construction and native code generation techniques through forms of multi-stage programming, leading to a highly efficient implementation that combines the performance benefits of define-then-run software frameworks such as TensorFlow with the expressiveness of define-by-run frameworks such as PyTorch.
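
As a concrete illustration of the idea summarized in the abstract, the following is a minimal Scala sketch (a hypothetical toy, not the authors' actual implementation, which uses delimited continuations and multi-stage programming): every operator takes a callback, computes its forward value before invoking it, and accumulates adjoints after the callback returns, so the host language's call stack takes the place of an explicit tape.

// Hypothetical toy sketch of reverse-mode AD in continuation-passing style;
// not the paper's implementation. Each operator takes a continuation k,
// computes its forward value before calling k, and propagates adjoints
// after k returns, so the call stack itself plays the role of the tape.
class Num(val x: Double, var d: Double = 0.0) {
  def +(that: Num)(k: Num => Unit): Unit = {
    val y = new Num(x + that.x)
    k(y)                    // forward pass continues inside the callback
    this.d += y.d           // backward pass runs as the calls return
    that.d += y.d
  }
  def *(that: Num)(k: Num => Unit): Unit = {
    val y = new Num(x * that.x)
    k(y)
    this.d += that.x * y.d
    that.d += this.x * y.d
  }
}

object CallbackAD {
  // Runs f on a fresh input, seeds the output adjoint with 1.0 in the final
  // continuation, and reads the accumulated input adjoint once all calls return.
  def grad(f: Num => (Num => Unit) => Unit)(x: Double): Double = {
    val in = new Num(x)
    f(in) { out => out.d = 1.0 }
    in.d
  }

  def main(args: Array[String]): Unit = {
    // f(x) = x * x + x, so f'(x) = 2x + 1; at x = 3.0 this prints 7.0
    println(grad(x => k => (x * x) { y => (y + x)(k) })(3.0))
  }
}

Because the adjoint updates execute as the nested calls return, no auxiliary graph or tape is built, and native control flow (conditionals, loops, recursion) in the forward computation is handled by ordinary execution; the paper obtains the same structure automatically via a CPS transformation or delimited continuation operators rather than by writing the callbacks by hand.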

AD Theory and Techniques
Functional Programming

BibTeX
@INPROCEEDINGS{Wang2018BwC,
  author      = "Wang, Fei and Decker, James and Wu, Xilun and Essertel, Gr{\'e}gory and Rompf, Tiark",
  title       = "Backpropagation with Continuation Callbacks: Foundations for Efficient and Expressive Differentiable Programming",
  booktitle   = "Advances in Neural Information Processing Systems 31: 32nd Conference on Neural Information Processing Systems (NeurIPS 2018)",
  editor      = "S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett",
  pages       = "10180--10191",
  publisher   = "Curran Associates, Inc.",
  year        = "2018",
  url         = "https://proceedings.neurips.cc/paper_files/paper/2018/file/34e157766f31db3d2099831d348a7933-Paper.pdf",
  abstract    = "Training of deep learning models depends on gradient descent and end-to-end differentiation. Under the slogan of differentiable programming, there is an increasing demand for efficient automatic gradient computation for emerging network architectures that incorporate dynamic control flow, especially in NLP. In this paper we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls, and the backward pass as a corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS). Any program can be converted to this form using standard techniques, and hence, any program can be mechanically converted to compute gradients. Our approach achieves the same flexibility as other reverse-mode automatic differentiation (AD) techniques, but it can be implemented without any auxiliary data structures besides the function call stack, and it can easily be combined with graph construction and native code generation techniques through forms of multi-stage programming, leading to a highly efficient implementation that combines the performance benefits of define-then-run software frameworks such as TensorFlow with the expressiveness of define-by-run frameworks such as PyTorch.",
  ad_theotech = "Functional Programming"
}

