Surveys on Automatic Differentiation
  Survey articles

    Achieving Logarithmic Growth of Temporal and Spatial Complexity in Reverse Automatic Differentiation
    Andreas Griewank: Achieving Logarithmic Growth of Temporal and Spatial Complexity in Reverse Automatic Differentiation. 1992
    Abstract In its basic form the reverse mode of automatic differentiation yields gradient vectors at a small multiple of the computational work needed to evaluate the underlying scalar function. The practical applicability of this temporal complexity result, due originally to Linnainmaa, seemed to be severely limited by the fact that the memory requirement of the basic implementation is proportional to the run time T of the original evaluation program. It is shown here that, by a recursive scheme related to the multilevel differentiation approach of Volin and Ostrovskii, the growth in both temporal and spatial complexity can be limited to a fixed multiple of log(T). Other compromises between the run time and memory requirement are possible, so that the reverse mode becomes applicable to computational problems of virtually any size.
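    To make the recursive scheme in the abstract concrete, here is a minimal Python sketch of bisection-style checkpointing for the reverse mode: the recursion keeps only O(log T) states in memory while performing O(T log T) recomputation. The functions forward_step and adjoint_step are hypothetical placeholders for one primal step and its adjoint; this is an illustrative sketch, not the schedule analysed in the paper.

```python
# Minimal sketch of bisection-style checkpointing for reverse-mode AD.
# forward_step and adjoint_step are hypothetical placeholders. The recursion
# holds O(log T) states at once and performs O(T log T) recomputation.

def forward_step(state):
    # advance the primal computation by one step (placeholder)
    return state + 1.0

def adjoint_step(state, adjoint):
    # propagate the adjoint backwards through one step (placeholder)
    return adjoint

def reverse(state, t0, t1, adjoint):
    """Reverse the steps t0..t1-1, given a checkpoint of the state at t0."""
    if t1 - t0 == 1:
        return adjoint_step(state, adjoint)
    mid = (t0 + t1) // 2
    mid_state = state
    for _ in range(t0, mid):          # recompute forward to the midpoint
        mid_state = forward_step(mid_state)
    adjoint = reverse(mid_state, mid, t1, adjoint)   # reverse second half
    return reverse(state, t0, mid, adjoint)          # then the first half

print(reverse(0.0, 0, 1024, 1.0))     # reverses T = 1024 steps
```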
    An Introduction to Automatic Differentiation
    Louis B. Rall, George F. Corliss: An Introduction to Automatic Differentiation. SIAM, 1996
    Gives a short and gentle introduction to AD.
    Abstract This paper provides a gentle introduction to the field of automatic differentiation (AD), with the goal of equipping the reader for the other papers in this book. AD is the systematic application of the familiar rules of calculus to computer programs, yielding programs for the propagation of numerical values of first, second, or higher derivatives. AD can be regarded as traversing the code list (or computational graph) in the forward mode, the reverse mode, or a combination of the two. Algorithms for numerical optimization, differential equations, and interval analysis all could use AD technology to compute the required derivatives. AD typically is implemented by using either source code transformation or operator overloading. We give examples of code for each. Finally, we outline some pitfalls of AD for naive users, and we present opportunities for future research.
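    As a small companion to this introduction, the sketch below shows forward-mode AD via operator overloading on dual numbers, one of the two implementation routes the abstract mentions. It is an illustrative sketch covering only +, * and sin, not code from the paper.

```python
# Minimal forward-mode AD via operator overloading (dual numbers): each value
# carries its derivative with respect to a chosen input, and the familiar
# rules of calculus are applied operation by operation.
import math

class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)
    __rmul__ = __mul__

def sin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

x = Dual(2.0, 1.0)            # seed dx/dx = 1
y = x * sin(x) + x            # y = x*sin(x) + x
print(y.val, y.dot)           # derivative sin(2) + 2*cos(2) + 1, exact up to rounding
```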
    A Mathematical View of Automatic Differentiation
    Andreas Griewank: A Mathematical View of Automatic Differentiation. Cambridge University Press, 2003
    A more formal and in-depth introduction.
    Abstract Automatic, or algorithmic, differentiation addresses the need for the accurate and efficient calculation of derivative values in scientific computing. To this end, procedural programs for the evaluation of problem-specific functions are transformed into programs that also compute the required derivative values at the same numerical arguments in floating point arithmetic. Disregarding many important implementation issues, we examine in this article complexity bounds and other more mathematical aspects of the program transformation task sketched above.
    Automatic Differentiation of Algorithms
    Michael C. Bartholomew-Biggs, Steve Brown, Bruce Christianson, Laurence C. W. Dixon: Automatic Differentiation of Algorithms. 2000
    Abstract We introduce the basic notions of automatic differentiation, describe some extensions which are of interest in the context of nonlinear optimization and give some illustrative examples.
    Automatic Differentiation for Computational Finance
    C. H. Bischof, H. M. Bücker, B. Lang: Automatic Differentiation for Computational Finance. Kluwer Academic Publishers, 2002
    Abstract Automatic differentiation (AD) is a powerful technique for computing derivatives of a function given by a (potentially very large) piece of code. The basic principles of AD and some available tools implementing this technology are reviewed. AD is superior to divided differences because AD-generated derivative values are free of approximation errors, and superior to symbolic differentiation because code of very high complexity can be handled, in contrast to computer algebra systems whose applicability is limited to rather simple functions. In addition, the cost for computing gradients of scalar-valued functions with either divided differences or symbolic differentiation grows linearly with the number of variables, whereas the so-called reverse mode of AD can compute such gradients at constant cost.
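    The claim that the reverse mode delivers the full gradient at a cost independent of the number of inputs can be illustrated with a tiny graph-based reverse sweep. The class and example below are an illustrative sketch (supporting only + and *), not the tools surveyed in the article.

```python
# Minimal reverse-mode sketch: the forward pass builds a small computational
# graph, and a single backward sweep accumulates the complete gradient,
# regardless of how many input variables there are.

class Var:
    def __init__(self, val):
        self.val, self.grad, self.parents = val, 0.0, []

    def __add__(self, other):
        out = Var(self.val + other.val)
        out.parents = [(self, 1.0), (other, 1.0)]
        return out

    def __mul__(self, other):
        out = Var(self.val * other.val)
        out.parents = [(self, other.val), (other, self.val)]
        return out

def backward(out):
    # propagate adjoint contributions edge by edge, from output to inputs
    stack = [(out, 1.0)]
    while stack:
        node, adj = stack.pop()
        node.grad += adj
        for parent, local in node.parents:
            stack.append((parent, local * adj))

xs = [Var(1.0), Var(2.0), Var(3.0)]
f = xs[0] * xs[1] + xs[1] * xs[2]    # scalar function of several inputs
backward(f)
print([x.grad for x in xs])          # gradient: [2.0, 4.0, 2.0]
```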
    Computing Derivatives of Computer Programs
    C. H. Bischof, H. M. Bücker: Computing Derivatives of Computer Programs. NIC-Directors, 2000
    Abstract Automatic differentiation is introduced as a powerful technique to compute derivatives of functions given in the form of a computer program in a high-level programming language such as Fortran, C, or C++. In contrast to traditional approaches such as handcoding of analytic expressions, numerical approximation by divided differences, or manipulation of symbolic algebraic expressions by computer algebra systems, automatic differentiation offers the following substantial benefits: it is accurate up to machine precision, efficient in terms of computational cost, applicable to a 1-line formula as well as to a 100,000-line code, and the derivative code can be produced with minimal human effort.
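    The accuracy contrast with divided differences is easy to demonstrate numerically. The snippet below is a small illustration (not from the paper): a forward divided difference of sin is compared against the exact derivative cos, which is what AD would return up to rounding, and the error stagnates well above machine precision because truncation and cancellation errors pull in opposite directions.

```python
# Forward divided differences of sin(x) versus the exact derivative cos(x):
# shrinking h trades truncation error for cancellation error, so the error
# never reaches machine precision.
import math

x = 1.0
exact = math.cos(x)                 # what AD would deliver, up to rounding
for h in (1e-2, 1e-5, 1e-8, 1e-11):
    fd = (math.sin(x + h) - math.sin(x)) / h
    print(f"h = {h:.0e}   error = {abs(fd - exact):.2e}")
```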
    What Color Is Your Jacobian? Graph Coloring for Computing Derivatives
    Assefaw Hadish Gebremedhin, Fredrik Manne, Alex Pothen: What Color Is Your Jacobian? Graph Coloring for Computing Derivatives. SIAM, 2005
    Gives a thorough overview on exploiting sparsity.
    Abstract Graph coloring has been employed since the 1980s to efficiently compute sparse Jacobian and Hessian matrices using either finite differences or automatic differentiation. Several coloring problems occur in this context, depending on whether the matrix is a Jacobian or a Hessian, and on the specifics of the computational techniques employed. We consider eight variant vertex coloring problems here. This article begins with a gentle introduction to the problem of computing a sparse Jacobian, followed by an overview of the historical development of the research area. Then we present a unifying framework for the graph models of the variant matrix estimation problems. The framework is based upon the viewpoint that a partition of a matrix into structurally orthogonal groups of columns corresponds to distance-2 coloring an appropriate graph representation. The unified framework helps integrate earlier work and leads to fresh insights; enables the design of more efficient algorithms for many problems; leads to new algorithms for others; and eases the task of building graph models for new problems. We report computational results on two of the coloring problems to support our claims. Most of the methods for these problems treat a column or a row of a matrix as an atomic entity, and partition the columns or rows (or both). A brief review of methods that do not fit these criteria is provided. We also discuss results in discrete mathematics and theoretical computer science that intersect with the topics considered here.
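    The central idea, partitioning the columns of a sparse Jacobian into structurally orthogonal groups, which corresponds to a distance-2 coloring of a suitable graph, can be sketched in a few lines of greedy Python. The function name and the sparsity pattern below are illustrative assumptions, not the algorithms evaluated in the article.

```python
# Greedy partition of Jacobian columns into structurally orthogonal groups
# (no two columns in a group have a nonzero in the same row). Each group can
# then be recovered with one forward-mode pass (or one divided difference)
# seeded with the sum of its unit vectors; the greedy loop is equivalent to
# a greedy distance-2 coloring of the associated graph.

def structurally_orthogonal_groups(pattern):
    """pattern[j] is the set of row indices with a nonzero in column j."""
    groups = []                        # each group: (columns, covered rows)
    for j, rows in enumerate(pattern):
        for cols, covered in groups:
            if not (rows & covered):   # no shared nonzero row
                cols.add(j)
                covered |= rows
                break
        else:
            groups.append(({j}, set(rows)))
    return [cols for cols, _ in groups]

# Arrow-shaped sparsity: column 0 is dense, the others only hit their own row.
pattern = [{0, 1, 2, 3}, {1}, {2}, {3}]
print(structurally_orthogonal_groups(pattern))   # e.g. [{0}, {1, 2, 3}]
```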
    Automatic differentiation in geophysical inverse problems
    M. Sambridge, P. Rickwood, N. Rawlinson, S. Sommacal: Automatic differentiation in geophysical inverse problems. 2007
    Gives an introduction from an application point of view.
    Abstract Automatic differentiation (AD) is the technique whereby output variables of a computer code evaluating any complicated function (e.g. the solution to a differential equation) can be differentiated with respect to the input variables. Often AD tools take the form of source-to-source translators and produce computer code without the need for deriving and hand coding of explicit mathematical formulae by the user. The power of AD lies in the fact that it combines the generality of finite-difference techniques with the accuracy and efficiency of analytical derivatives, while at the same time eliminating human coding errors. It also provides the possibility of accurate, efficient derivative calculation from complex forward codes where no analytical derivatives are possible and finite-difference techniques are too cumbersome. AD is already having a major impact in areas such as optimization, meteorology and oceanography. Similarly it has considerable potential for use in non-linear inverse problems in geophysics where linearization is desirable, or for sensitivity analysis of large numerical simulation codes, for example, wave propagation and geodynamic modelling. At present, however, AD tools appear to be little used in the geosciences. Here we report on experiments using a state-of-the-art AD tool to perform source-to-source code translation in a range of geoscience problems. These include calculating derivatives for Gibbs free energy minimization, seismic receiver function inversion, and seismic ray tracing. Issues of accuracy and efficiency are discussed.
    A review of automatic differentiation and its efficient implementation
    Charles C. Margossian: A review of automatic differentiation and its efficient implementation. 2019
    Abstract Derivatives play a critical role in computational statistics, examples being Bayesian inference using Hamiltonian Monte Carlo sampling and the training of neural networks. Automatic differentiation (AD) is a powerful tool to automate the calculation of derivatives and is preferable to more traditional methods, especially when differentiating complex algorithms and mathematical functions. The implementation of AD, however, requires some care to ensure efficiency. Modern differentiation packages deploy a broad range of computational techniques to improve applicability, run time, and memory management. Among these techniques are operator overloading, region-based memory, and expression templates. There also exist several mathematical techniques which can yield high performance gains when applied to complex algorithms. For example, semi-analytical derivatives can reduce by orders of magnitude the runtime required to numerically solve and differentiate an algebraic equation. Open and practical problems include the extension of current packages to provide more specialized routines, and finding optimal methods to perform higher-order differentiation.
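    The "semi-analytical derivative" idea mentioned in the abstract can be made concrete with the implicit function theorem: rather than differentiating through every iteration of a solver for f(x, theta) = 0, solve to convergence and then form dx/dtheta = -(df/dx)^(-1) (df/dtheta). The scalar example below, x - theta*cos(x) = 0 solved by fixed-point iteration, is an illustrative sketch, not code from the review.

```python
# Implicit (semi-analytical) differentiation of the solution of
# f(x, theta) = x - theta*cos(x) = 0 with respect to theta.
import math

def solve(theta, x=0.0):
    for _ in range(100):                 # plain fixed-point iteration
        x = theta * math.cos(x)
    return x

def dx_dtheta(theta):
    x = solve(theta)
    df_dx = 1.0 + theta * math.sin(x)    # d/dx [x - theta*cos(x)]
    df_dtheta = -math.cos(x)             # d/dtheta of the same residual
    return -df_dtheta / df_dx            # implicit function theorem

theta = 0.8
print(solve(theta), dx_dtheta(theta))
```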
    An introduction to algorithmic differentiation
    Assefaw H. Gebremedhin, Andrea Walther: An introduction to algorithmic differentiation. 2020
    Abstract Algorithmic differentiation (AD), also known as automatic differentiation, is a technology for accurate and efficient evaluation of derivatives of a function given as a computer model. The evaluations of such models are essential building blocks in numerous scientific computing and data analysis applications, including optimization, parameter identification, sensitivity analysis, uncertainty quantification, nonlinear equation solving, and integration of differential equations. We provide an introduction to AD and present its basic ideas and techniques, some of its most important results, the implementation paradigms it relies on, the connection it has to other domains including machine learning and parallel computing, and a few of the major open problems in the area. Topics we discuss include: forward mode and reverse mode of AD, higher-order derivatives, operator overloading and source transformation, sparsity exploitation, checkpointing, cross-country mode, and differentiating iterative processes.
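    Of the topics listed, higher-order derivatives may be the least familiar; one standard route is truncated Taylor polynomial arithmetic, where the k-th coefficient carries f^(k)(x0)/k!. The sketch below propagates Taylor coefficients through multiplication and exp; it is an illustrative example, not material from the article.

```python
# Higher-order derivatives via truncated Taylor polynomial arithmetic:
# coefficient c[k] holds f^(k)(x0)/k!, products follow the Cauchy convolution
# rule, and exp uses the recurrence implied by (exp a)' = a' * exp(a).
import math

def taylor_mul(a, b):
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(len(a))]

def taylor_exp(a):
    e = [math.exp(a[0])] + [0.0] * (len(a) - 1)
    for k in range(1, len(a)):
        e[k] = sum(j * a[j] * e[k - j] for j in range(1, k + 1)) / k
    return e

# Coefficients of f(x) = x * exp(x) around x0 = 1, up to order 3
x = [1.0, 1.0, 0.0, 0.0]              # the polynomial x0 + t
f = taylor_mul(x, taylor_exp(x))
print(f)   # [f(1), f'(1), f''(1)/2!, f'''(1)/3!] = [e, 2e, 1.5e, 2e/3]
```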
    A piecewise smooth version of the Griewank function
    Torsten F. Bosse, H. Martin Bücker: A piecewise smooth version of the Griewank function. Taylor & Francis, 2024
    Proposes a piecewise smooth test function for non-smooth global optimization.
    Abstract The Griewank test function for global unconstrained optimization has multiple local minima clustered around the global minimum at the origin. A new version of this test function is proposed that has a similar structure, but whose behavior at the local minima and maxima is non-smooth. This piecewise smooth version of the Griewank function represents an abs-factorable test case of objective functions for global non-smooth optimization as, for example, observed in the training of neural networks.
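    For context, the classic Griewank test function that the paper modifies is sketched below; the paper's piecewise smooth (abs-factorable) variant is not reproduced here.

```python
# The classic Griewank test function: a quadratic bowl modulated by an
# oscillatory product, with many local minima clustered around the global
# minimum at the origin.
import math

def griewank(x):
    bowl = sum(xi * xi for xi in x) / 4000.0
    ripple = math.prod(math.cos(xi / math.sqrt(i + 1)) for i, xi in enumerate(x))
    return 1.0 + bowl - ripple

print(griewank([0.0, 0.0, 0.0]))   # 0.0 at the global minimum
```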
  
