Programme of the 25th Euro AD Workshop
Tuesday, June 13, 2023
- 9:00–10:30 Reception
- 10:30–12:00 Session 1
- Alex Zinenko (Google)
Enzyme MLIR
This presentation will discuss the continuing effort to scale up the Enzyme automatic differentiation (AD) tool from operating on the LLVM internal representation to the broader MLIR representation, which is also part of the LLVM project. The MLIR representation offers unprecedented extensibility by supporting user-defined instructions and types in the compiler, which is a challenge for a compiler-based AD tool: it requires one to conceptualize a differentiable compiler instruction and to capture all information required for AD in abstract terms. While Enzyme demonstrated that AD on a lower-level representation is not only feasible but often profitable, EnzymeMLIR pushes beyond that by providing AD at multiple levels of abstraction simultaneously, with the goal of identifying the most suitable level on the spectrum ranging from machine learning-style tensor operations, to loops, to assembly-like instructions.
- Sri Hari Krishna Narayanan (Argonne National Laboratory)
Checkpoint Code Generation in Julia for Numerical Simulations
Automatic differentiation in machine learning is largely restricted to expressions used for neural networks (NN), with the depth rarely exceeding a few tens of layers. Compared to NN, numerical simulations typically involve iterative algorithms like time steppers that lead to millions of iterations. Even for modest-sized models, this may yield infeasible memory requirements when applying the adjoint method, also called backpropagation, to time-dependent problems. In this situation, checkpointing algorithms provide a trade-off between recomputation and storage. This paper presents the package Checkpointing.jl that leverages expression transformations in the programming language Julia and the package ChainRules.jl to automatically and transparently transform loop iterations into differentiated loops. The user may choose between various checkpointing algorithm schemes and storage devices. We describe the unique design of Checkpointing.jl and demonstrate its features on an automatically differentiated MPI implementation of Burgers’ equation on the Polaris cluster at the Argonne Leadership Computing Facility.
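As a language-agnostic illustration of the storage/recomputation trade-off described above (a minimal Python sketch with made-up names, not the Checkpointing.jl API):
```python
# Minimal sketch of uniform checkpointing for an adjoint time-stepping loop.
# All names are illustrative; this is not the Checkpointing.jl API.

def step(u):
    # One explicit time step of a toy ODE, du/dt = -u (forward Euler).
    return u - 0.01 * u

def step_adjoint(u, ubar):
    # Adjoint of one step: d(step)/du = 1 - 0.01.
    return (1 - 0.01) * ubar

def adjoint_with_checkpoints(u0, n_steps, stride):
    # Forward sweep: store only every `stride`-th state instead of all of them.
    checkpoints = {0: u0}
    u = u0
    for i in range(n_steps):
        u = step(u)
        if (i + 1) % stride == 0:
            checkpoints[i + 1] = u
    # Reverse sweep: recompute each segment's states from its checkpoint.
    ubar = 1.0  # seed: d(final state)/d(final state)
    for i in reversed(range(n_steps)):
        base = (i // stride) * stride
        u = checkpoints[base]
        for _ in range(i - base):
            u = step(u)          # recomputation instead of storage
        ubar = step_adjoint(u, ubar)
    return ubar  # d(final state)/d(u0)

print(adjoint_with_checkpoints(1.0, 1000, stride=50))
```
Memory drops from O(n_steps) stored states to O(n_steps/stride + stride), at the cost of roughly one extra forward sweep of recomputation.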
- Simon Lukas Maertens (RWTH Aachen University (STCE), Germany)
ADMission Scheduling
Running plain forward or reverse mode on a given problem might not be the fastest way to accumulate the Jacobian for a piece of code. Given the DAG, we can use elimination techniques such as vertex, edge, or face elimination to optimize the calculation of the Jacobian matrix. We demonstrate our vision of a pipeline that incorporates the extraction of a 'high-level' DAG, its annotation with runtime requirements, the optimization of the elimination sequence, and its execution.
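As an illustration of the elimination idea (a toy Python sketch with made-up partials, not the ADMission pipeline):
```python
# Illustrative vertex elimination on a linearized computational graph (DAG).
# Edges carry local partial derivatives; eliminating an intermediate vertex j
# replaces paths i -> j -> k by fill-in edges i -> k with value d_kj * d_ji.

def eliminate(edges, j):
    """Eliminate vertex j from {(i, k): partial}, counting multiplications."""
    preds = {(i, k): v for (i, k), v in edges.items() if k == j}
    succs = {(i, k): v for (i, k), v in edges.items() if i == j}
    out = {e: v for e, v in edges.items() if j not in e}
    muls = 0
    for (i, _), dji in preds.items():
        for (_, k), dkj in succs.items():
            out[(i, k)] = out.get((i, k), 0.0) + dkj * dji  # chain rule
            muls += 1
    return out, muls

# Toy DAG: inputs 0,1 -> intermediates 2,3 -> output 4, with local partials.
edges = {(0, 2): 2.0, (1, 2): 3.0, (2, 3): 0.5, (2, 4): 1.5, (3, 4): -1.0}
total = 0
for j in (3, 2):                 # one possible elimination order
    edges, m = eliminate(edges, j)
    total += m
print(edges, "multiplications:", total)  # remaining edges form the Jacobian
```
Different elimination orders generally produce different multiplication counts; searching for a cheap order is the optimization step of the pipeline.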
- Alexandre Vieira (Université Côte d'Azur, CNRS, France)
Use of automatic differentiation for parameter identification in reduced MHD
We tackle the problem of physical parameter identification in a reduced magnetohydrodynamic (MHD) model based on observations. We will motivate this approach with practical applications in astrophysics and for the ITER project.
This problem is posed as an optimization problem that is solved using automatic differentiation for the computation of the gradient. We report here on our experience in differentiating our code using Tapenade and on the efficiency of the approach. The conclusion will present some challenges for future applications in plasma physics.
- 12:00–14:00 Lunch break
- 14:00–15:00 Session 2
- Laurent Hascoet (INRIA Sophia-Antipolis, France)
Data-flow reversal and Garbage Collection
Data-flow reversal, the process of restoring memory states of a computation in reverse order, is at the heart of source-transformation reverse AD. It is well known that the use of dynamic memory, involving dynamic allocation, deallocation, and pointers, poses delicate problems for data-flow reversal. Several strategies have been devised in AD tools to cope with dynamic memory, often satisfactory but sometimes partial. We explore here the case of languages with Garbage Collection (GC), which have received little attention so far. Given the specifics of dynamic memory with GC, we propose a strategy to organize data-flow reversal that relies on pseudo-addresses and finalization actions. We experiment with this strategy on a Java 2-D Navier-Stokes solver. We compare its performance with adjoint code of the same solver rewritten in Fortran or C, therefore without GC, and differentiated through more classical AD approaches.
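For readers unfamiliar with the term, the classic store-before-overwrite scheme underlying data-flow reversal can be sketched as follows (a textbook-style Python illustration, not the GC strategy proposed in the talk):
```python
# Generic store-before-overwrite data-flow reversal: before each overwrite in
# the forward sweep, push the old value on a tape; the reverse sweep pops
# values to restore the memory states in reverse order.

tape = []

def push_store(mem, addr, new_value):
    tape.append((addr, mem[addr]))   # save the value about to be lost
    mem[addr] = new_value

def restore_all(mem):
    while tape:
        addr, old = tape.pop()       # LIFO order restores states in reverse
        mem[addr] = old

mem = {"x": 1.0, "y": 2.0}
push_store(mem, "x", 5.0)
push_store(mem, "y", mem["x"] * 2)
restore_all(mem)
print(mem)  # {'x': 1.0, 'y': 2.0}
```
With GC, addresses are not stable identifiers across the forward and reverse sweeps, which is what motivates pseudo-addresses in the proposed strategy.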
- Johannes Blühdorn (Chair for Scientific Computing, University of Kaiserslautern-Landau (RPTU))
Hybrid Parallel AD of SU2
The open-source CFD code SU2 features discrete adjoints, for which the operator-overloading AD tool CoDiPack provides derivatives and MeDiPack differentiates the MPI communication. Recently, SU2 transitioned from pure MPI parallelism to hybrid OpenMP-MPI parallelism. We apply OpDiLib to differentiate this new OpenMP layer of parallelism and thus enable hybrid parallel AD of SU2. In this talk, we discuss the transition from serial to OpenMP-parallel code with respect to AD compatibility, explain performance optimizations, and showcase performance results.
- Uwe Naumann (RWTH Aachen)
A Note on Cheaper Newton Steps
A modification of Newton's method for solving systems of $n$ nonlinear equations is presented, based on a factorization of the Jacobian of the residual into regular sparse local Jacobians according to the chain rule of differentiation. The new matrix-free Newton method is explained in the context of banded local Jacobians with bandwidth $2m+1$ for $m\ll n$. A reduction of the computational cost by $\mathcal{O}(n/m)$ can be observed. Paths towards further generalization of the method are discussed briefly.
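For reference, one way to read the factorization above (a notational sketch, not taken from the paper): writing $F'(x) = J_q J_{q-1} \cdots J_1$ with each local Jacobian $J_i$ banded with bandwidth $2m+1$, the Newton step $F'(x)\,\Delta x = -F(x)$ can be taken matrix-free as a sequence of banded solves,
```latex
J_q \, y_q = -F(x), \qquad J_i \, y_i = y_{i+1} \quad (i = q-1, \dots, 1), \qquad \Delta x = y_1 ,
```
where each banded solve costs $\mathcal{O}(nm^2)$, as opposed to $\mathcal{O}(n^3)$ for a dense solve with the accumulated Jacobian.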
- 15:00–15:30 Break
- 15:30–17:30 Session 3
- Shreyas Sunil Gaikwad (Oden Institute, University of Texas at Austin)
MITgcm-AD: Tangent Linear and Adjoint Modeling Frameworks for Ocean and Atmosphere Modeling Enabled by the Automatic Differentiation Tool Tapenade
We present a new inverse modeling framework for the open-source MIT general circulation model (MITgcm) for the oceans and atmosphere, enabled by source transformation using the open-source Automatic Differentiation (AD) tool Tapenade.
Oceans are dynamic entities whose evolution is governed by nonlinear partial differential equations (PDEs) that conserve mass, momentum, and energy. Their evolution is determined by their initial state and uncertain forcings such as sea surface temperature, wind stresses, ocean bottom pressure, etc. These uncertainties propagate to Quantities of Interest (QoI), such as the strength of the Meridional Overturning Circulation (MOC). It is thus desirable to evaluate the sensitivities of our QoI to these independent input variables.
The derivative operators generated using Tapenade are powerful computational engines that efficiently compute gradients or sensitivities of scalar-valued model outputs, such as least-squares model-data misfits or important QoI, with respect to high-dimensional model inputs such as initial conditions, parameter fields, or boundary conditions.
The gradient in conjunction with recently collected data can be used to calibrate these parameters, which is an exercise in PDE-constrained gradient-based optimization. The entire framework is open source and freely available.
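In standard adjoint notation (a generic PDE-constrained sketch, not specific to MITgcm), a single adjoint solve yields the gradient of a scalar QoI $J$ with respect to all inputs $x$ at once:
```latex
F(u, x) = 0, \qquad
\Big(\frac{\partial F}{\partial u}\Big)^{\!\top} \lambda = \Big(\frac{\partial J}{\partial u}\Big)^{\!\top}, \qquad
\frac{dJ}{dx} = \frac{\partial J}{\partial x} - \lambda^{\top} \frac{\partial F}{\partial x} ,
```
which is what makes the adjoint (reverse) mode attractive for scalar outputs and high-dimensional inputs.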
- William Moses (MIT)
Recent Compiler-Based AD Results and Open Questions
This talk (and open discussion) will cover a variety of related research results and open problems surrounding compiler-based automatic differentiation, as exemplified within the Enzyme framework. In particular, the talk will illustrate the importance of compiler-based information and optimization when differentiating parallel programs, linear algebra, and sparse derivatives (e.g., Hessians). It will conclude by discussing an open problem involving activity analysis that can cause difficulties for automatic differentiation.
- Andrew Lyons
Provably Optimal Derivative Accumulation via Reduction Rules
The AD community has known since the 1980s that optimal application of the chain rule---a process we call accumulation---can depend on the structure of the computational graph to which it is applied. Since that time, sophisticated heuristics have been developed, the most fine-grained of which operate within a framework called face elimination. The efficacy of these heuristics, however, is difficult to assess because we don't know how to find optimal face elimination sequences (or even whether it is NP-hard to do so!), and, moreover, we don't even know for sure that there is always a face elimination sequence that minimizes the number of operations performed in the accumulation process (though a long-standing conjecture asserts that there is).
In this talk we outline a new approach for reasoning about the space of all possible accumulation procedures directly and use it to obtain the first definitive statements regarding the minimum cost of derivative accumulation (as well as algorithms for realizing it) for a wide range of computational graphs, including most of the key examples in the literature. The techniques we develop are also applicable to general graphs, and should always be applied to yield simplified instances to hand off to heuristics.
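The smallest instance of this order dependence is the association order of a Jacobian chain product; a quick Python illustration (dimensions are made up):
```python
# For a composition f = f3 . f2 . f1 with dense local Jacobians
# A (k x m), B (m x m), C (m x n), the chain rule gives J = A @ B @ C,
# and the multiplication count depends on the association order.
k, m, n = 1, 100, 100            # scalar output, 100 intermediates/inputs

cost_reverse = k*m*m + k*m*n     # (A @ B) @ C  -- "reverse-mode-like" order
cost_forward = m*m*n + k*m*n     # A @ (B @ C)  -- "forward-mode-like" order
print(cost_reverse, cost_forward)  # 20000 vs 1010000 multiplications
```
Vertex, edge, and face elimination generalize this choice to arbitrary DAGs, where the space of accumulation procedures is far richer than the association order of a chain.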
- Kamil Khan (McMaster University, Hamilton)
Obtaining and implementing continuous adjoints for convex relaxations of parametric ODEs
(Joint work with Yingkai Song and Yulan Zhang)
In typical methods for continuous global optimization, minimizing a convex relaxation provides a lower bound of the objective function, and so the globally optimal value may be approached by generating many such lower bounds. While useful convex relaxation methods exist for parametric ODEs, minimizing these has been challenging because these relaxations' gradients and subgradients were not previously described by ODE sensitivity theory. We show that, in fact, ODE relaxation subgradients solve a simple auxiliary ODE system whose right-hand side may be generated by adapting either the vector forward subgradient AD mode of Mitsos et al. (2009) or the reverse subgradient AD mode of Beckers et al. (2012). The auxiliary ODE itself may then be solved as either a forward sensitivity system or a continuous adjoint system. We present proof-of-concept implementation results that employ the ODE solver CVODES, the McCormick relaxation tool MC++, and our own code generation tools in Julia.
- Jörg Lotze (Xcelerit)
Retrofit AD to Large Codebases: QuantLib Example
Automatic differentiation has a strong use case in quantitative finance, where practitioners often need hundreds of sensitivities. However, typical quantitative libraries for derivatives pricing and risk management are hundreds of thousands of lines of legacy C++ code, posing huge challenges for retrofitting AD tools. Practitioners therefore often default to finite differences for computing sensitivities, even though the computational benefits of automatic differentiation are phenomenal and its use is in many cases encouraged by regulators. QuantLib is a large open-source C++ library of more than 500k lines that is widely used in quantitative finance for derivatives pricing and risk management. This talk presents the typical challenges faced in this use case using XAD, a comprehensive open-source tool for automatic differentiation. It also demonstrates the performance achievable for real-world applications in quantitative finance.
- 17:30 Break
Wednesday, June 14, 2023
- 9:30–11:00 Session 4
- Jean-Baptiste Caillau (Université Côte d'Azur, CNRS, France)
AD at the heart of numerical methods in optimal control
There are many ways to solve optimal control problems numerically, but all of them build upon a common key tool: automatic differentiation. In the framework of the ct: control-toolbox initiative, we first review the use of AD for direct methods (also known as direct transcription), which approximate the original control problem by an adequate nonlinear optimisation problem. We also discuss the importance and several avatars of AD in the case of indirect methods. Applying Pontryagin's maximum principle to an ODE optimal control problem leads to a boundary value problem; computing derivatives is then crucial for building and solving the problem. We report on the use of AD tools for several languages over the years, including Tapenade in Fortran, CppAD in C++, and more recently ForwardDiff / Zygote in Julia.
Joint work with O. Cots, J. Gergaud, P. Martinon and the Inria UCA SED team.
- Ludger Paehler (TU Munich, Germany)
Numba-Enzyme: Differentiable JIT’d Python
In this talk we present Numba-Enzyme, a gradient-providing JIT compiler for Python programs that combines Numba, a dynamic Python compiler using LLVM, with Enzyme, which synthesizes gradients at the LLVM level. Thanks to its fast runtime performance, Numba has seen wide adoption in compute-intensive scientific computing applications; Numba-Enzyme hence provides access to gradient-based methods without the need for rewrites, while supporting patterns commonly found in scientific computing applications such as branches, loops, and array mutation. We demonstrate the effectiveness of our approach across a number of representative micro-benchmarks and Numba kernels taken from scientific applications.
- Max Aehle (University of Kaiserslautern-Landau)
AD of Compiled Programs with Derivgrind
Derivgrind is a novel AD tool that inserts AD logic into the machine code of a compiled primal program by means of the Valgrind dynamic binary instrumentation framework. Source code of the primal program is only partially required, in order to define the AD inputs and outputs. Therefore, machine-code-based AD is applicable to cross-language and partially closed-source software projects. This talk is an introduction to how Derivgrind works.
- Dominic Jones (gmx, UK)
Applying AD in large scale software
Adjoint solvers and automatic differentiation engines have been under development in Siemens Star-CCM+ for over ten years. An assessment of this work will be presented, highlighting the basic methods employed, key programming language features leveraged in C++, and the longer term prospects of the project.
- 11:00–11:30 Break
- 11:30–12:30 Session 5
- Jan Hueckelheim (Argonne National Laboratory)
Understanding and avoiding automatic differentiation pitfalls
AD sometimes computes derivatives that could be interpreted as incorrect. These pitfalls occur systematically across tools and approaches. We present a condensed version of our recent review paper, which broadly categorizes problematic usages of AD and illustrates each category with known examples such as chaos, time-averaged oscillations, discretizations, fixed point loops, lookup tables, and linear solvers. We also review autodiff debugging techniques and their effectiveness in these situations.
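One such pitfall in miniature (a self-contained forward-mode sketch, illustrative only and not taken from the paper): AD differentiates the branch that was taken, not the function the branches jointly represent.
```python
# f below is mathematically the identity (derivative 1 everywhere), but a
# per-branch derivative at x == 0 is 0, because AD sees only the constant
# branch. Minimal hand-rolled forward-mode dual numbers for illustration.

class Dual:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __eq__(self, other):            # comparisons look only at the value
        return self.val == other

def f(x):
    if x == 0:
        return Dual(0.0, 0.0)           # constant branch: derivative 0
    return x                            # identity branch: derivative 1

print(f(Dual(0.0, 1.0)).dot)  # 0.0, although d/dx [x] = 1 at x = 0
print(f(Dual(2.0, 1.0)).dot)  # 1.0 away from the branch point
```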
- Niels Horsten (KU Leuven)
AD for Monte Carlo particle simulations of the plasma edge in nuclear fusion reactors
Plasma edge simulations of nuclear fusion reactors typically consist of a fluid finite-volume model for the plasma species coupled to a kinetic Monte Carlo particle tracing model for the neutrals (atoms and molecules). The statistical noise from the Monte Carlo part makes gradient calculation extremely challenging. In this talk, we demonstrate the success of AD in limiting the statistical noise compared to finite differences, and we present some additional possible measures to further reduce the statistical error.
- Sebastian Christodoulou (RWTH Aachen University (STCE), Germany)
Differentiable Programming: Efficient Smoothing of Control-Flow-Induced Discontinuities
We want to obtain derivatives in discontinuous program code, where default Algorithmic Differentiation may not perform well. We consider discontinuities induced by control flow statements, where meaningful derivatives should ideally be capable of representing the resulting 'jumps' in the trajectory. To achieve this, one can interpolate the trajectory at the control flow statements before taking the derivative. We formulate a method to efficiently interpolate between all boundaries induced by control flow in program code. This allows us to conceive a language that smoothly interpolates control-flow statements automatically and efficiently, making it fully differentiable.
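A generic illustration of such interpolation (a Python sketch of sigmoid blending across a branch boundary; not the method from the talk):
```python
import math

# A hard branch produces a jump in the output, and a sigmoid blend of the two
# branches interpolates across the boundary so the result is differentiable.
# `tau` controls the width of the transition region.

def hard(x):
    return 1.0 if x > 0 else -1.0         # jump at x = 0; derivative 0 a.e.

def smooth(x, tau=0.1):
    w = 1.0 / (1.0 + math.exp(-x / tau))  # weight of the x > 0 branch
    return w * 1.0 + (1.0 - w) * (-1.0)

for x in (-0.3, 0.0, 0.3):
    print(x, hard(x), round(smooth(x), 4))
```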
- 12:30–14:30 Lunch break
- 14:30 Closing