On the numerical reliability of nonsmooth autodiff: a MaxPool case study
Article in a journal
Author(s)
Ryan Boustany
Published in
Transactions on Machine Learning Research
Year 2024
Abstract This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation across various precision levels (16, 32, 64 bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations. On the other hand, in practice AD operates with floating-point numbers, and there is therefore a need to explore the subsets on which AD can be numerically incorrect. Recently, Bertoin et al. (2021) empirically studied how the choice of ReLU'(0) changes the output of AD and defined a numerical bifurcation zone where using ReLU'(0)=0 differs from using ReLU'(0)=1. To extend this to a broader class of nonsmooth operations, we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using SGD for training, we found that nonsmooth MaxPool Jacobians with lower norms maintain stable and efficient test accuracy, while higher norms can result in instability and decreased performance. Batch normalization, Adam-like optimizers, or increased precision can be used to reduce the influence of MaxPool's nonsmooth Jacobians.
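The abstract's central point is easiest to see on a single pooling window: when two entries of a window tie for the maximum, several Jacobians are equally valid for MaxPool, and lower floating-point precision makes such ties more frequent. The following PyTorch sketch is a minimal illustration of that phenomenon, not code from the paper; the tensor values and the uniform-splitting convention used for the second Jacobian are illustrative assumptions.

import torch
import torch.nn.functional as F

def maxpool_grads(x):
    # Gradients of sum(maxpool_2x2(x)) under two valid Jacobian conventions.
    # Convention A: PyTorch's built-in backward (whole gradient to one argmax).
    xa = x.clone().requires_grad_(True)
    F.max_pool2d(xa, kernel_size=2).sum().backward()
    # Convention B (assumed here): split the gradient uniformly over all tied
    # maximizers, which yields a smaller-norm Jacobian.
    xb = x.clone().requires_grad_(True)
    window = xb.reshape(1, 1, 1, 4)  # the single 2x2 window, flattened
    mask = (window == window.max(dim=-1, keepdim=True).values).float()
    (window * mask / mask.sum(dim=-1, keepdim=True)).sum().backward()
    return xa.grad, xb.grad

# Exact tie inside the window: the two conventions disagree.
x = torch.tensor([[[[1.0, 1.0],
                    [0.2, 0.5]]]])
ga, gb = maxpool_grads(x)
print(ga.flatten().tolist())  # whole gradient on one argmax, e.g. [1.0, 0.0, 0.0, 0.0]
print(gb.flatten().tolist())  # split over the tie: [0.5, 0.5, 0.0, 0.0]

# No tie in float64, but float16 rounding collapses the two entries to a tie,
# enlarging the set of inputs on which the choice of Jacobian matters.
y = torch.tensor([[[[1.0, 1.0 + 2**-12],
                    [0.2, 0.5]]]])
print(torch.equal(*maxpool_grads(y.double())))        # True: unique maximum, conventions agree
print(torch.equal(*maxpool_grads(y.half().float())))  # False: tie after rounding, conventions differ

In this toy setting the uniform-splitting convention is the lower-norm choice; the abstract reports that such lower-norm MaxPool Jacobians trained more stably under SGD.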
AD Theory and Techniques: Nonsmooth
BibTeX
@ARTICLE{
Boustany2024Otn,
title = "On the numerical reliability of nonsmooth autodiff: a {MaxPool} case study",
author = "Ryan Boustany",
journal = "Transactions on Machine Learning Research",
issn = "2835-8856",
year = "2024",
url = "https://openreview.net/forum?id=142xsInVfp",
abstract = "This paper considers the reliability of automatic differentiation for neural
networks involving the nonsmooth MaxPool operation across various precision levels (16, 32, 64
bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although
AD can be incorrect, recent research has shown that it coincides with the derivative almost
everywhere, even in the presence of nonsmooth operations. On the other hand, in practice, AD
operates with floating-point numbers, and there is, therefore, a need to explore subsets on which AD
can be \emph{numerically} incorrect. Recently, Bertoin et al.~(2021) empirically studied how
the choice of $\mbox{ReLU}^{\prime}(0)$ changes the output of AD and define a numerical
bifurcation zone where using $\mbox{ReLU}^{\prime}(0)=0$ differs from using
$\mbox{ReLU}^{\prime}(0)=1$. To extend this for a broader class of nonsmooth operations,
we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a
compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using
SGD for training, we found that nonsmooth MaxPool Jacobians with lower norms maintain stable and
efficient test accuracy, while higher norms can result in instability and decreased performance. We
can use batch normalization, Adam-like optimizers, or increase precision to reduce MaxPool Jacobians
influence.",
ad_theotech = "Nonsmooth"
}