On the numerical reliability of nonsmooth autodiff: a MaxPool case study
Article in a journal
Author(s)
Ryan Boustany
Published in
Transactions on Machine Learning Research
Year 2024
Abstract This paper considers the reliability of automatic differentiation (AD) for neural networks involving the nonsmooth MaxPool operation across various precision levels (16, 32, 64 bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although AD can be incorrect, recent research has shown that it coincides with the derivative almost everywhere, even in the presence of nonsmooth operations. On the other hand, in practice AD operates with floating-point numbers, and there is therefore a need to explore the subsets on which AD can be numerically incorrect. Recently, Bertoin et al. (2021) empirically studied how the choice of ReLU'(0) changes the output of AD and defined a numerical bifurcation zone where using ReLU'(0)=0 differs from using ReLU'(0)=1. To extend this to a broader class of nonsmooth operations, we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using SGD for training, we found that nonsmooth MaxPool Jacobians with lower norms maintain stable and efficient test accuracy, while higher norms can result in instability and decreased performance. Batch normalization, Adam-like optimizers, or increased precision can be used to reduce the influence of MaxPool's nonsmooth Jacobians.
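The abstract's central point is easiest to see on a single pooling window: when two entries of a window tie for the maximum, several Jacobians are equally valid for MaxPool, and lower floating-point precision makes such ties more frequent. The following PyTorch sketch is a minimal illustration of that phenomenon, not code from the paper; the tensor values and the uniform-splitting convention used for the second Jacobian are illustrative assumptions.

import torch
import torch.nn.functional as F

def maxpool_grads(x):
    # Gradients of sum(maxpool_2x2(x)) under two valid Jacobian conventions.
    # Convention A: PyTorch's built-in backward (whole gradient to one argmax).
    xa = x.clone().requires_grad_(True)
    F.max_pool2d(xa, kernel_size=2).sum().backward()
    # Convention B (assumed here): split the gradient uniformly over all tied
    # maximizers, which yields a smaller-norm Jacobian.
    xb = x.clone().requires_grad_(True)
    window = xb.reshape(1, 1, 1, 4)  # the single 2x2 window, flattened
    mask = (window == window.max(dim=-1, keepdim=True).values).float()
    (window * mask / mask.sum(dim=-1, keepdim=True)).sum().backward()
    return xa.grad, xb.grad

# Exact tie inside the window: the two conventions disagree.
x = torch.tensor([[[[1.0, 1.0],
                    [0.2, 0.5]]]])
ga, gb = maxpool_grads(x)
print(ga.flatten().tolist())  # whole gradient on one argmax, e.g. [1.0, 0.0, 0.0, 0.0]
print(gb.flatten().tolist())  # split over the tie: [0.5, 0.5, 0.0, 0.0]

# No tie in float64, but float16 rounding collapses the two entries to a tie,
# enlarging the set of inputs on which the choice of Jacobian matters.
y = torch.tensor([[[[1.0, 1.0 + 2**-12],
                    [0.2, 0.5]]]])
print(torch.equal(*maxpool_grads(y.double())))        # True: unique maximum, conventions agree
print(torch.equal(*maxpool_grads(y.half().float())))  # False: tie after rounding, conventions differ

In this toy setting the uniform-splitting convention is the lower-norm choice; the abstract reports that such lower-norm MaxPool Jacobians trained more stably under SGD.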
AD Theory and Techniques: Nonsmooth
BibTeX
@ARTICLE{
Boustany2024Otn,
title = "On the numerical reliability of nonsmooth autodiff: a {MaxPool} case study",
author = "Ryan Boustany",
journal = "Transactions on Machine Learning Research",
issn = "2835-8856",
year = "2024",
url = "https://openreview.net/forum?id=142xsInVfp",
abstract = "This paper considers the reliability of automatic differentiation for neural
networks involving the nonsmooth MaxPool operation across various precision levels (16, 32, 64
bits), architectures (LeNet, VGG, ResNet), and datasets (MNIST, CIFAR10, SVHN, ImageNet). Although
AD can be incorrect, recent research has shown that it coincides with the derivative almost
everywhere, even in the presence of nonsmooth operations. On the other hand, in practice, AD
operates with floating-point numbers, and there is, therefore, a need to explore subsets on which AD
can be \emph{numerically} incorrect. Recently, Bertoin et al.~(2021) empirically studied how
the choice of $\mbox{ReLU}^{\prime}(0)$ changes the output of AD and define a numerical
bifurcation zone where using $\mbox{ReLU}^{\prime}(0)=0$ differs from using
$\mbox{ReLU}^{\prime}(0)=1$. To extend this for a broader class of nonsmooth operations,
we propose a new numerical bifurcation zone (where AD is incorrect over real numbers) and define a
compensation zone (where AD is incorrect over floating-point numbers but correct over reals). Using
SGD for training, we found that nonsmooth MaxPool Jacobians with lower norms maintain stable and
efficient test accuracy, while higher norms can result in instability and decreased performance. We
can use batch normalization, Adam-like optimizers, or increase precision to reduce MaxPool Jacobians
influence.",
ad_theotech = "Nonsmooth"
}