∇QDARTS: Quantization as an Elastic Dimension to Differentiable NAS
Article in a journal
Author(s): Payman Behnam, Uday Kamal, Sanjana Vijay Ganesh, Zhaoyi Li, Michael Andrew Jurado, Alind Khare, Igor Fedorov, Gaowen Liu, Alexey Tumanov
Published in: Transactions on Machine Learning Research
Year: 2025
Abstract: Differentiable Neural Architecture Search (NAS) methods efficiently find high-accuracy architectures using gradient-based optimization in a continuous domain, saving computational resources. Mixed-precision search helps optimize precision within a fixed architecture. However, applying it to a NAS-generated network does not guarantee optimal performance, since the best quantized architecture may not emerge from a standalone NAS method. In light of these considerations, this paper introduces ∇QDARTS, a novel approach that combines differentiable NAS with mixed-precision search for both weights and activations. ∇QDARTS aims to identify the optimal mixed-precision neural architecture capable of achieving high accuracy with minimal computational requirements in a single-shot, end-to-end differentiable framework, obviating the need for pretraining and proxy methods. Compared to fp32, ∇QDARTS shows impressive performance on CIFAR10 with (2,4)-bit precision, reducing bit operations by 160× with a slight 1.57% accuracy drop. Increasing the capacity enables ∇QDARTS to match fp32 accuracy while reducing bit operations by 18×. On ImageNet, with just (2,4)-bit precision, ∇QDARTS outperforms state-of-the-art methods such as APQ, SPOS, OQA, and MNAS by 2.3%, 2.9%, 0.3%, and 2.7% in accuracy, respectively. By incorporating (2,4,8)-bit precision, ∇QDARTS further reduces the accuracy drop relative to fp32 to 1%, alongside a substantial reduction of 17× in required bit operations and 2.6× in memory footprint. At similar accuracy, ∇QDARTS improves over APQ, SPOS, OQA, and MNAS in bit operations (memory footprint) by 2.3× (12×), 2.4× (3×), 13% (6.2×), and 3.4× (37%), respectively. ∇QDARTS also enhances overall search and training efficiency, achieving 3.1× and 1.54× improvements over APQ and OQA, respectively.
AD Theory and Techniques: Mixed-precision
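To give a concrete picture of what treating quantization as an elastic, differentiable dimension of NAS can look like, the following is a minimal illustrative sketch only, not the authors' implementation: a DARTS-style mixed edge whose candidate operations are additionally relaxed, via a softmax, over candidate bit-widths, with straight-through fake quantization applied to both weights and activations so the precision choice remains trainable by gradient descent. All names here (fake_quantize, MixedQuantEdge, the two-op candidate set, and the (2, 4) bit choices) are hypothetical, and the quantizer is a generic symmetric scheme that may differ from the paper's.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, bits):
    # Uniform fake quantization with a straight-through estimator.
    # (Assumption: a simple symmetric per-tensor scheme for illustration.)
    qmax = 2 ** bits - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()  # forward: quantized value, backward: identity

class MixedQuantEdge(nn.Module):
    # Softmax-weighted mixture over (operation, bit-width) choices, so the
    # architecture logits (alpha) and precision logits (beta) are trained
    # jointly with the network weights.
    def __init__(self, channels, bit_choices=(2, 4)):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
        ])
        self.bit_choices = bit_choices
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))
        self.beta = nn.Parameter(torch.zeros(len(self.ops), len(bit_choices)))

    def forward(self, x):
        op_w = F.softmax(self.alpha, dim=0)
        bit_w = F.softmax(self.beta, dim=1)
        out = 0
        for i, op in enumerate(self.ops):
            # Evaluate each op at every candidate precision, quantizing both
            # its weights and its input activations, then mix the results by
            # the precision weights.
            mixed = sum(
                bit_w[i, j] * F.conv2d(
                    fake_quantize(x, b),
                    fake_quantize(op.weight, b),
                    padding=op.padding,
                )
                for j, b in enumerate(self.bit_choices)
            )
            out = out + op_w[i] * mixed
        return out

if __name__ == "__main__":
    edge = MixedQuantEdge(channels=8)
    y = edge(torch.randn(2, 8, 16, 16))
    y.mean().backward()  # gradients reach both alpha (ops) and beta (bits)
    print(edge.alpha.grad, edge.beta.grad)

In this sketch, discretizing the search result would amount to taking the argmax of alpha and of each row of beta; the paper's single-shot, end-to-end procedure and its hardware-cost terms are not reproduced here.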
BibTeX
@ARTICLE{
Behnam2025Qaa,
title = "\ensuremath{\nabla}{QDARTS}: Quantization as an Elastic Dimension to
Differentiable {NAS}",
author = "Payman Behnam and Uday Kamal and Sanjana Vijay Ganesh and Zhaoyi Li and Michael
Andrew Jurado and Alind Khare and Igor Fedorov and Gaowen Liu and Alexey Tumanov",
journal = "Transactions on Machine Learning Research",
issn = "2835-8856",
year = "2025",
url = "https://openreview.net/forum?id=ubrOSWyTS8",
abstract = "Differentiable Neural Architecture Search methods efficiently find high-accuracy
architectures using gradient-based optimization in a continuous domain, saving computational
resources. Mixed-precision search helps optimize precision within a fixed architecture. However,
applying it to a NAS-generated network does not assure optimal performance as the optimized
quantized architecture may not emerge from a standalone NAS method. In light of these
considerations, this paper introduces \ensuremath{\nabla}QDARTS, a novel approach that
combines differentiable NAS with mixed-precision search for both weight and activation.
\ensuremath{\nabla}QDARTS aims to identify the optimal mixed-precision neural architecture
capable of achieving remarkable accuracy while operating with minimal computational requirements in
a single-shot, end-to-end differentiable framework, obviating the need for pretraining and proxy
methods. Compared to fp32, \ensuremath{\nabla}QDARTS shows impressive performance on
CIFAR10 with (2,4) bit precision, reducing bit operations by $160\times$ with a slight
1.57\% accuracy drop. Increasing the capacity enables \ensuremath{\nabla}QDARTS to
match fp32 accuracy while reducing bit operations by $18\times$. For the ImageNet dataset, with
just (2,4) bit precision, \ensuremath{\nabla}QDARTS outperforms state-of-the-art methods
such as APQ, SPOS, OQA, and MNAS by 2.3\%, 2.9\%, 0.3\%, and 2.7\% in terms of
accuracy. By incorporating (2,4,8) bit precision, \ensuremath{\nabla}QDARTS further
minimizes the accuracy drop to 1\% compared to fp32, alongside a substantial reduction of
$17\times$ in required bit operations and $2.6\times$ in memory footprint. In terms of
bit-operation (memory footprint) \ensuremath{\nabla}QDARTS excels over APQ, SPOS, OQA, and
MNAS with similar accuracy by $2.3\times$ ($12\times$), $2.4\times$ ($3\times$),
13\% ($6.2\times$), $3.4\times$ (37\%), for bit-operation (memory footprint),
respectively. \ensuremath{\nabla}QDARTS enhances the overall search and training
efficiency, achieving a $3.1\times$ and $1.54\times$ improvement over APQ and OQA,
respectively.",
ad_theotech = "Mixed-precision"
}