∇QDARTS: Quantization as an Elastic Dimension to Differentiable NAS
Article in a journal
Author(s): Payman Behnam, Uday Kamal, Sanjana Vijay Ganesh, Zhaoyi Li, Michael Andrew Jurado, Alind Khare, Igor Fedorov, Gaowen Liu, Alexey Tumanov
Published in: Transactions on Machine Learning Research
Year: 2025
Abstract: Differentiable Neural Architecture Search (NAS) methods efficiently find high-accuracy architectures using gradient-based optimization in a continuous domain, saving computational resources. Mixed-precision search helps optimize precision within a fixed architecture. However, applying it to a NAS-generated network does not guarantee optimal performance, since the best quantized architecture may not emerge from a standalone NAS method. In light of these considerations, this paper introduces ∇QDARTS, a novel approach that combines differentiable NAS with mixed-precision search for both weights and activations. ∇QDARTS aims to identify the optimal mixed-precision neural architecture capable of achieving high accuracy with minimal computational requirements in a single-shot, end-to-end differentiable framework, obviating the need for pretraining and proxy methods. Compared to fp32, ∇QDARTS shows impressive performance on CIFAR10 with (2,4)-bit precision, reducing bit operations by 160× with a slight 1.57% accuracy drop. Increasing the capacity enables ∇QDARTS to match fp32 accuracy while reducing bit operations by 18×. On ImageNet, with just (2,4)-bit precision, ∇QDARTS outperforms state-of-the-art methods such as APQ, SPOS, OQA, and MNAS by 2.3%, 2.9%, 0.3%, and 2.7% in accuracy, respectively. By incorporating (2,4,8)-bit precision, ∇QDARTS further reduces the accuracy drop relative to fp32 to 1%, alongside a substantial reduction of 17× in required bit operations and 2.6× in memory footprint. At similar accuracy, ∇QDARTS improves over APQ, SPOS, OQA, and MNAS in bit operations (memory footprint) by 2.3× (12×), 2.4× (3×), 13% (6.2×), and 3.4× (37%), respectively. ∇QDARTS also enhances overall search and training efficiency, achieving 3.1× and 1.54× improvements over APQ and OQA, respectively.
AD Theory and Techniques: Mixed-precision
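To give a concrete picture of what treating quantization as an elastic, differentiable dimension of NAS can look like, the following is a minimal illustrative sketch only, not the authors' implementation: a DARTS-style mixed edge whose candidate operations are additionally relaxed, via a softmax, over candidate bit-widths, with straight-through fake quantization applied to both weights and activations so the precision choice remains trainable by gradient descent. All names here (fake_quantize, MixedQuantEdge, the two-op candidate set, and the (2, 4) bit choices) are hypothetical, and the quantizer is a generic symmetric scheme that may differ from the paper's.

import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quantize(x, bits):
    # Uniform fake quantization with a straight-through estimator.
    # (Assumption: a simple symmetric per-tensor scheme for illustration.)
    qmax = 2 ** bits - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = torch.round(x / scale).clamp(-qmax, qmax) * scale
    return x + (q - x).detach()  # forward: quantized value, backward: identity

class MixedQuantEdge(nn.Module):
    # Softmax-weighted mixture over (operation, bit-width) choices, so the
    # architecture logits (alpha) and precision logits (beta) are trained
    # jointly with the network weights.
    def __init__(self, channels, bit_choices=(2, 4)):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.Conv2d(channels, channels, 5, padding=2, bias=False),
        ])
        self.bit_choices = bit_choices
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))
        self.beta = nn.Parameter(torch.zeros(len(self.ops), len(bit_choices)))

    def forward(self, x):
        op_w = F.softmax(self.alpha, dim=0)
        bit_w = F.softmax(self.beta, dim=1)
        out = 0
        for i, op in enumerate(self.ops):
            # Evaluate each op at every candidate precision, quantizing both
            # its weights and its input activations, then mix the results by
            # the precision weights.
            mixed = sum(
                bit_w[i, j] * F.conv2d(
                    fake_quantize(x, b),
                    fake_quantize(op.weight, b),
                    padding=op.padding,
                )
                for j, b in enumerate(self.bit_choices)
            )
            out = out + op_w[i] * mixed
        return out

if __name__ == "__main__":
    edge = MixedQuantEdge(channels=8)
    y = edge(torch.randn(2, 8, 16, 16))
    y.mean().backward()  # gradients reach both alpha (ops) and beta (bits)
    print(edge.alpha.grad, edge.beta.grad)

In this sketch, discretizing the search result would amount to taking the argmax of alpha and of each row of beta; the paper's single-shot, end-to-end procedure and its hardware-cost terms are not reproduced here.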
BibTeX
@ARTICLE{
Behnam2025Qaa,
title = "\ensuremath{\nabla}{QDARTS}: Quantization as an Elastic Dimension to
Differentiable {NAS}",
author = "Payman Behnam and Uday Kamal and Sanjana Vijay Ganesh and Zhaoyi Li and Michael
Andrew Jurado and Alind Khare and Igor Fedorov and Gaowen Liu and Alexey Tumanov",
journal = "Transactions on Machine Learning Research",
issn = "2835-8856",
year = "2025",
url = "https://openreview.net/forum?id=ubrOSWyTS8",
abstract = "Differentiable Neural Architecture Search methods efficiently find high-accuracy
architectures using gradient-based optimization in a continuous domain, saving computational
resources. Mixed-precision search helps optimize precision within a fixed architecture. However,
applying it to a NAS-generated network does not assure optimal performance as the optimized
quantized architecture may not emerge from a standalone NAS method. In light of these
considerations, this paper introduces \ensuremath{\nabla}QDARTS, a novel approach that
combines differentiable NAS with mixed-precision search for both weight and activation.
\ensuremath{\nabla}QDARTS aims to identify the optimal mixed-precision neural architecture
capable of achieving remarkable accuracy while operating with minimal computational requirements in
a single-shot, end-to-end differentiable framework, obviating the need for pretraining and proxy
methods. Compared to fp32, \ensuremath{\nabla}QDARTS shows impressive performance on
CIFAR10 with (2,4) bit precision, reducing bit operations by $160\times$ with a slight
1.57\% accuracy drop. Increasing the capacity enables \ensuremath{\nabla}QDARTS to
match fp32 accuracy while reducing bit operations by $18\times$. For the ImageNet dataset, with
just (2,4) bit precision, \ensuremath{\nabla}QDARTS outperforms state-of-the-art methods
such as APQ, SPOS, OQA, and MNAS by 2.3\%, 2.9\%, 0.3\%, and 2.7\% in terms of
accuracy. By incorporating (2,4,8) bit precision, \ensuremath{\nabla}QDARTS further
minimizes the accuracy drop to 1\% compared to fp32, alongside a substantial reduction of
$17\times$ in required bit operations and $2.6\times$ in memory footprint. In terms of
bit-operation (memory footprint) \ensuremath{\nabla}QDARTS excels over APQ, SPOS, OQA, and
MNAS with similar accuracy by $2.3\times$ ($12\times$), $2.4\times$ ($3\times$),
13\% ($6.2\times$), $3.4\times$ (37\%), for bit-operation (memory footprint),
respectively. \ensuremath{\nabla}QDARTS enhances the overall search and training
efficiency, achieving a $3.1\times$ and $1.54\times$ improvement over APQ and OQA,
respectively.",
ad_theotech = "Mixed-precision"
}