Just Accepted

Lightweight Deep Learning for Resource-Constrained Environments: A Survey

Online AM: 11 May 2024

Abstract

Over the past decade, deep learning has come to dominate many domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While model accuracy has improved remarkably, deploying these models on lightweight devices, such as mobile phones and microcontrollers, is constrained by limited memory, compute, and energy. In this survey, we provide comprehensive design guidance tailored to these devices, detailing the design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to survey methods and concepts for working within hardware constraints without compromising model accuracy. Additionally, we explore two notable paths for the future of lightweight deep learning: deployment techniques for TinyML and for Large Language Models. Although these paths undoubtedly have potential, they also present significant challenges, inviting research into as-yet unexplored areas.

  175. Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole. 2021. Score-based generative modeling through stochastic differential equations. In ICLR.Google ScholarGoogle Scholar
  176. A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani. 2021. Bottleneck transformers for visual recognition. In CVPR. 16519–16529.Google ScholarGoogle Scholar
  177. A. Stoutchinin, F. Conti, and L. Benini. 2019. Optimally scheduling CNN convolutions for efficient memory access. arXiv preprint arXiv:1902.01492(2019).Google ScholarGoogle Scholar
  178. E. Strubell, A. Ganesh, and A. McCallum. 2019. Energy and policy considerations for deep learning in NLP. ACL.Google ScholarGoogle Scholar
  179. Z. Su, L. Fang, W. Kang, D. Hu, M. Pietikäinen, and L. Liu. 2020. Dynamic group convolution for accelerating convolutional neural networks. In ECCV. 138–155.Google ScholarGoogle Scholar
  180. M. Sultana, M. Naseer, M. H. Khan, S. Khan, and F. S. Khan. 2022. Self-Distilled Vision Transformer for Domain Generalization. In ACCV. 3068–3085.
  181. M. Sun, Z. Liu, A. Bair, and J. Z. Kolter. 2023. A Simple and Effective Pruning Approach for Large Language Models. arXiv preprint arXiv:2306.11695 (2023).
  182. M. Sun, H. Ma, G. Kang, Y. Jiang, T. Chen, X. Ma, Z. Wang, and Y. Wang. 2022. VAQF: Fully Automatic Software-hardware Co-design Framework for Low-bit Vision Transformer. arXiv preprint arXiv:2201.06618 (2022).
  183. Y. Sun, H. Wang, B. Xue, Y. Jin, G. G. Yen, and M. Zhang. 2020. Surrogate-Assisted Evolutionary Deep Learning Using an End-to-End Random Forest-Based Performance Predictor. TEVC 24, 2 (2020), 350–364.
  184. V. Sze, Y.-H. Chen, T.-J. Yang, and J. S. Emer. 2020. How to evaluate deep neural network processors: TOPS/W (alone) considered harmful. SSC-M 12, 3 (2020), 28–41.
  185. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI.
  186. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In CVPR. 1–9.
  187. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. 2016. Rethinking the inception architecture for computer vision. In CVPR. 2818–2826.
  188. A. Talwalkar. 2020. The push for energy efficient "Green AI". Retrieved November 2, 2023 from https://spectrum.ieee.org/energy-efficient-green-ai-strategies
  189. J. Tan, L. Niu, J. K. Adams, V. Boominathan, J. T. Robinson, R. G. Baraniuk, and A. Veeraraghavan. 2019. Face Detection and Verification Using Lensless Cameras. TCI 5, 2 (2019), 180–194.
  190. M. Tan, B. Chen, R. Pang, V. Vasudevan, M. Sandler, A. Howard, and Q. V. Le. 2019. MnasNet: Platform-aware neural architecture search for mobile. In CVPR. 2820–2828.
  191. M. Tan and Q. Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. In ICML. 6105–6114.
  192. M. Tan and Q. Le. 2021. EfficientNetV2: Smaller models and faster training. In ICML. 10096–10106.
  193. M. Tan and Q. V. Le. 2019. MixConv: Mixed depthwise convolutional kernels. (2019).
  194. C. Tao, L. Hou, W. Zhang, L. Shang, X. Jiang, Q. Liu, P. Luo, and N. Wong. 2022. Compression of generative pre-trained language models via quantization. In ACL.
  195. A. Tarvainen and H. Valpola. 2017. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. In NIPS, Vol. 30.
  196. Y. Tay, M. Dehghani, D. Bahri, and D. Metzler. 2021. Efficient transformers: A survey. CSUR 54, 4 (2021), 1–41.
  197. Y. Tian, D. Krishnan, and P. Isola. 2020. Contrastive Representation Distillation. (2020).
  198. H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou. 2021. Training data-efficient image transformers & distillation through attention. In ICML. 10347–10357.
  199. H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. 2023. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023).
  200. H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288 (2023).
  201. S. Um, S. Kim, S. Kim, and H.-J. Yoo. 2021. A 43.1 TOPS/W energy-efficient absolute-difference-accumulation operation computing-in-memory with computation reuse. TCAS-II 68, 5 (2021), 1605–1609.
  202. H. Vanholder. 2016. Efficient inference with TensorRT. In GPU Technology Conference, Vol. 1.
  203. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. 2017. Attention is all you need. NIPS 30 (2017).
  204. L. N. Viet, T. N. Dinh, D. T. Minh, H. N. Viet, and Q. L. Tran. 2021. UET-Headpose: A sensor-based top-view head pose dataset. In KSE. 1–7.
  205. A. Wan, X. Dai, P. Zhang, Z. He, Y. Tian, S. Xie, B. Wu, M. Yu, T. Xu, K. Chen, et al. 2020. FBNetV2: Differentiable neural architecture search for spatial and channel dimensions. In CVPR. 12965–12974.
  206. H. Wang, Z. Zhang, and S. Han. 2021. SpAtten: Efficient sparse attention architecture with cascade token and head pruning. In HPCA. 97–110.
  207. L. Wang, X. Dong, Y. Wang, L. Liu, W. An, and Y. Guo. 2022. Learnable Lookup Table for Neural Network Quantization. In CVPR. 12423–12433.
  208. N. Wang, J. Choi, D. Brand, C.-Y. Chen, and K. Gopalakrishnan. 2018. Training deep neural networks with 8-bit floating point numbers. In NIPS. 7686–7695.
  209. S. Wang, B. Z. Li, M. Khabsa, H. Fang, and H. Ma. 2020. Linformer: Self-attention with linear complexity. arXiv preprint arXiv:2006.04768 (2020).
  210. X. Wang, M. Kan, S. Shan, and X. Chen. 2019. Fully learnable group convolution for acceleration of deep neural networks. In CVPR. 9049–9058.
  211. X. Wang, L. L. Zhang, Y. Wang, and M. Yang. 2022. Towards efficient vision transformer inference: a first study of transformers on mobile devices. In WMCSA. 1–7.
  212. Z. Wang, K. Xu, S. Wu, L. Liu, L. Liu, and D. Wang. 2020. Sparse-YOLO: Hardware/software co-design of an FPGA accelerator for YOLOv2. IEEE Access 8 (2020), 116569–116585.
  213. X. Wei, C. H. Yu, P. Zhang, Y. Chen, Y. Wang, H. Hu, Y. Liang, and J. Cong. 2017. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs. In DAC. 1–6.
  214. M. E. Wolf and M. S. Lam. 1991. A data locality optimizing algorithm. In PLDI. 30–44.
  215. M. Wortsman, G. Ilharco, S. Y. Gadre, R. Roelofs, R. Gontijo-Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y. Carmon, S. Kornblith, et al. 2022. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. (2022), 23965–23998.
  216. B. Wu, X. Dai, P. Zhang, Y. Wang, F. Sun, Y. Wu, Y. Tian, P. Vajda, Y. Jia, and K. Keutzer. 2019. FBNet: Hardware-aware efficient ConvNet design via differentiable neural architecture search. In CVPR. 10734–10742.
  217. B. Wu, A. Wan, X. Yue, P. Jin, S. Zhao, N. Golmant, A. Gholaminejad, J. Gonzalez, and K. Keutzer. 2018. Shift: A zero flop, zero parameter alternative to spatial convolutions. In CVPR. 9127–9135.
  218. H. Wu, B. Xiao, N. Codella, M. Liu, X. Dai, L. Yuan, and L. Zhang. 2021. CvT: Introducing convolutions to vision transformers. In ICCV. 22–31.
  219. X. Wu, C. Li, R. Y. Aminabadi, Z. Yao, and Y. He. 2023. Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases. arXiv preprint arXiv:2301.12017 (2023).
  220. Z. Wu, Z. Liu, J. Lin, Y. Lin, and S. Han. 2020. Lite transformer with long-short range attention. In ICLR.
  221. T. Xiao, P. Dollár, M. Singh, E. Mintun, T. Darrell, and R. Girshick. 2021. Early convolutions help transformers see better. NIPS 34 (2021).
  222. H. Xie, M.-X. Lee, T.-J. Chen, H.-J. Chen, H.-I. Liu, H.-H. Shuai, and W.-H. Cheng. 2023. Most Important Person-guided Dual-branch Cross-Patch Attention for Group Affect Recognition. In ICCV. 20598–20608.
  223. R. Xu, E. H.-M. Sha, Q. Zhuge, Y. Song, and H. Wang. 2023. Loop interchange and tiling for multi-dimensional loops to minimize write operations on NVMs. JSA 135 (2023), 102799.
  224. Y. Xue, C. Chen, and A. Słowik. 2023. Neural Architecture Search Based on A Multi-objective Evolutionary Algorithm with Probability Stack. TEVC 27, 4 (2023).
  225. C. Yang, L. Xie, C. Su, and A. L. Yuille. 2019. Snapshot distillation: Teacher-student optimization in one generation. In CVPR. 2859–2868.
  226. J. Yang, B. Martinez, A. Bulat, G. Tzimiropoulos, et al. 2021. Knowledge distillation via softmax regression representation learning. In ICLR.
  227. L. Yang, H. Jiang, R. Cai, Y. Wang, S. Song, G. Huang, and Q. Tian. 2021. CondenseNet V2: Sparse feature reactivation for deep networks. In CVPR. 3569–3578.
  228. T.-J. Yang, A. Howard, B. Chen, X. Zhang, A. Go, M. Sandler, V. Sze, and H. Adam. 2018. NetAdapt: Platform-aware neural network adaptation for mobile applications. In ECCV. 285–300.
  229. T.-J. Yang, Y.-L. Liao, and V. Sze. 2021. NetAdaptV2: Efficient neural architecture search with fast super-network training and architecture optimization. In CVPR. 2402–2411.
  230. Z. Yao, Z. Dong, Z. Zheng, A. Gholami, J. Yu, E. Tan, L. Wang, Q. Huang, Y. Wang, M. Mahoney, et al. 2021. HAWQ-V3: Dyadic neural network quantization. In ICML. 11875–11886.
  231. J. Ye, X. Chen, N. Xu, C. Zu, Z. Shao, S. Liu, Y. Cui, Z. Zhou, C. Gong, Y. Shen, et al. 2023. A comprehensive capability analysis of GPT-3 and GPT-3.5 series models. arXiv preprint arXiv:2303.10420 (2023).
  232. J. Ye, X. Lu, Z. Lin, and J. Z. Wang. 2018. Rethinking the smaller-norm-less-informative assumption in channel pruning of convolution layers. In ICLR.
  233. H. Yin, A. Vahdat, J. Alvarez, A. Mallya, J. Kautz, and P. Molchanov. 2022. AdaViT: Adaptive Tokens for Efficient Vision Transformer. (2022), 10809–10818.
  234. J. Yoon, D. Kang, and M. Cho. 2022. Semi-supervised Domain Adaptation via Sample-to-Sample Self-Distillation. In WACV. 1978–1987.
  235. H. You, X. Chen, Y. Zhang, C. Li, S. Li, Z. Liu, Z. Wang, and Y. Lin. 2020. ShiftAddNet: A Hardware-Inspired Deep Network. NIPS 33 (2020), 2771–2783.
  236. C. Yu, T. Chen, and Z. Gan. 2023. Boost Transformer-based Language Models with GPU-Friendly Sparsity and Quantization. In ACL. 218–235.
  237. J. Yu, J. Liu, X. Wei, H. Zhou, Y. Nakata, D. Gudovskiy, T. Okuno, J. Li, K. Keutzer, and S. Zhang. 2022. Cross-domain object detection with mean-teacher transformer. In ECCV.
  238. L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, Z. Jiang, F. E. Tay, J. Feng, and S. Yan. 2021. Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet. In ICCV. 558–567.
  239. L. Yuan, F. E. Tay, G. Li, T. Wang, and J. Feng. 2020. Revisiting knowledge distillation via label smoothing regularization. In CVPR. 3903–3911.
  240. M. Yuan and Y. Lin. 2006. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68, 1 (2006), 49–67.
  241. C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong. 2015. Optimizing FPGA-based accelerator design for deep convolutional neural networks. In ACM FPGA. 161–170.
  242. C. Zhang, G. Sun, Z. Fang, P. Zhou, P. Pan, and J. Cong. 2018. Caffeine: Toward uniformed representation and acceleration for deep convolutional neural networks. TCAD 38, 11 (2018), 2072–2085.
  243. H. Zhang, Z. Hu, W. Qin, M. Xu, and M. Wang. 2021. Adversarial co-distillation learning for image recognition. Pattern Recognition 111 (2021), 107659.
  244. H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y. Shum. 2023. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. In ICLR.
  245. L. Zhang, A. Rao, and M. Agrawala. 2023. Adding Conditional Control to Text-to-Image Diffusion Models. In ICCV. 3836–3847.
  246. L. Zhang, J. Song, A. Gao, J. Chen, C. Bao, and K. Ma. 2019. Be your own teacher: Improve the performance of convolutional neural networks via self distillation. In ICCV. 3713–3722.
  247. S. Zhang, Z. Du, L. Zhang, H. Lan, S. Liu, L. Li, Q. Guo, T. Chen, and Y. Chen. 2016. Cambricon-X: An accelerator for sparse neural networks. In MICRO. 1–12.
  248. X. Zhang, X. Zhou, M. Lin, and J. Sun. 2018. ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In CVPR. 6848–6856.
  249. Y. Zhang and N. M. Freris. 2023. Adaptive Filter Pruning via Sensitivity Feedback. TNNLS (2023), 1–13.
  250. Y. Zhang, T. Xiang, T. M. Hospedales, and H. Lu. 2018. Deep mutual learning. In CVPR. 4320–4328.
  251. Z. Zhang, J. Li, W. Shao, Z. Peng, R. Zhang, X. Wang, and P. Luo. 2019. Differentiable learning-to-group channels via groupable convolutional neural networks. In ICCV. 3542–3551.
  252. B. Zhao, Q. Cui, R. Song, Y. Qiu, and J. Liang. 2022. Decoupled Knowledge Distillation. In CVPR. 11953–11962.
  253. D. Zhou, Q. Hou, Y. Chen, J. Feng, and S. Yan. 2020. Rethinking bottleneck structure for efficient mobile network design. In ECCV. 680–697.
  254. X. Zhou, Z. Du, Q. Guo, S. Liu, C. Liu, C. Wang, X. Zhou, L. Li, T. Chen, and Y. Chen. 2018. Cambricon-S: Addressing irregularity in sparse neural networks through a cooperative software/hardware approach. In MICRO. 15–28.
  255. Y. Zhou, X. Dong, B. Akin, M. Tan, D. Peng, T. Meng, A. Yazdanbakhsh, D. Huang, R. Narayanaswami, and J. Laudon. 2021. Rethinking co-design of neural architectures and hardware accelerators. arXiv preprint arXiv:2102.08619 (2021).
  256. C. Zhu, S. Han, H. Mao, and W. J. Dally. 2017. Trained Ternary Quantization. In ICLR.
  257. B. Zoph and Q. V. Le. 2017. Neural architecture search with reinforcement learning. In ICLR.

            • Published in: ACM Computing Surveys (Just Accepted). ISSN: 0360-0300; EISSN: 1557-7341.

            Copyright © 2024 held by the owner/author(s). Publication rights licensed to ACM.

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher: Association for Computing Machinery, New York, NY, United States

            Publication History

            • Received: 15 December 2022
            • Revised: 2 March 2024
            • Accepted: 2 April 2024
            • Online AM: 11 May 2024