survey

Towards Hybrid-Optimization Video Coding

Published: 24 April 2024

Abstract

Video coding that pursues the highest compression efficiency is the art of computing for rate-distortion optimization. The optimization has been approached in different ways, exemplified by two typical frameworks: block-based hybrid video coding and end-to-end learned video coding. The block-based hybrid framework encompasses more and more coding modes that are available at the decoder side; the encoder searches for the optimal coding mode for each block to be coded. This is an online, discrete, search-based optimization strategy. The end-to-end learned framework embraces more and more sophisticated neural networks; the network parameters are learned from a collection of videos, typically using gradient descent-based methods. This is an offline, continuous, numerical optimization strategy. Having analyzed these two strategies, both conceptually and with concrete schemes, this paper suggests investigating hybrid-optimization video coding, that is, combining online and offline, discrete and continuous, and search-based and numerical optimization. As an example, we propose a hybrid-optimization video coding scheme in which the decoder consists of trained neural networks and supports several coding modes, and the encoder adopts both numerical and search-based algorithms for the online optimization. Our scheme achieves promising compression efficiency, on par with H.265/HM for the random-access configuration.
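The combination of search-based and numerical optimization at the encoder can be illustrated with a toy sketch. It is not the scheme proposed in the paper: the two "decoder modes" (`decode_dc`, `decode_ramp`), the rate proxy, the multiplier `LAMBDA`, and the step size (tuned for 8-sample blocks) are all hypothetical stand-ins. For each mode, the encoder refines a continuous latent by gradient descent on the rate-distortion cost J = D + λR, quantizes it, and then picks the mode with the lowest quantized cost.

```python
from typing import Callable, List, Sequence, Tuple

LAMBDA = 0.1  # hypothetical Lagrange multiplier in J = D + LAMBDA * R


def decode_dc(z: Sequence[float], n: int) -> List[float]:
    """Hypothetical mode 0: one latent, constant block."""
    return [z[0]] * n


def decode_ramp(z: Sequence[float], n: int) -> List[float]:
    """Hypothetical mode 1: two latents, offset plus slope."""
    return [z[0] + z[1] * i for i in range(n)]


# Each mode is a fixed decoder plus the number of latents it consumes.
MODES: List[Tuple[Callable[[Sequence[float], int], List[float]], int]] = [
    (decode_dc, 1),
    (decode_ramp, 2),
]


def rd_cost(x: Sequence[float], decode, z: Sequence[float]) -> float:
    """J = D + LAMBDA * R with MSE distortion and a toy rate proxy.

    The proxy (1 + |v| per latent) is an assumption for illustration;
    a real codec would use an entropy model.
    """
    y = decode(z, len(x))
    d = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    r = sum(1.0 + abs(v) for v in z)
    return d + LAMBDA * r


def refine_latent(x, decode, dim, steps=2000, lr=0.02, eps=1e-4):
    """Online numerical step: gradient descent on the continuous latent,
    using central finite differences in place of analytic gradients."""
    z = [0.0] * dim
    for _ in range(steps):
        grad = []
        for k in range(dim):
            zp = list(z); zp[k] += eps
            zm = list(z); zm[k] -= eps
            grad.append((rd_cost(x, decode, zp) - rd_cost(x, decode, zm)) / (2 * eps))
        z = [v - lr * g for v, g in zip(z, grad)]
    return z


def encode_block(x: Sequence[float]):
    """Online discrete step: try every mode, keep the lowest quantized cost."""
    best = None
    for mode, (decode, dim) in enumerate(MODES):
        z = refine_latent(x, decode, dim)
        z_q = [float(round(v)) for v in z]  # integer quantization of the latent
        cost = rd_cost(x, decode, z_q)
        if best is None or cost < best[2]:
            best = (mode, z_q, cost)
    return best


flat_choice = encode_block([4.0] * 8)
ramp_choice = encode_block([float(i) for i in range(8)])
```

In this sketch, a flat block is cheapest in the constant mode and a linearly increasing block is cheapest in the ramp mode. In an actual hybrid-optimization codec the fixed decoders would be trained networks (the offline part), which this toy omits.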

REFERENCES

  1. [1] Agustsson Eirikur, Mentzer Fabian, Tschannen Michael, Cavigelli Lukas, Timofte Radu, Benini Luca, and Gool Luc V.. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. In NIPS, Vol. 30. 11411151.Google ScholarGoogle Scholar
  2. [2] Agustsson Eirikur, Minnen David, Johnston Nick, Balle Johannes, Hwang Sung Jin, and Toderici George. 2020. Scale-space flow for end-to-end optimized video compression. In CVPR. 85038512.Google ScholarGoogle ScholarCross RefCross Ref
  3. [3] Agustsson Eirikur and Theis Lucas. 2020. Universally quantized neural compression. In NeurIPS, Vol. 33. 1236712376.Google ScholarGoogle Scholar
  4. [4] Ahmed N., Natarajan T., and Rao K. R.. 1974. Discrete cosine transform. IEEE Trans. Comput. C-23, 1 (1974), 9093.Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. [5] Alshin Alexander, Alshina Elena, and Lee Tammy. 2010. Bi-directional optical flow for improving motion compensation. In PCS. IEEE, 422425.Google ScholarGoogle ScholarCross RefCross Ref
  6. [6] Ballé Johannes, Laparra Valero, and Simoncelli Eero P.. 2016. End-to-end optimized image compression. arXiv preprint arXiv:1611.01704 (2016).Google ScholarGoogle Scholar
  7. [7] Ballé Johannes, Minnen David, Singh Saurabh, Hwang Sung Jin, and Johnston Nick. 2018. Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018).Google ScholarGoogle Scholar
  8. [8] Bjontegaard Gisle. 2001. Calculation of Average PSNR Differences between RD-Curves. Technical Report VCEG-M33. VCEG.Google ScholarGoogle Scholar
  9. [9] Bossen Frank. 2011. Common Test Conditions and Software Reference Configurations. Technical Report JCTVC-F900. JCT-VC.Google ScholarGoogle Scholar
  10. [10] Brand Fabian, Fischer Kristian, and Kaup Andre. 2021. Rate-distortion optimized learning-based image compression using an adaptive hierachical autoencoder with conditional hyperprior. In CVPR Workshops. 18851889.Google ScholarGoogle ScholarCross RefCross Ref
  11. [11] Bross Benjamin, Chen Jianle, Ohm Jens-Rainer, Sullivan Gary J., and Wang Ye-Kui. 2021. Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC). Proc. IEEE 109, 9 (2021), 14631493.Google ScholarGoogle ScholarCross RefCross Ref
  12. [12] Bross Benjamin, Wang Ye-Kui, Ye Yan, Liu Shan, Chen Jianle, Sullivan Gary J., and Ohm Jens-Rainer. 2021. Overview of the versatile video coding (VVC) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 37363764.Google ScholarGoogle ScholarCross RefCross Ref
  13. [13] Cai Jianrui and Zhang Lei. 2018. Deep image compression with iterative non-uniform quantization. In ICIP. IEEE, 451455.Google ScholarGoogle ScholarCross RefCross Ref
  14. [14] Campos Joaquim, Simon Meierhans, Djelouah Abdelaziz, and Schroers Christopher. 2019. Content adaptive optimization for neural image compression. In CVPR Workshops. 15.Google ScholarGoogle Scholar
  15. [15] Chen Mu-Jung, Chen Yi-Hsin, and Peng Wen-Hsiao. 2023. B-CANF: Adaptive B-frame coding with conditional augmented normalizing flows. IEEE Transactions on Circuits and Systems for Video Technology (2023). DOI:Google ScholarGoogle ScholarCross RefCross Ref
  16. [16] Chen O.T.-C.. 2000. Motion estimation using a one-dimensional gradient descent search. IEEE Transactions on Circuits and Systems for Video Technology 10, 4 (2000), 608616.Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. [17] Chen Tong, Liu Haojie, Ma Zhan, Shen Qiu, Cao Xun, and Wang Yao. 2021. End-to-end learnt image compression via non-local attention optimization and improved context modeling. IEEE Transactions on Image Processing 30 (2021), 31793191.Google ScholarGoogle ScholarCross RefCross Ref
  18. [18] Cheng Zhengxue, Sun Heming, Takeuchi Masaru, and Katto Jiro. 2020. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In CVPR. 79397948.Google ScholarGoogle ScholarCross RefCross Ref
  19. [19] Choi Kiho, Chen Jianle, Rusanovskyy Dmytro, Choi Kwang-Pyo, and Jang Euee S.. 2020. An overview of the MPEG-5 essential video coding standard [standards in a nutshell]. IEEE Signal Processing Magazine 37, 3 (2020), 160167.Google ScholarGoogle ScholarCross RefCross Ref
  20. [20] Choi Yoojin, El-Khamy Mostafa, and Lee Jungwon. 2019. Variable rate deep image compression with a conditional autoencoder. In ICCV. 31463154.Google ScholarGoogle ScholarCross RefCross Ref
  21. [21] Cui Ze, Wang Jing, Gao Shangyin, Guo Tiansheng, Feng Yihui, and Bai Bo. 2021. Asymmetric gained deep image compression with continuous rate adaptation. In CVPR. 1053210541.Google ScholarGoogle ScholarCross RefCross Ref
  22. [22] Djelouah Abdelaziz, Campos Joaquim, Schaub-Meyer Simone, and Schroers Christopher. 2019. Neural inter-frame compression for video coding. In ICCV. 64216429.Google ScholarGoogle ScholarCross RefCross Ref
  23. [23] Dong Chao, Deng Yubin, Loy Chen Change, and Tang Xiaoou. 2015. Compression artifacts reduction by a deep convolutional network. In ICCV. 576584.Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. [24] Dufaux Frederic and Konrad Janusz. 2000. Efficient, robust, and fast global motion estimation for video coding. IEEE Transactions on Image Processing 9, 3 (2000), 497501.Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. [25] Feng Aolin, Gao Changsheng, Li Li, Liu Dong, and Wu Feng. 2021. CNN-based depth map prediction for fast block partitioning in HEVC intra coding. In ICME. IEEE, 16.Google ScholarGoogle ScholarCross RefCross Ref
  26. [26] Feng Aolin, Liu Kang, Liu Dong, Li Li, and Wu Feng. 2023. Partition map prediction for fast block partitioning in VVC intra-frame coding. IEEE Transactions on Image Processing 32 (2023), 22372251.Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. [27] Feng Runsen, Guo Zongyu, Li Weiping, and Chen Zhibo. 2023. NVTC: Nonlinear vector transform coding. In CVPR. 61016110.Google ScholarGoogle ScholarCross RefCross Ref
  28. [28] Gao Chenjian, Xu Tongda, He Dailan, Wang Yan, and Qin Hongwei. 2022. Flexible neural image compression via code editing. In NeurIPS, Vol. 35. 1218412196.Google ScholarGoogle Scholar
  29. [29] Guan Zhenyu, Xing Qunliang, Xu Mai, Yang Ren, Liu Tie, and Wang Zulin. 2019. MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 3 (2019), 949963.Google ScholarGoogle ScholarCross RefCross Ref
  30. [30] Guo Zongyu, Zhang Zhizheng, Feng Runsen, and Chen Zhibo. 2021. Causal contextual prediction for learned image compression. IEEE Transactions on Circuits and Systems for Video Technology 32, 4 (2021), 23292341.Google ScholarGoogle ScholarCross RefCross Ref
  31. [31] Guo Zongyu, Zhang Zhizheng, Feng Runsen, and Chen Zhibo. 2021. Soft then hard: Rethinking the quantization in neural image compression. In ICML. 39203929.Google ScholarGoogle Scholar
  32. [32] Habibian Amirhossein, Rozendaal Ties van, Tomczak Jakub M., and Cohen Taco S.. 2019. Video compression with rate-distortion autoencoders. In ICCV. 70337042.Google ScholarGoogle ScholarCross RefCross Ref
  33. [33] Han Jingning, Li Bohan, Mukherjee Debargha, Chiang Ching-Han, Grange Adrian, Chen Cheng, Su Hui, Parker Sarah, Deng Sai, Joshi Urvang, Chen Yue, Wang Yunqing, Wilkins Paul, Xu Yaowu, and Bankoski James. 2021. A technical overview of AV1. Proc. IEEE 109, 9 (2021), 14351462.Google ScholarGoogle ScholarCross RefCross Ref
  34. [34] He Dailan, Yang Ziming, Peng Weikun, Ma Rui, Qin Hongwei, and Wang Yan. 2022. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In CVPR. 57185727.Google ScholarGoogle ScholarCross RefCross Ref
  35. [35] He Dailan, Zheng Yaoyan, Sun Baocheng, Wang Yan, and Qin Hongwei. 2021. Checkerboard context model for efficient learned image compression. In CVPR. 1477114780.Google ScholarGoogle ScholarCross RefCross Ref
  36. [36] Helminger Leonhard, Djelouah Abdelaziz, Gross Markus, and Schroers Christopher. 2020. Lossy image compression with normalizing flows. arXiv preprint arXiv:2008.10486 (2020).Google ScholarGoogle Scholar
  37. [37] Hinton Geoffrey and Salakhutdinov Ruslan. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504507.Google ScholarGoogle ScholarCross RefCross Ref
  38. [38] Ho Yung-Han, Chang Chih-Peng, Chen Peng-Yu, Gnutti Alessandro, and Peng Wen-Hsiao. 2022. CANF-VC: Conditional augmented normalizing flows for video compression. In ECCV. Springer, 207223.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. [39] Hu Yueyu, Yang Wenhan, Ma Zhan, and Liu Jiaying. 2022. Learning end-to-end lossy image compression: A benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 8 (2022), 41944211.Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. [40] Hu Zhihao, Chen Zhenghao, Xu Dong, Lu Guo, Ouyang Wanli, and Gu Shuhang. 2020. Improving deep video compression by resolution-adaptive flow coding. In ECCV. Springer, 193209.Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. [41] Hu Zhihao, Lu Guo, Guo Jinyang, Liu Shan, Jiang Wei, and Xu Dong. 2022. Coarse-to-fine deep video coding with hyperprior-guided mode prediction. In CVPR. 59215930.Google ScholarGoogle ScholarCross RefCross Ref
  42. [42] Hu Zhihao, Lu Guo, and Xu Dong. 2021. FVC: A new framework towards deep video compression in feature space. In CVPR. 15021511.Google ScholarGoogle ScholarCross RefCross Ref
  43. [43] Huo Shuai, Liu Dong, Li Bin, Ma Siwei, Wu Feng, and Gao Wen. 2021. Deep network-based frame extrapolation with reference frame alignment. IEEE Transactions on Circuits and Systems for Video Technology 31, 3 (2021), 11781192.Google ScholarGoogle ScholarCross RefCross Ref
  44. [44] Huo Shuai, Liu Dong, Wu Feng, and Li Houqiang. 2018. Convolutional neural network-based motion compensation refinement for video coding. In ISCAS. IEEE, 14.Google ScholarGoogle ScholarCross RefCross Ref
  45. [45] ISO/IEC. 1993. ISO/IEC 11172-2 (MPEG-I): Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s - Part 2: Video.Google ScholarGoogle Scholar
  46. [46] ITU-T. 1984. ITU-T Recommendation H.120: Codec for Videoconferencing Using Primary Digital Group Transmission.Google ScholarGoogle Scholar
  47. [47] ITU-T. 1990. ITU-T Recommendation H.261: Video Codec for Audiovisual Services at p \(\times\) 64 kbitis.Google ScholarGoogle Scholar
  48. [48] ITU-T. 1995. ITU-T Recommendation H.263: Video Coding for Low Bitrate Communication.Google ScholarGoogle Scholar
  49. [49] ISO/IEC ITU-T and. 1994. ITU-T Recommendation H.262 - ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and Associated Audio Information - Part 2: Video.Google ScholarGoogle Scholar
  50. [50] Jia Chuanmin, Wang Shiqi, Zhang Xinfeng, Wang Shanshe, Liu Jiaying, Pu Shiliang, and Ma Siwei. 2019. Content-aware convolutional neural network for in-loop filtering in high efficiency video coding. IEEE Transactions on Image Processing 28, 7 (2019), 33433356.Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. [51] Jiang Wei, Wang Wei, Li Songnan, and Liu Shan. 2022. Online meta adaptation for variable-rate learned image compression. In CVPR. 498506.Google ScholarGoogle ScholarCross RefCross Ref
  52. [52] Karczewicz Marta, Hu Nan, Taquet Jonathan, Chen Ching-Yeh, Misra Kiran, Andersson Kenneth, Yin Peng, Lu Taoran, François Edouard, and Chen Jie. 2021. VVC in-loop filters. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 39073925.Google ScholarGoogle ScholarCross RefCross Ref
  53. [53] Kim Kyungah and Ro Won Woo. 2018. Fast CU depth decision for HEVC using neural networks. IEEE Transactions on Circuits and Systems for Video Technology 29, 5 (2018), 14621473.Google ScholarGoogle ScholarDigital LibraryDigital Library
  54. [54] Kingma Diederik P. and Ba Jimmy. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google ScholarGoogle Scholar
  55. [55] Koyuncu A. Burakhan, Gao Han, Boev Atanas, Gaikov Georgii, Alshina Elena, and Steinbach Eckehard. 2022. Contextformer: A transformer with spatio-channel attention for context modeling in learned image compression. In ECCV. Springer, 447463.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. [56] LeCun Yann, Bengio Yoshua, and Hinton Geoffrey. 2015. Deep learning. Nature 521, 7553 (2015), 436444.Google ScholarGoogle ScholarCross RefCross Ref
  57. [57] Lee Jooyoung, Jeong Seyoon, and Kim Munchurl. 2022. Selective compression learning of latent representations for variable-rate image compression. In NeurIPS, Vol. 35. 1314613157.Google ScholarGoogle Scholar
  58. [58] Li Jiahao, Li Bin, and Lu Yan. 2021. Deep contextual video compression. In NeurIPS, Vol. 34. 1811418125.Google ScholarGoogle Scholar
  59. [59] Li Jiahao, Li Bin, and Lu Yan. 2022. Hybrid spatial-temporal entropy modelling for neural video compression. In ACM Multimedia. 15031511.Google ScholarGoogle ScholarDigital LibraryDigital Library
  60. [60] Li Jiahao, Li Bin, and Lu Yan. 2023. Neural video compression with diverse contexts. In CVPR. 2261622626.Google ScholarGoogle ScholarCross RefCross Ref
  61. [61] Li Jiahao, Li Bin, Xu Jizheng, Xiong Ruiqin, and Gao Wen. 2018. Fully connected network-based intra prediction for image coding. IEEE Transactions on Image Processing 27, 7 (2018), 32363247.Google ScholarGoogle ScholarCross RefCross Ref
  62. [62] Li Li, Li Houqiang, Liu Dong, Li Zhu, Yang Haitao, Lin Sixin, Chen Huanbang, and Wu Feng. 2018. An efficient four-parameter affine motion model for video coding. IEEE Transactions on Circuits and Systems for Video Technology 28, 8 (2018), 19341948.Google ScholarGoogle ScholarDigital LibraryDigital Library
  63. [63] Li Mu, Zhang Kai, Li Jinxing, Zuo Wangmeng, Timofte Radu, and Zhang David. 2023. Learning context-based nonlocal entropy modeling for image compression. IEEE Transactions on Neural Networks and Learning Systems 34, 3 (2023), 11321145.Google ScholarGoogle ScholarCross RefCross Ref
  64. [64] Li Xin and Orchard Michael T.. 2001. Edge-directed prediction for lossless compression of natural images. IEEE Transactions on Image Processing 10, 6 (2001), 813817.Google ScholarGoogle ScholarDigital LibraryDigital Library
  65. [65] Li Yue, Liu Dong, Li Houqiang, Li Li, Li Zhu, and Wu Feng. 2019. Learning a convolutional neural network for image compact-resolution. IEEE Transactions on Image Processing 28, 3 (2019), 10921107.Google ScholarGoogle ScholarCross RefCross Ref
  66. [66] Li Yue, Yi Yan, Liu Dong, Li Li, Li Zhu, and Li Houqiang. 2021. Neural-network-based cross-channel intra prediction. ACM Trans. Multimedia Comput. Commun. Appl. 17, 3, Article 77 (Jul.2021), 23 pages.Google ScholarGoogle ScholarDigital LibraryDigital Library
  67. [67] Lin Chih-Hsuan, Chen Yi-Hsin, and Peng Wen-Hsiao. 2022. Content-adaptive motion rate adaption for learned video compression. In PCS. 163167.Google ScholarGoogle ScholarCross RefCross Ref
  68. [68] Lin Jianping, Liu Dong, Li Houqiang, and Wu Feng. 2020. M-LVC: Multiple frames prediction for learned video compression. In CVPR. 35463554.Google ScholarGoogle ScholarCross RefCross Ref
  69. [69] Liu Dong, Chen Zhenzhong, Liu Shan, and Wu Feng. 2020. Deep learning-based technology in responses to the joint call for proposals on video compression with capability beyond HEVC. IEEE Transactions on Circuits and Systems for Video Technology 30, 5 (2020), 12671280.Google ScholarGoogle ScholarCross RefCross Ref
  70. [70] Liu Dong, Li Yue, Lin Jianping, Li Houqiang, and Wu Feng. 2020. Deep learning-based video coding: A review and a case study. ACM Computing Surveys (CSUR) 53, 1 (2020), 135.Google ScholarGoogle ScholarDigital LibraryDigital Library
  71. [71] Liu Dong, Ma Haichuan, Xiong Zhiwei, and Wu Feng. 2018. CNN-based DCT-like transform for image compression. In MMM. Springer, 6172.Google ScholarGoogle ScholarCross RefCross Ref
  72. [72] Liu Dong, Sun Xiaoyan, and Wu Feng. 2008. Manipulating image patches for compression. In ICME. 197200.Google ScholarGoogle ScholarCross RefCross Ref
  73. [73] Liu H., Chen Y., Chen J., Zhang L., and Karczewicz M.. 2015. Local Illumination Compensation. Technical Report VCEG-AZ06. VCEG.Google ScholarGoogle Scholar
  74. [74] Liu Haojie, Lu Ming, Ma Zhan, Wang Fan, Xie Zhihuang, Cao Xun, and Wang Yao. 2021. Neural video coding using multiscale motion compensation and spatiotemporal context model. IEEE Transactions on Circuits and Systems for Video Technology 31, 8 (2021), 31823196.Google ScholarGoogle ScholarCross RefCross Ref
  75. [75] Liu Jiaying, Liu Dong, Yang Wenhan, Xia Sifeng, Zhang Xiaoshuai, and Dai Yuanying. 2020. A comprehensive benchmark for single image compression artifact reduction. IEEE Transactions on Image Processing 29 (2020), 78457860.Google ScholarGoogle ScholarCross RefCross Ref
  76. [76] Liu Jinming, Sun Heming, and Katto Jiro. 2023. Learned image compression with mixed transformer-CNN architectures. In CVPR. 1438814397.Google ScholarGoogle ScholarCross RefCross Ref
  77. [77] Liu Jerry, Wang Shenlong, Ma Wei-Chiu, Shah Meet, Hu Rui, Dhawan Pranaab, and Urtasun Raquel. 2020. Conditional entropy coding for efficient video compression. In ECCV. Springer, 453468.Google ScholarGoogle ScholarDigital LibraryDigital Library
  78. [78] Liu Kang, Liu Dong, Li Li, and Li Houqiang. 2021. Context-adaptive inverse quantization for inter-frame coding. IEEE Open Journal of Circuits and Systems 2 (2021), 660674.Google ScholarGoogle ScholarCross RefCross Ref
  79. [79] Liu Lurng-Kuo and Feig Ephraim. 1996. A block-based gradient descent search algorithm for block motion estimation in video coding. IEEE Transactions on Circuits and Systems for Video Technology 6, 4 (1996), 419422.Google ScholarGoogle ScholarDigital LibraryDigital Library
  80. [80] Liu Zheng, Li Tianyi, Chen Ying, Wei Kaijin, Xu Mai, and Qi Honggang. 2023. Deep multi-task learning based fast intra-mode decision for versatile video coding. IEEE Transactions on Circuits and Systems for Video Technology 33, 10 (2023), 61016116.Google ScholarGoogle ScholarDigital LibraryDigital Library
  81. [81] Liu Zhenyu, Yu Xianyu, Gao Yuan, Chen Shaolin, Ji Xiangyang, and Wang Dongsheng. 2016. CU partition mode decision for HEVC hardwired intra encoder using convolution neural network. IEEE Transactions on Image Processing 25, 11 (2016), 50885103.Google ScholarGoogle ScholarDigital LibraryDigital Library
  82. [82] Lu Guo, Cai Chunlei, Zhang Xiaoyun, Chen Li, Ouyang Wanli, Xu Dong, and Gao Zhiyong. 2020. Content adaptive and error propagation aware deep video compression. In ECCV. 456472.Google ScholarGoogle ScholarDigital LibraryDigital Library
  83. [83] Lu Guo, Ouyang Wanli, Xu Dong, Zhang Xiaoyun, Cai Chunlei, and Gao Zhiyong. 2019. DVC: An end-to-end deep video compression framework. In CVPR. 1100611015.Google ScholarGoogle ScholarCross RefCross Ref
  84. [84] Ma Changyue, Liu Dong, Peng Xiulian, Li Li, and Wu Feng. 2019. Convolutional neural network-based arithmetic coding for HEVC intra-predicted residues. IEEE Transactions on Circuits and Systems for Video Technology 30, 7 (2019), 19011916.Google ScholarGoogle Scholar
  85. [85] Ma Haichuan, Liu Dong, Dong Cunhui, Li Li, and Wu Feng. 2021. End-to-end image compression with probabilistic decoding. arXiv preprint arXiv:2109.14837 (2021).Google ScholarGoogle Scholar
  86. [86] Ma Haichuan, Liu Dong, and Wu Feng. 2020. Improving compression artifact reduction via end-to-end learning of side information. In VCIP. 403406.Google ScholarGoogle ScholarCross RefCross Ref
  87. [87] Ma Haichuan, Liu Dong, Xiong Ruiqin, and Wu Feng. 2019. iWave: CNN-based wavelet-like transform for image compression. IEEE Transactions on Multimedia 22, 7 (2019), 16671679.Google ScholarGoogle ScholarCross RefCross Ref
  88. [88] Ma Haichuan, Liu Dong, Yan Ning, Li Houqiang, and Wu Feng. 2022. End-to-end optimized versatile image compression with wavelet-like transform. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3 (2022), 12471263.Google ScholarGoogle ScholarCross RefCross Ref
  89. [89] Meardi Guido, Ferrara Simone, Ciccarelli Lorenzo, Cobianchi Guendalina, Poularakis Stergios, Maurer Florian, Battista Stefano, and Byagowi Ahmad. 2020. MPEG-5 Part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation. In Applications of Digital Image Processing XLIII, Vol. 11510. International Society for Optics and Photonics, 115101C.Google ScholarGoogle Scholar
  90. [90] Mentzer Fabian, Agustsson Eirikur, Tschannen Michael, Timofte Radu, and Gool Luc Van. 2018. Conditional probability models for deep image compression. In CVPR. 43944402.Google ScholarGoogle ScholarCross RefCross Ref
  91. [91] Mentzer Fabian, Toderici George D., Minnen David, Caelles Sergi, Hwang Sung Jin, Lucic Mario, and Agustsson Eirikur. 2022. VCT: A video compression transformer. In NeurIPS, Vol. 35. 1309113103.Google ScholarGoogle Scholar
  92. [92] Minnen David, Ballé Johannes, and Toderici George. 2018. Joint autoregressive and hierarchical priors for learned image compression. In NIPS, Vol. 31. 1079410803.Google ScholarGoogle Scholar
  93. [93] Minnen David and Singh Saurabh. 2020. Channel-wise autoregressive entropy models for learned image compression. In ICIP. IEEE, 33393343.Google ScholarGoogle ScholarCross RefCross Ref
  94. [94] Nocedal Jorge and Wright Stephen. 2006. Numerical Optimization. Springer Science & Business Media.Google ScholarGoogle Scholar
  95. [95] Ohm Jens-Rainer, Sullivan Gary J., Schwarz Heiko, Tan Thiow Keng, and Wiegand Thomas. 2012. Comparison of the coding efficiency of video coding standards-including high efficiency video coding (HEVC). IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 16691684.Google ScholarGoogle ScholarDigital LibraryDigital Library
  96. [96] Ortega Antonio and Ramchandran Kannan. 1998. Rate-distortion methods for image and video compression. IEEE Signal Processing Magazine 15, 6 (1998), 2350.Google ScholarGoogle ScholarCross RefCross Ref
  97. [97] Pan Guanbo, Lu Guo, Hu Zhihao, and Xu Dong. 2022. Content adaptive latents and decoder for neural image compression. In ECCV. Springer, 556573.Google ScholarGoogle ScholarDigital LibraryDigital Library
  98. [98] Peng Wen-Hsiao, Walls Frederick G., Cohen Robert A., Xu Jizheng, Ostermann Jörn, MacInnis Alexander, and Lin Tao. 2016. Overview of screen content video coding: Technologies, standards, and beyond. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6, 4 (2016), 393408.Google ScholarGoogle ScholarCross RefCross Ref
  99. [99] Pfaff J., Helle P., Maniry D., Kaltenstadler S., Samek W., Schwarz H., Marpe D., and Wiegand T.. 2018. Neural network based intra prediction for video coding. In Applications of Digital Image Processing XLI, Vol. 10752. International Society for Optics and Photonics, 1075213.Google ScholarGoogle ScholarCross RefCross Ref
  100. [100] Po Lai-Man, Ng Ka-Ho, Cheung Kwok-Wai, Wong Ka-Man, Uddin Yusuf Md. Salah, and Ting Chi-Wang. 2009. Novel directional gradient descent searches for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology 19, 8 (2009), 11891195.Google ScholarGoogle ScholarDigital LibraryDigital Library
  101. [101] Qian Yichen, Lin Ming, Sun Xiuyu, Tan Zhiyu, and Jin Rong. 2022. Entroformer: A transformer-based entropy model for learned image compression. arXiv preprint arXiv:2202.05492 (2022).Google ScholarGoogle Scholar
  102. [102] Rippel Oren, Anderson Alexander G., Tatwawadi Kedar, Nair Sanjay, Lytle Craig, and Bourdev Lubomir. 2021. ELF-VC: Efficient learned flexible-rate video coding. In ICCV. 1447914488.Google ScholarGoogle ScholarCross RefCross Ref
  103. [103] Rippel Oren, Nair Sanjay, Lew Carissa, Branson Steve, Anderson Alexander G., and Bourdev Lubomir. 2019. Learned video compression. In ICCV. 34543463.Google ScholarGoogle ScholarCross RefCross Ref
  104. [104] Shannon C. E.. 1948. A mathematical theory of communication. Bell Systems Technical Journal 27, 4 (1948), 623656.Google ScholarGoogle ScholarCross RefCross Ref
  105. [105] Shannon C. E.. 1959. Coding theorems for a discrete source with a fidelity criteria. International Convention Record 7 (1959), 325350.Google ScholarGoogle Scholar
  106. [106] Sheng Xihua, Li Jiahao, Li Bin, Li Li, Liu Dong, and Lu Yan. 2023. Temporal context mining for learned video compression. IEEE Transactions on Multimedia 25 (2023), 73117322.Google ScholarGoogle ScholarDigital LibraryDigital Library
  107. [107] Shi Yibo, Ge Yunying, Wang Jing, and Mao Jue. 2022. AlphaVC: High-performance and efficient learned video compression. In ECCV. Springer, 616631.Google ScholarGoogle ScholarDigital LibraryDigital Library
  108. [108] Sikora Thomas. 2005. Trends and perspectives in image and video coding. Proc. IEEE 93, 1 (2005), 617.Google ScholarGoogle ScholarCross RefCross Ref
  109. [109] Song Li, Tang Xun, Zhang Wei, Yang Xiaokang, and Xia Pingjian. 2013. The SJTU 4K video sequence dataset. In QoMEX. 3435.Google ScholarGoogle ScholarCross RefCross Ref
  110. [110] Song Myungseo, Choi Jinyoung, and Han Bohyung. 2021. Variable-rate deep image compression through spatially-adaptive feature transform. In ICCV. 23602369.Google ScholarGoogle ScholarCross RefCross Ref
  111. [111] Storch Iago, Agostini Luciano, Zatt Bruno, Bampi Sergio, and Palomino Daniel. 2022. FastInter360: A fast inter mode decision for HEVC 360 video coding. IEEE Transactions on Circuits and Systems for Video Technology 32, 5 (2022), 32353249.Google ScholarGoogle ScholarDigital LibraryDigital Library
  112. [112] Sullivan Gary J., Ohm Jens, Han Woo-Jin, and Wiegand Thomas. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 16491668.Google ScholarGoogle ScholarDigital LibraryDigital Library
  113. [113] Sullivan Gary J. and Wiegand Thomas. 1998. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine 15, 6 (1998), 7490.Google ScholarGoogle ScholarCross RefCross Ref
  114. [114] Sun Heming, Yu Lu, and Katto Jiro. 2022. Improving latent quantization of learned image compression with gradient scaling. In VCIP. 15.Google ScholarGoogle ScholarCross RefCross Ref
  115. [115] Tang Zhisen, Wang Hanli, Yi Xiaokai, Zhang Yun, Kwong Sam, and Kuo C.-C. Jay. 2022. Joint graph attention and asymmetric convolutional neural network for deep image compression. IEEE Transactions on Circuits and Systems for Video Technology 33, 1 (2022), 421433.Google ScholarGoogle ScholarCross RefCross Ref
  116. [116] Theis Lucas and Agustsson Eirikur. 2021. On the advantages of stochastic encoders. arXiv preprint arXiv:2102.09270 (2021).Google ScholarGoogle Scholar
  117. [117] Theis Lucas, Shi Wenzhe, Cunningham Andrew, and Huszár Ferenc. 2017. Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395 (2017).Google ScholarGoogle Scholar
  118. [118] Toderici George, O’Malley Sean M., Hwang Sung Jin, Vincent Damien, Minnen David, Baluja Shumeet, Covell Michele, and Sukthankar Rahul. 2015. Variable rate image compression with recurrent neural networks. arXiv preprint arXiv:1511.06085 (2015).Google ScholarGoogle Scholar
  119. [119] Tsai Chia-Yang, Chen Ching-Yeh, Yamakage Tomoo, Chong In Suk, Huang Yu-Wen, Fu Chih-Ming, Itoh Takayuki, Watanabe Takashi, Chujoh Takeshi, Karczewicz Marta, and Lei Shaw-Min. 2013. Adaptive loop filtering for video coding. IEEE Journal of Selected Topics in Signal Processing 7, 6 (2013), 934945.Google ScholarGoogle ScholarCross RefCross Ref
  120. [120] Rozendaal Ties van, Brehmer Johann, Zhang Yunfan, Pourreza Reza, and Cohen Taco S.. 2021. Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set. arXiv preprint arXiv:2111.10302 (2021).Google ScholarGoogle Scholar
  121. [121] Vatis Yuri and Ostermann Joern. 2008. Adaptive interpolation filter for H. 264/AVC. IEEE Transactions on Circuits and Systems for Video Technology 19, 2 (2008), 179192.Google ScholarGoogle ScholarDigital LibraryDigital Library
  122. [122] Wallace Gregory K.. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1 (1992), xviii–xxxiv.Google ScholarGoogle ScholarDigital LibraryDigital Library
  123. [123] Wang Dezhao, Yang Wenhan, Hu Yueyu, and Liu Jiaying. 2022. Neural data-dependent transform for learned image compression. In CVPR. 1737917388.Google ScholarGoogle ScholarCross RefCross Ref
  124. [124] Wang Xiao, Ding Ding, Jiang Wei, Wang Wei, Xu Xiaozhong, Liu Shan, Kulis Brian, and Chin Peter. 2022. Substitutional neural image compression. In PCS. 97101.Google ScholarGoogle ScholarCross RefCross Ref
  125. [125] Wang Yefei, Liu Dong, Ma Siwei, Wu Feng, and Gao Wen. 2020. Ensemble learning-based rate-distortion optimization for end-to-end image compression. IEEE Transactions on Circuits and Systems for Video Technology 31, 3 (2020), 11931207.Google ScholarGoogle ScholarCross RefCross Ref
  126. [126] Wang Yao, Ostermann Jörn, and Zhang Ya-Qin. 2002. Video Processing and Communications. Vol. 1. Prentice Hall Upper Saddle River, NJ.Google ScholarGoogle Scholar
  127. [127] Wedi Thomas. 2006. Adaptive interpolation filters and high-resolution displacements for video coding. IEEE Transactions on Circuits and Systems for Video Technology 16, 4 (2006), 484491.Google ScholarGoogle ScholarDigital LibraryDigital Library
  128. [128] Wiegand Thomas, Schwarz Heiko, Joch Anthony, Kossentini Faouzi, and Sullivan Gary J.. 2003. Rate-constrained coder control and comparison of video coding standards. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 688703.Google ScholarGoogle ScholarDigital LibraryDigital Library
  129. [129] Wiegand Thomas, Sullivan Gary J., Bjontegaard Gisle, and Luthra Ajay. 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560576.Google ScholarGoogle ScholarDigital LibraryDigital Library
  130. [130] Wu Xiaolin, Barthel E. U., and Zhang Wenhan. 1998. Piecewise 2D autoregression for predictive image coding. In ICIP. IEEE, 901904.Google ScholarGoogle Scholar
  131. [131] Xie Yueqi, Cheng Ka Leong, and Chen Qifeng. 2021. Enhanced invertible encoding for learned image compression. In ACM Multimedia. 162–170.
  132. [132] Xu Mai, Li Tianyi, Wang Zulin, Deng Xin, Yang Ren, and Guan Zhenyu. 2018. Reducing complexity of HEVC: A deep learning approach. IEEE Transactions on Image Processing 27, 10 (2018), 5044–5059.
  133. [133] Xu Tongda, Gao Han, Gao Chenjian, Wang Yuanyuan, He Dailan, Pi Jinyong, Luo Jixiang, Zhu Ziyu, Ye Mao, Qin Hongwei, Wang Yan, Liu Jingjing, and Zhang Ya-Qin. 2023. Bit allocation using optimization. In ICML. 38377–38399.
  134. [134] Yan Ning, Liu Dong, Li Houqiang, Li Bin, Li Li, and Wu Feng. 2019. Invertibility-driven interpolation filter for video coding. IEEE Transactions on Image Processing 28, 10 (2019), 4912–4925.
  135. [135] Yang Kun, Liu Dong, and Wu Feng. 2020. Deep learning-based nonlinear transform for HEVC intra coding. In VCIP. 387–390.
  136. [136] Yang Runyu, Liu Dong, Ma Siwei, Wu Feng, and Gao Wen. 2021. Knowledge distillation from end-to-end image compression to VVC intra coding for perceptual quality enhancement. In ICIP. 3438–3442.
  137. [137] Yang Ren, Mentzer Fabian, Gool Luc Van, and Timofte Radu. 2020. Learning for video compression with hierarchical quality and recurrent enhancement. In CVPR. 6628–6637.
  138. [138] Yang Ren, Mentzer Fabian, Gool Luc Van, and Timofte Radu. 2020. Learning for video compression with recurrent auto-encoder and recurrent probability model. IEEE Journal of Selected Topics in Signal Processing 15, 2 (2020), 388–401.
  139. [139] Yang Yibo, Bamler Robert, and Mandt Stephan. 2020. Improving inference for neural image compression. In NeurIPS, Vol. 33. 573–584.
  140. [140] Ye Hua, Deng Guang, and Devlin John C. 1999. Least squares approach for lossless image coding. In International Symposium on Signal Processing and its Applications (ISSPA), Vol. 1. IEEE, 63–66.
  141. [141] Yuan Hui, Chang Yilin, Lu Zhaoyang, and Ma Yanzhuo. 2010. Model based motion vector predictor for zoom motion. IEEE Signal Processing Letters 17, 9 (2010), 787–790.
  142. [142] Yuan Hui, Liu Ju, Sun Jiande, Liu Hechao, and Li Yujun. 2012. Affine model based motion compensation prediction for zoom. IEEE Transactions on Multimedia 14, 4 (2012), 1370–1375.
  143. [143] Yılmaz M. Akın and Tekalp A. Murat. 2022. End-to-end rate-distortion optimized learned hierarchical bi-directional video compression. IEEE Transactions on Image Processing 31 (2022), 974–983.
  144. [144] Zhang Honglei, Cricri Francesco, Tavakoli Hamed Rezazadegan, Santamaria Maria, Lam Yat-Hong, and Hannuksela Miska M. 2021. Learn to overfit better: Finding the important parameters for learned image compression. In VCIP. IEEE, 1–5.
  145. [145] Zhang Jiaqi, Jia Chuanmin, Lei Meng, Wang Shanshe, Ma Siwei, and Gao Wen. 2019. Recent development of AVS video coding standard: AVS3. In PCS. IEEE, 1–5.
  146. [146] Zhang Kai, Chen Jianle, Zhang Li, Li Xiang, and Karczewicz Marta. 2018. Enhanced cross-component linear model for chroma intra-prediction in video coding. IEEE Transactions on Image Processing 27, 8 (2018), 3983–3997.
  147. [147] Zhang Xi and Wu Xiaolin. 2023. LVQAC: Lattice vector quantization coupled with spatially adaptive companding for efficient learned image compression. In CVPR. 10239–10248.
  148. [148] Zhang Ziqiu, Ma Changyue, Liu Dong, Li Li, and Wu Feng. 2021. Improving VVC intra coding via probability estimation and fusion of multiple prediction modes. In ICIG. Springer, 654–664.
  149. [149] Zhao Jing, Li Bin, Li Jiahao, Xiong Ruiqin, and Lu Yan. 2021. A universal encoder rate distortion optimization framework for learned compression. In CVPR. 1880–1884.
  150. [150] Zhao Zhenghui, Wang Shiqi, Wang Shanshe, Zhang Xinfeng, Ma Siwei, and Yang Jiansheng. 2019. Enhanced bi-prediction with convolutional neural network for high efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology 29, 11 (2019), 3291–3301.
  151. [151] Zhong Zhisheng, Akutsu Hiroaki, and Aizawa Kiyoharu. 2020. Channel-level variable quantization network for deep image compression. In IJCAI. 467–473.
  152. [152] Zhu Xiaosu, Song Jingkuan, Gao Lianli, Zheng Feng, and Shen Heng Tao. 2022. Unified multivariate Gaussian mixture for efficient neural image compression. In CVPR. 17612–17621.
  153. [153] Zhu Yinhao, Yang Yang, and Cohen Taco. 2022. Transformer-based transform coding. In ICLR. https://openreview.net/forum?id=IDwN6xjHnK8
  154. [154] Zou Nannan, Zhang Honglei, Cricri Francesco, Tavakoli Hamed R., Lainema Jani, Hannuksela Miska, Aksu Emre, and Rahtu Esa. 2020. L2C – learning to learn to compress. In MMSP. 1–6.
  155. [155] Zou Renjie, Song Chunfeng, and Zhang Zhaoxiang. 2022. The devil is in the details: Window-based attention for image compression. In CVPR. 17492–17501.

    • Published in

      ACM Computing Surveys, Volume 56, Issue 9
      September 2024
      980 pages
      ISSN: 0360-0300
      EISSN: 1557-7341
      DOI: 10.1145/3613649
      • Editors: David Atienza, Michela Milano

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 24 April 2024
      • Online AM: 11 March 2024
      • Accepted: 7 March 2024
      • Revised: 19 February 2024
      • Received: 29 August 2022

      Qualifiers

      • survey