survey

Towards Hybrid-Optimization Video Coding

Authors:
Shuai Huo, University of Science and Technology of China, Hefei, China (ORCID: 0000-0003-3687-6699)
Dong Liu, University of Science and Technology of China, Hefei, China (ORCID: 0000-0001-9100-2906)
Haotian Zhang, University of Science and Technology of China, Hefei, China (ORCID: 0009-0004-7193-9127)
Li Li, University of Science and Technology of China, Hefei, China (ORCID: 0000-0002-7163-6263)
Siwei Ma, Peking University, Beijing, China (ORCID: 0000-0002-2731-5403)
Feng Wu, University of Science and Technology of China, Hefei, China (ORCID: 0000-0001-8451-0881)
Wen Gao, Peng Cheng Laboratory, Shenzhen, China, and Peking University, Beijing, China (ORCID: 0000-0001-8894-1806)


ACM Computing Surveys, Volume 56, Issue 9, Article No. 225, pp. 1–36. https://doi.org/10.1145/3652148

Published: 24 April 2024


Abstract

Video coding that pursues the highest compression efficiency is the art of computing for rate-distortion optimization. The optimization has been approached in different ways, exemplified by two typical frameworks: block-based hybrid video coding and end-to-end learned video coding. The block-based hybrid framework encompasses more and more coding modes that are available at the decoder side; the encoder searches for the optimal coding mode for each block to be coded. This is an online, discrete, search-based optimization strategy. The end-to-end learned framework embraces more and more sophisticated neural networks; the network parameters are learned from a collection of videos, typically using gradient descent-based methods. This is an offline, continuous, numerical optimization strategy. Having analyzed these two strategies, both conceptually and with concrete schemes, this paper suggests investigating hybrid-optimization video coding, that is, combining online and offline, discrete and continuous, search-based and numerical optimization. As an example, we propose a hybrid-optimization video coding scheme in which the decoder consists of trained neural networks and supports several coding modes, and the encoder adopts both numerical and search-based algorithms for the online optimization. Our scheme achieves promising compression efficiency, on par with H.265/HM under the random-access configuration.
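The hybrid-optimization idea can be illustrated with a toy sketch. Here the decoder is fixed and offers several coding modes. For each block, the encoder refines continuous parameters numerically and then picks a mode by discrete search. Every specific below is a hypothetical assumption, not the paper's scheme:

- The three modes (`skip`, `dc`, `ramp`) are fixed linear bases standing in for trained decoder networks.
- `MODE_BITS` and the quadratic rate proxy stand in for a real entropy model.
- Distortion is measured as MSE.

Each block is optimized online by minimizing the Lagrangian cost J = D + λR.

```python
from typing import Dict, List, Tuple

# Hypothetical signalling cost (in bits) of each mode index; illustrative only.
MODE_BITS: Dict[str, float] = {"skip": 1.0, "dc": 2.0, "ramp": 3.0}


def make_modes(n: int) -> Dict[str, List[List[float]]]:
    """Fixed 'decoder' per mode: a linear basis mapping theta to a block of length n.

    These linear bases are a toy stand-in for trained decoder networks.
    """
    ones = [1.0] * n
    ramp = [i / max(n - 1, 1) for i in range(n)]
    return {"skip": [], "dc": [ones], "ramp": [ones, ramp]}


def reconstruct(basis: List[List[float]], theta: List[float], n: int) -> List[float]:
    # Weighted sum of basis vectors; an empty basis ("skip") yields all zeros.
    return [sum(t * b[i] for t, b in zip(theta, basis)) for i in range(n)]


def rd_cost(x, basis, theta, mode, lam, alpha) -> float:
    """Lagrangian cost J = D + lam * R with MSE distortion and a toy rate proxy."""
    n = len(x)
    y = reconstruct(basis, theta, n)
    dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y)) / n
    # Hypothetical rate proxy: mode bits plus a quadratic penalty on the parameters.
    rate = MODE_BITS[mode] + alpha * sum(t * t for t in theta)
    return dist + lam * rate


def optimize_params(x, basis, lam, alpha, steps=500, lr=0.3) -> List[float]:
    """Numerical (continuous) stage: gradient descent on theta for one fixed mode."""
    n = len(x)
    theta = [0.0] * len(basis)
    for _ in range(steps):
        y = reconstruct(basis, theta, n)
        residual = [xi - yi for xi, yi in zip(x, y)]
        # dJ/dtheta_k = -(2/n) * <residual, basis_k> + 2 * lam * alpha * theta_k
        grad = [
            -2.0 / n * sum(r * bi for r, bi in zip(residual, b)) + 2.0 * lam * alpha * t
            for b, t in zip(basis, theta)
        ]
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


def encode_block(x: List[float], lam: float = 0.01,
                 alpha: float = 0.01) -> Tuple[str, List[float], float]:
    """Search-based (discrete) stage: try every mode, keep the lowest-cost one."""
    best = None
    for mode, basis in make_modes(len(x)).items():
        theta = optimize_params(x, basis, lam, alpha)  # continuous refinement
        cost = rd_cost(x, basis, theta, mode, lam, alpha)
        if best is None or cost < best[2]:
            best = (mode, theta, cost)
    return best
```

In this toy, a linear ramp block such as `[0.0, 1.0, ..., 7.0]` selects `"ramp"`. A constant block selects `"dc"`, because the extra mode bits of `"ramp"` buy no distortion reduction. The search is exhaustive only because there are three modes. A practical encoder would prune it and estimate R with a learned entropy model.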

REFERENCES

[1] Agustsson Eirikur, Mentzer Fabian, Tschannen Michael, Cavigelli Lukas, Timofte Radu, Benini Luca, and Gool Luc V.. 2017. Soft-to-hard vector quantization for end-to-end learning compressible representations. In NIPS, Vol. 30. 1141–1151.Google Scholar
[2] Agustsson Eirikur, Minnen David, Johnston Nick, Balle Johannes, Hwang Sung Jin, and Toderici George. 2020. Scale-space flow for end-to-end optimized video compression. In CVPR. 8503–8512.Google ScholarCross Ref
[3] Agustsson Eirikur and Theis Lucas. 2020. Universally quantized neural compression. In NeurIPS, Vol. 33. 12367–12376.Google Scholar
[4] Ahmed N., Natarajan T., and Rao K. R.. 1974. Discrete cosine transform. IEEE Trans. Comput. C-23, 1 (1974), 90–93.Google ScholarDigital Library
[5] Alshin Alexander, Alshina Elena, and Lee Tammy. 2010. Bi-directional optical flow for improving motion compensation. In PCS. IEEE, 422–425.Google ScholarCross Ref
[6] Ballé Johannes, Laparra Valero, and Simoncelli Eero P.. 2016. End-to-end optimized image compression. arXiv preprint arXiv:1611.01704 (2016).Google Scholar
[7] Ballé Johannes, Minnen David, Singh Saurabh, Hwang Sung Jin, and Johnston Nick. 2018. Variational image compression with a scale hyperprior. arXiv preprint arXiv:1802.01436 (2018).Google Scholar
[8] Bjontegaard Gisle. 2001. Calculation of Average PSNR Differences between RD-Curves. Technical Report VCEG-M33. VCEG.Google Scholar
[9] Bossen Frank. 2011. Common Test Conditions and Software Reference Configurations. Technical Report JCTVC-F900. JCT-VC.Google Scholar
[10] Brand Fabian, Fischer Kristian, and Kaup Andre. 2021. Rate-distortion optimized learning-based image compression using an adaptive hierachical autoencoder with conditional hyperprior. In CVPR Workshops. 1885–1889.Google ScholarCross Ref
[11] Bross Benjamin, Chen Jianle, Ohm Jens-Rainer, Sullivan Gary J., and Wang Ye-Kui. 2021. Developments in international video coding standardization after AVC, with an overview of versatile video coding (VVC). Proc. IEEE 109, 9 (2021), 1463–1493.Google ScholarCross Ref
[12] Bross Benjamin, Wang Ye-Kui, Ye Yan, Liu Shan, Chen Jianle, Sullivan Gary J., and Ohm Jens-Rainer. 2021. Overview of the versatile video coding (VVC) standard and its applications. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 3736–3764.Google ScholarCross Ref
[13] Cai Jianrui and Zhang Lei. 2018. Deep image compression with iterative non-uniform quantization. In ICIP. IEEE, 451–455.Google ScholarCross Ref
[14] Campos Joaquim, Simon Meierhans, Djelouah Abdelaziz, and Schroers Christopher. 2019. Content adaptive optimization for neural image compression. In CVPR Workshops. 1–5.Google Scholar
[15] Chen Mu-Jung, Chen Yi-Hsin, and Peng Wen-Hsiao. 2023. B-CANF: Adaptive B-frame coding with conditional augmented normalizing flows. IEEE Transactions on Circuits and Systems for Video Technology (2023). DOI:Google ScholarCross Ref
[16] Chen O.T.-C.. 2000. Motion estimation using a one-dimensional gradient descent search. IEEE Transactions on Circuits and Systems for Video Technology 10, 4 (2000), 608–616.Google ScholarDigital Library
[17] Chen Tong, Liu Haojie, Ma Zhan, Shen Qiu, Cao Xun, and Wang Yao. 2021. End-to-end learnt image compression via non-local attention optimization and improved context modeling. IEEE Transactions on Image Processing 30 (2021), 3179–3191.Google ScholarCross Ref
[18] Cheng Zhengxue, Sun Heming, Takeuchi Masaru, and Katto Jiro. 2020. Learned image compression with discretized gaussian mixture likelihoods and attention modules. In CVPR. 7939–7948.Google ScholarCross Ref
[19] Choi Kiho, Chen Jianle, Rusanovskyy Dmytro, Choi Kwang-Pyo, and Jang Euee S.. 2020. An overview of the MPEG-5 essential video coding standard [standards in a nutshell]. IEEE Signal Processing Magazine 37, 3 (2020), 160–167.Google ScholarCross Ref
[20] Choi Yoojin, El-Khamy Mostafa, and Lee Jungwon. 2019. Variable rate deep image compression with a conditional autoencoder. In ICCV. 3146–3154.Google ScholarCross Ref
[21] Cui Ze, Wang Jing, Gao Shangyin, Guo Tiansheng, Feng Yihui, and Bai Bo. 2021. Asymmetric gained deep image compression with continuous rate adaptation. In CVPR. 10532–10541.Google ScholarCross Ref
[22] Djelouah Abdelaziz, Campos Joaquim, Schaub-Meyer Simone, and Schroers Christopher. 2019. Neural inter-frame compression for video coding. In ICCV. 6421–6429.Google ScholarCross Ref
[23] Dong Chao, Deng Yubin, Loy Chen Change, and Tang Xiaoou. 2015. Compression artifacts reduction by a deep convolutional network. In ICCV. 576–584.Google ScholarDigital Library
[24] Dufaux Frederic and Konrad Janusz. 2000. Efficient, robust, and fast global motion estimation for video coding. IEEE Transactions on Image Processing 9, 3 (2000), 497–501.Google ScholarDigital Library
[25] Feng Aolin, Gao Changsheng, Li Li, Liu Dong, and Wu Feng. 2021. CNN-based depth map prediction for fast block partitioning in HEVC intra coding. In ICME. IEEE, 1–6.Google ScholarCross Ref
[26] Feng Aolin, Liu Kang, Liu Dong, Li Li, and Wu Feng. 2023. Partition map prediction for fast block partitioning in VVC intra-frame coding. IEEE Transactions on Image Processing 32 (2023), 2237–2251.Google ScholarDigital Library
[27] Feng Runsen, Guo Zongyu, Li Weiping, and Chen Zhibo. 2023. NVTC: Nonlinear vector transform coding. In CVPR. 6101–6110.Google ScholarCross Ref
[28] Gao Chenjian, Xu Tongda, He Dailan, Wang Yan, and Qin Hongwei. 2022. Flexible neural image compression via code editing. In NeurIPS, Vol. 35. 12184–12196.Google Scholar
[29] Guan Zhenyu, Xing Qunliang, Xu Mai, Yang Ren, Liu Tie, and Wang Zulin. 2019. MFQE 2.0: A new approach for multi-frame quality enhancement on compressed video. IEEE Transactions on Pattern Analysis and Machine Intelligence 43, 3 (2019), 949–963.Google ScholarCross Ref
[30] Guo Zongyu, Zhang Zhizheng, Feng Runsen, and Chen Zhibo. 2021. Causal contextual prediction for learned image compression. IEEE Transactions on Circuits and Systems for Video Technology 32, 4 (2021), 2329–2341.Google ScholarCross Ref
[31] Guo Zongyu, Zhang Zhizheng, Feng Runsen, and Chen Zhibo. 2021. Soft then hard: Rethinking the quantization in neural image compression. In ICML. 3920–3929.Google Scholar
[32] Habibian Amirhossein, Rozendaal Ties van, Tomczak Jakub M., and Cohen Taco S.. 2019. Video compression with rate-distortion autoencoders. In ICCV. 7033–7042.Google ScholarCross Ref
[33] Han Jingning, Li Bohan, Mukherjee Debargha, Chiang Ching-Han, Grange Adrian, Chen Cheng, Su Hui, Parker Sarah, Deng Sai, Joshi Urvang, Chen Yue, Wang Yunqing, Wilkins Paul, Xu Yaowu, and Bankoski James. 2021. A technical overview of AV1. Proc. IEEE 109, 9 (2021), 1435–1462.Google ScholarCross Ref
[34] He Dailan, Yang Ziming, Peng Weikun, Ma Rui, Qin Hongwei, and Wang Yan. 2022. ELIC: Efficient learned image compression with unevenly grouped space-channel contextual adaptive coding. In CVPR. 5718–5727.Google ScholarCross Ref
[35] He Dailan, Zheng Yaoyan, Sun Baocheng, Wang Yan, and Qin Hongwei. 2021. Checkerboard context model for efficient learned image compression. In CVPR. 14771–14780.Google ScholarCross Ref
[36] Helminger Leonhard, Djelouah Abdelaziz, Gross Markus, and Schroers Christopher. 2020. Lossy image compression with normalizing flows. arXiv preprint arXiv:2008.10486 (2020).Google Scholar
[37] Hinton Geoffrey and Salakhutdinov Ruslan. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504–507.Google ScholarCross Ref
[38] Ho Yung-Han, Chang Chih-Peng, Chen Peng-Yu, Gnutti Alessandro, and Peng Wen-Hsiao. 2022. CANF-VC: Conditional augmented normalizing flows for video compression. In ECCV. Springer, 207–223.Google ScholarDigital Library
[39] Hu Yueyu, Yang Wenhan, Ma Zhan, and Liu Jiaying. 2022. Learning end-to-end lossy image compression: A benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 8 (2022), 4194–4211.Google ScholarDigital Library
[40] Hu Zhihao, Chen Zhenghao, Xu Dong, Lu Guo, Ouyang Wanli, and Gu Shuhang. 2020. Improving deep video compression by resolution-adaptive flow coding. In ECCV. Springer, 193–209.Google ScholarDigital Library
[41] Hu Zhihao, Lu Guo, Guo Jinyang, Liu Shan, Jiang Wei, and Xu Dong. 2022. Coarse-to-fine deep video coding with hyperprior-guided mode prediction. In CVPR. 5921–5930.Google ScholarCross Ref
[42] Hu Zhihao, Lu Guo, and Xu Dong. 2021. FVC: A new framework towards deep video compression in feature space. In CVPR. 1502–1511.Google ScholarCross Ref
[43] Huo Shuai, Liu Dong, Li Bin, Ma Siwei, Wu Feng, and Gao Wen. 2021. Deep network-based frame extrapolation with reference frame alignment. IEEE Transactions on Circuits and Systems for Video Technology 31, 3 (2021), 1178–1192.Google ScholarCross Ref
[44] Huo Shuai, Liu Dong, Wu Feng, and Li Houqiang. 2018. Convolutional neural network-based motion compensation refinement for video coding. In ISCAS. IEEE, 1–4.Google ScholarCross Ref
[45] ISO/IEC. 1993. ISO/IEC 11172-2 (MPEG-I): Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to About 1.5 Mbit/s - Part 2: Video.Google Scholar
[46] ITU-T. 1984. ITU-T Recommendation H.120: Codec for Videoconferencing Using Primary Digital Group Transmission.Google Scholar
[47] ITU-T. 1990. ITU-T Recommendation H.261: Video Codec for Audiovisual Services at p \(\times\) 64 kbitis.Google Scholar
[48] ITU-T. 1995. ITU-T Recommendation H.263: Video Coding for Low Bitrate Communication.Google Scholar
[49] ISO/IEC ITU-T and. 1994. ITU-T Recommendation H.262 - ISO/IEC 13818-2 (MPEG-2): Generic Coding of Moving Pictures and Associated Audio Information - Part 2: Video.Google Scholar
[50] Jia Chuanmin, Wang Shiqi, Zhang Xinfeng, Wang Shanshe, Liu Jiaying, Pu Shiliang, and Ma Siwei. 2019. Content-aware convolutional neural network for in-loop filtering in high efficiency video coding. IEEE Transactions on Image Processing 28, 7 (2019), 3343–3356.Google ScholarDigital Library
[51] Jiang Wei, Wang Wei, Li Songnan, and Liu Shan. 2022. Online meta adaptation for variable-rate learned image compression. In CVPR. 498–506.Google ScholarCross Ref
[52] Karczewicz Marta, Hu Nan, Taquet Jonathan, Chen Ching-Yeh, Misra Kiran, Andersson Kenneth, Yin Peng, Lu Taoran, François Edouard, and Chen Jie. 2021. VVC in-loop filters. IEEE Transactions on Circuits and Systems for Video Technology 31, 10 (2021), 3907–3925.Google ScholarCross Ref
[53] Kim Kyungah and Ro Won Woo. 2018. Fast CU depth decision for HEVC using neural networks. IEEE Transactions on Circuits and Systems for Video Technology 29, 5 (2018), 1462–1473.Google ScholarDigital Library
[54] Kingma Diederik P. and Ba Jimmy. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).Google Scholar
[55] Koyuncu A. Burakhan, Gao Han, Boev Atanas, Gaikov Georgii, Alshina Elena, and Steinbach Eckehard. 2022. Contextformer: A transformer with spatio-channel attention for context modeling in learned image compression. In ECCV. Springer, 447–463.Google ScholarDigital Library
[56] LeCun Yann, Bengio Yoshua, and Hinton Geoffrey. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.Google ScholarCross Ref
[57] Lee Jooyoung, Jeong Seyoon, and Kim Munchurl. 2022. Selective compression learning of latent representations for variable-rate image compression. In NeurIPS, Vol. 35. 13146–13157.Google Scholar
[58] Li Jiahao, Li Bin, and Lu Yan. 2021. Deep contextual video compression. In NeurIPS, Vol. 34. 18114–18125.Google Scholar
[59] Li Jiahao, Li Bin, and Lu Yan. 2022. Hybrid spatial-temporal entropy modelling for neural video compression. In ACM Multimedia. 1503–1511.Google ScholarDigital Library
[60] Li Jiahao, Li Bin, and Lu Yan. 2023. Neural video compression with diverse contexts. In CVPR. 22616–22626.Google ScholarCross Ref
[61] Li Jiahao, Li Bin, Xu Jizheng, Xiong Ruiqin, and Gao Wen. 2018. Fully connected network-based intra prediction for image coding. IEEE Transactions on Image Processing 27, 7 (2018), 3236–3247.Google ScholarCross Ref
[62] Li Li, Li Houqiang, Liu Dong, Li Zhu, Yang Haitao, Lin Sixin, Chen Huanbang, and Wu Feng. 2018. An efficient four-parameter affine motion model for video coding. IEEE Transactions on Circuits and Systems for Video Technology 28, 8 (2018), 1934–1948.Google ScholarDigital Library
[63] Li Mu, Zhang Kai, Li Jinxing, Zuo Wangmeng, Timofte Radu, and Zhang David. 2023. Learning context-based nonlocal entropy modeling for image compression. IEEE Transactions on Neural Networks and Learning Systems 34, 3 (2023), 1132–1145.Google ScholarCross Ref
[64] Li Xin and Orchard Michael T.. 2001. Edge-directed prediction for lossless compression of natural images. IEEE Transactions on Image Processing 10, 6 (2001), 813–817.Google ScholarDigital Library
[65] Li Yue, Liu Dong, Li Houqiang, Li Li, Li Zhu, and Wu Feng. 2019. Learning a convolutional neural network for image compact-resolution. IEEE Transactions on Image Processing 28, 3 (2019), 1092–1107.Google ScholarCross Ref
[66] Li Yue, Yi Yan, Liu Dong, Li Li, Li Zhu, and Li Houqiang. 2021. Neural-network-based cross-channel intra prediction. ACM Trans. Multimedia Comput. Commun. Appl. 17, 3, Article 77 (Jul.2021), 23 pages.Google ScholarDigital Library
[67] Lin Chih-Hsuan, Chen Yi-Hsin, and Peng Wen-Hsiao. 2022. Content-adaptive motion rate adaption for learned video compression. In PCS. 163–167.Google ScholarCross Ref
[68] Lin Jianping, Liu Dong, Li Houqiang, and Wu Feng. 2020. M-LVC: Multiple frames prediction for learned video compression. In CVPR. 3546–3554.Google ScholarCross Ref
[69] Liu Dong, Chen Zhenzhong, Liu Shan, and Wu Feng. 2020. Deep learning-based technology in responses to the joint call for proposals on video compression with capability beyond HEVC. IEEE Transactions on Circuits and Systems for Video Technology 30, 5 (2020), 1267–1280.Google ScholarCross Ref
[70] Liu Dong, Li Yue, Lin Jianping, Li Houqiang, and Wu Feng. 2020. Deep learning-based video coding: A review and a case study. ACM Computing Surveys (CSUR) 53, 1 (2020), 1–35.Google ScholarDigital Library
[71] Liu Dong, Ma Haichuan, Xiong Zhiwei, and Wu Feng. 2018. CNN-based DCT-like transform for image compression. In MMM. Springer, 61–72.Google ScholarCross Ref
[72] Liu Dong, Sun Xiaoyan, and Wu Feng. 2008. Manipulating image patches for compression. In ICME. 197–200.Google ScholarCross Ref
[73] Liu H., Chen Y., Chen J., Zhang L., and Karczewicz M.. 2015. Local Illumination Compensation. Technical Report VCEG-AZ06. VCEG.Google Scholar
[74] Liu Haojie, Lu Ming, Ma Zhan, Wang Fan, Xie Zhihuang, Cao Xun, and Wang Yao. 2021. Neural video coding using multiscale motion compensation and spatiotemporal context model. IEEE Transactions on Circuits and Systems for Video Technology 31, 8 (2021), 3182–3196.Google ScholarCross Ref
[75] Liu Jiaying, Liu Dong, Yang Wenhan, Xia Sifeng, Zhang Xiaoshuai, and Dai Yuanying. 2020. A comprehensive benchmark for single image compression artifact reduction. IEEE Transactions on Image Processing 29 (2020), 7845–7860.Google ScholarCross Ref
[76] Liu Jinming, Sun Heming, and Katto Jiro. 2023. Learned image compression with mixed transformer-CNN architectures. In CVPR. 14388–14397.Google ScholarCross Ref
[77] Liu Jerry, Wang Shenlong, Ma Wei-Chiu, Shah Meet, Hu Rui, Dhawan Pranaab, and Urtasun Raquel. 2020. Conditional entropy coding for efficient video compression. In ECCV. Springer, 453–468.Google ScholarDigital Library
[78] Liu Kang, Liu Dong, Li Li, and Li Houqiang. 2021. Context-adaptive inverse quantization for inter-frame coding. IEEE Open Journal of Circuits and Systems 2 (2021), 660–674.Google ScholarCross Ref
[79] Liu Lurng-Kuo and Feig Ephraim. 1996. A block-based gradient descent search algorithm for block motion estimation in video coding. IEEE Transactions on Circuits and Systems for Video Technology 6, 4 (1996), 419–422.Google ScholarDigital Library
[80] Liu Zheng, Li Tianyi, Chen Ying, Wei Kaijin, Xu Mai, and Qi Honggang. 2023. Deep multi-task learning based fast intra-mode decision for versatile video coding. IEEE Transactions on Circuits and Systems for Video Technology 33, 10 (2023), 6101–6116.Google ScholarDigital Library
[81] Liu Zhenyu, Yu Xianyu, Gao Yuan, Chen Shaolin, Ji Xiangyang, and Wang Dongsheng. 2016. CU partition mode decision for HEVC hardwired intra encoder using convolution neural network. IEEE Transactions on Image Processing 25, 11 (2016), 5088–5103.Google ScholarDigital Library
[82] Lu Guo, Cai Chunlei, Zhang Xiaoyun, Chen Li, Ouyang Wanli, Xu Dong, and Gao Zhiyong. 2020. Content adaptive and error propagation aware deep video compression. In ECCV. 456–472.Google ScholarDigital Library
[83] Lu Guo, Ouyang Wanli, Xu Dong, Zhang Xiaoyun, Cai Chunlei, and Gao Zhiyong. 2019. DVC: An end-to-end deep video compression framework. In CVPR. 11006–11015.Google ScholarCross Ref
[84] Ma Changyue, Liu Dong, Peng Xiulian, Li Li, and Wu Feng. 2019. Convolutional neural network-based arithmetic coding for HEVC intra-predicted residues. IEEE Transactions on Circuits and Systems for Video Technology 30, 7 (2019), 1901–1916.Google Scholar
[85] Ma Haichuan, Liu Dong, Dong Cunhui, Li Li, and Wu Feng. 2021. End-to-end image compression with probabilistic decoding. arXiv preprint arXiv:2109.14837 (2021).Google Scholar
[86] Ma Haichuan, Liu Dong, and Wu Feng. 2020. Improving compression artifact reduction via end-to-end learning of side information. In VCIP. 403–406.Google ScholarCross Ref
[87] Ma Haichuan, Liu Dong, Xiong Ruiqin, and Wu Feng. 2019. iWave: CNN-based wavelet-like transform for image compression. IEEE Transactions on Multimedia 22, 7 (2019), 1667–1679.Google ScholarCross Ref
[88] Ma Haichuan, Liu Dong, Yan Ning, Li Houqiang, and Wu Feng. 2022. End-to-end optimized versatile image compression with wavelet-like transform. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 3 (2022), 1247–1263.Google ScholarCross Ref
[89] Meardi Guido, Ferrara Simone, Ciccarelli Lorenzo, Cobianchi Guendalina, Poularakis Stergios, Maurer Florian, Battista Stefano, and Byagowi Ahmad. 2020. MPEG-5 Part 2: Low complexity enhancement video coding (LCEVC): Overview and performance evaluation. In Applications of Digital Image Processing XLIII, Vol. 11510. International Society for Optics and Photonics, 115101C.Google Scholar
[90] Mentzer Fabian, Agustsson Eirikur, Tschannen Michael, Timofte Radu, and Gool Luc Van. 2018. Conditional probability models for deep image compression. In CVPR. 4394–4402.Google ScholarCross Ref
[91] Mentzer Fabian, Toderici George D., Minnen David, Caelles Sergi, Hwang Sung Jin, Lucic Mario, and Agustsson Eirikur. 2022. VCT: A video compression transformer. In NeurIPS, Vol. 35. 13091–13103.Google Scholar
[92] Minnen David, Ballé Johannes, and Toderici George. 2018. Joint autoregressive and hierarchical priors for learned image compression. In NIPS, Vol. 31. 10794–10803.Google Scholar
[93] Minnen David and Singh Saurabh. 2020. Channel-wise autoregressive entropy models for learned image compression. In ICIP. IEEE, 3339–3343.Google ScholarCross Ref
[94] Nocedal Jorge and Wright Stephen. 2006. Numerical Optimization. Springer Science & Business Media.Google Scholar
[95] Ohm Jens-Rainer, Sullivan Gary J., Schwarz Heiko, Tan Thiow Keng, and Wiegand Thomas. 2012. Comparison of the coding efficiency of video coding standards-including high efficiency video coding (HEVC). IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1669–1684.Google ScholarDigital Library
[96] Ortega Antonio and Ramchandran Kannan. 1998. Rate-distortion methods for image and video compression. IEEE Signal Processing Magazine 15, 6 (1998), 23–50.Google ScholarCross Ref
[97] Pan Guanbo, Lu Guo, Hu Zhihao, and Xu Dong. 2022. Content adaptive latents and decoder for neural image compression. In ECCV. Springer, 556–573.Google ScholarDigital Library
[98] Peng Wen-Hsiao, Walls Frederick G., Cohen Robert A., Xu Jizheng, Ostermann Jörn, MacInnis Alexander, and Lin Tao. 2016. Overview of screen content video coding: Technologies, standards, and beyond. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 6, 4 (2016), 393–408.Google ScholarCross Ref
[99] Pfaff J., Helle P., Maniry D., Kaltenstadler S., Samek W., Schwarz H., Marpe D., and Wiegand T.. 2018. Neural network based intra prediction for video coding. In Applications of Digital Image Processing XLI, Vol. 10752. International Society for Optics and Photonics, 1075213.Google ScholarCross Ref
[100] Po Lai-Man, Ng Ka-Ho, Cheung Kwok-Wai, Wong Ka-Man, Uddin Yusuf Md. Salah, and Ting Chi-Wang. 2009. Novel directional gradient descent searches for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology 19, 8 (2009), 1189–1195.Google ScholarDigital Library
[101] Qian Yichen, Lin Ming, Sun Xiuyu, Tan Zhiyu, and Jin Rong. 2022. Entroformer: A transformer-based entropy model for learned image compression. arXiv preprint arXiv:2202.05492 (2022).Google Scholar
[102] Rippel Oren, Anderson Alexander G., Tatwawadi Kedar, Nair Sanjay, Lytle Craig, and Bourdev Lubomir. 2021. ELF-VC: Efficient learned flexible-rate video coding. In ICCV. 14479–14488.Google ScholarCross Ref
[103] Rippel Oren, Nair Sanjay, Lew Carissa, Branson Steve, Anderson Alexander G., and Bourdev Lubomir. 2019. Learned video compression. In ICCV. 3454–3463.Google ScholarCross Ref
[104] Shannon C. E.. 1948. A mathematical theory of communication. Bell Systems Technical Journal 27, 4 (1948), 623–656.Google ScholarCross Ref
[105] Shannon C. E.. 1959. Coding theorems for a discrete source with a fidelity criteria. International Convention Record 7 (1959), 325–350.Google Scholar
[106] Sheng Xihua, Li Jiahao, Li Bin, Li Li, Liu Dong, and Lu Yan. 2023. Temporal context mining for learned video compression. IEEE Transactions on Multimedia 25 (2023), 7311–7322.Google ScholarDigital Library
[107] Shi Yibo, Ge Yunying, Wang Jing, and Mao Jue. 2022. AlphaVC: High-performance and efficient learned video compression. In ECCV. Springer, 616–631.Google ScholarDigital Library
[108] Sikora Thomas. 2005. Trends and perspectives in image and video coding. Proc. IEEE 93, 1 (2005), 6–17.Google ScholarCross Ref
[109] Song Li, Tang Xun, Zhang Wei, Yang Xiaokang, and Xia Pingjian. 2013. The SJTU 4K video sequence dataset. In QoMEX. 34–35.Google ScholarCross Ref
[110] Song Myungseo, Choi Jinyoung, and Han Bohyung. 2021. Variable-rate deep image compression through spatially-adaptive feature transform. In ICCV. 2360–2369.Google ScholarCross Ref
[111] Storch Iago, Agostini Luciano, Zatt Bruno, Bampi Sergio, and Palomino Daniel. 2022. FastInter360: A fast inter mode decision for HEVC 360 video coding. IEEE Transactions on Circuits and Systems for Video Technology 32, 5 (2022), 3235–3249.Google ScholarDigital Library
[112] Sullivan Gary J., Ohm Jens, Han Woo-Jin, and Wiegand Thomas. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649–1668.Google ScholarDigital Library
[113] Sullivan Gary J. and Wiegand Thomas. 1998. Rate-distortion optimization for video compression. IEEE Signal Processing Magazine 15, 6 (1998), 74–90.Google ScholarCross Ref
[114] Sun Heming, Yu Lu, and Katto Jiro. 2022. Improving latent quantization of learned image compression with gradient scaling. In VCIP. 1–5.Google ScholarCross Ref
[115] Tang Zhisen, Wang Hanli, Yi Xiaokai, Zhang Yun, Kwong Sam, and Kuo C.-C. Jay. 2022. Joint graph attention and asymmetric convolutional neural network for deep image compression. IEEE Transactions on Circuits and Systems for Video Technology 33, 1 (2022), 421–433.Google ScholarCross Ref
[116] Theis Lucas and Agustsson Eirikur. 2021. On the advantages of stochastic encoders. arXiv preprint arXiv:2102.09270 (2021).Google Scholar
[117] Theis Lucas, Shi Wenzhe, Cunningham Andrew, and Huszár Ferenc. 2017. Lossy image compression with compressive autoencoders. arXiv preprint arXiv:1703.00395 (2017).Google Scholar
[118] Toderici George, O’Malley Sean M., Hwang Sung Jin, Vincent Damien, Minnen David, Baluja Shumeet, Covell Michele, and Sukthankar Rahul. 2015. Variable rate image compression with recurrent neural networks. arXiv preprint arXiv:1511.06085 (2015).Google Scholar
[119] Tsai Chia-Yang, Chen Ching-Yeh, Yamakage Tomoo, Chong In Suk, Huang Yu-Wen, Fu Chih-Ming, Itoh Takayuki, Watanabe Takashi, Chujoh Takeshi, Karczewicz Marta, and Lei Shaw-Min. 2013. Adaptive loop filtering for video coding. IEEE Journal of Selected Topics in Signal Processing 7, 6 (2013), 934–945.Google ScholarCross Ref
[120] Rozendaal Ties van, Brehmer Johann, Zhang Yunfan, Pourreza Reza, and Cohen Taco S.. 2021. Instance-Adaptive Video Compression: Improving Neural Codecs by Training on the Test Set. arXiv preprint arXiv:2111.10302 (2021).Google Scholar
[121] Vatis Yuri and Ostermann Joern. 2008. Adaptive interpolation filter for H. 264/AVC. IEEE Transactions on Circuits and Systems for Video Technology 19, 2 (2008), 179–192.Google ScholarDigital Library
[122] Wallace Gregory K.. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1 (1992), xviii–xxxiv.Google ScholarDigital Library
[123] Wang Dezhao, Yang Wenhan, Hu Yueyu, and Liu Jiaying. 2022. Neural data-dependent transform for learned image compression. In CVPR. 17379–17388.Google ScholarCross Ref
[124] Wang Xiao, Ding Ding, Jiang Wei, Wang Wei, Xu Xiaozhong, Liu Shan, Kulis Brian, and Chin Peter. 2022. Substitutional neural image compression. In PCS. 97–101.Google ScholarCross Ref
[125] Wang Yefei, Liu Dong, Ma Siwei, Wu Feng, and Gao Wen. 2020. Ensemble learning-based rate-distortion optimization for end-to-end image compression. IEEE Transactions on Circuits and Systems for Video Technology 31, 3 (2020), 1193–1207.Google ScholarCross Ref
[126] Wang Yao, Ostermann Jörn, and Zhang Ya-Qin. 2002. Video Processing and Communications. Vol. 1. Prentice Hall Upper Saddle River, NJ.Google Scholar
[127] Wedi Thomas. 2006. Adaptive interpolation filters and high-resolution displacements for video coding. IEEE Transactions on Circuits and Systems for Video Technology 16, 4 (2006), 484–491.Google ScholarDigital Library
[128] Wiegand Thomas, Schwarz Heiko, Joch Anthony, Kossentini Faouzi, and Sullivan Gary J.. 2003. Rate-constrained coder control and comparison of video coding standards. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 688–703.Google ScholarDigital Library
[129] Wiegand Thomas, Sullivan Gary J., Bjontegaard Gisle, and Luthra Ajay. 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.Google ScholarDigital Library
[130] Wu Xiaolin, Barthel E. U., and Zhang Wenhan. 1998. Piecewise 2D autoregression for predictive image coding. In ICIP. IEEE, 901–904.Google Scholar
[131] Xie Yueqi, Cheng Ka Leong, and Chen Qifeng. 2021. Enhanced invertible encoding for learned image compression. In ACM Multimedia. 162–170.Google ScholarDigital Library
[132] Xu Mai, Li Tianyi, Wang Zulin, Deng Xin, Yang Ren, and Guan Zhenyu. 2018. Reducing complexity of HEVC: A deep learning approach. IEEE Transactions on Image Processing 27, 10 (2018), 5044–5059.Google ScholarCross Ref
[133] Xu Tongda, Gao Han, Gao Chenjian, Wang Yuanyuan, He Dailan, Pi Jinyong, Luo Jixiang, Zhu Ziyu, Ye Mao, Qin Hongwei, Wang Yan, Liu Jingjing, and Zhang Ya-Qin. 2023. Bit allocation using optimization. In ICML. 38377–38399.Google Scholar
[134] Yan Ning, Liu Dong, Li Houqiang, Li Bin, Li Li, and Wu Feng. 2019. Invertibility-driven interpolation filter for video coding. IEEE Transactions on Image Processing 28, 10 (2019), 4912–4925.Google ScholarCross Ref
[135] Yang Kun, Liu Dong, and Wu Feng. 2020. Deep learning-based nonlinear transform for HEVC intra coding. In VCIP. 387–390.Google ScholarCross Ref
[136] Yang Runyu, Liu Dong, Ma Siwei, Wu Feng, and Gao Wen. 2021. Knowledge distillation from end-to-end image compression to VVC intra coding for perceptual quality enhancement. In ICIP. 3438–3442.Google ScholarCross Ref
[137] Yang Ren, Mentzer Fabian, Gool Luc Van, and Timofte Radu. 2020. Learning for video compression with hierarchical quality and recurrent enhancement. In CVPR. 6628–6637.Google ScholarCross Ref
[138] Yang Ren, Mentzer Fabian, Gool Luc Van, and Timofte Radu. 2020. Learning for video compression with recurrent auto-encoder and recurrent probability model. IEEE Journal of Selected Topics in Signal Processing 15, 2 (2020), 388–401.Google ScholarCross Ref
[139] Yang Yibo, Bamler Robert, and Mandt Stephan. 2020. Improving inference for neural image compression. In NeurIPS, Vol. 33. 573–584.Google Scholar
[140] Ye Hua, Deng Guang, and Devlin John C.. 1999. Least squares approach for lossless image coding. In International Symposium on Signal Processing and its Applications (ISSPA), Vol. 1. IEEE, 63–66.Google ScholarCross Ref
[141] Yuan Hui, Chang Yilin, Lu Zhaoyang, and Ma Yanzhuo. 2010. Model based motion vector predictor for zoom motion. IEEE Signal Processing Letters 17, 9 (2010), 787–790.
[142] Yuan Hui, Liu Ju, Sun Jiande, Liu Hechao, and Li Yujun. 2012. Affine model based motion compensation prediction for zoom. IEEE Transactions on Multimedia 14, 4 (2012), 1370–1375.
[143] Yılmaz M. Akın and Tekalp A. Murat. 2022. End-to-end rate-distortion optimized learned hierarchical bi-directional video compression. IEEE Transactions on Image Processing 31 (2022), 974–983.
[144] Zhang Honglei, Cricri Francesco, Tavakoli Hamed Rezazadegan, Santamaria Maria, Lam Yat-Hong, and Hannuksela Miska M. 2021. Learn to overfit better: Finding the important parameters for learned image compression. In VCIP. IEEE, 1–5.
[145] Zhang Jiaqi, Jia Chuanmin, Lei Meng, Wang Shanshe, Ma Siwei, and Gao Wen. 2019. Recent development of AVS video coding standard: AVS3. In PCS. IEEE, 1–5.
[146] Zhang Kai, Chen Jianle, Zhang Li, Li Xiang, and Karczewicz Marta. 2018. Enhanced cross-component linear model for chroma intra-prediction in video coding. IEEE Transactions on Image Processing 27, 8 (2018), 3983–3997.
[147] Zhang Xi and Wu Xiaolin. 2023. LVQAC: Lattice vector quantization coupled with spatially adaptive companding for efficient learned image compression. In CVPR. 10239–10248.
[148] Zhang Ziqiu, Ma Changyue, Liu Dong, Li Li, and Wu Feng. 2021. Improving VVC intra coding via probability estimation and fusion of multiple prediction modes. In ICIG. Springer, 654–664.
[149] Zhao Jing, Li Bin, Li Jiahao, Xiong Ruiqin, and Lu Yan. 2021. A universal encoder rate distortion optimization framework for learned compression. In CVPR. 1880–1884.
[150] Zhao Zhenghui, Wang Shiqi, Wang Shanshe, Zhang Xinfeng, Ma Siwei, and Yang Jiansheng. 2019. Enhanced bi-prediction with convolutional neural network for high efficiency video coding. IEEE Transactions on Circuits and Systems for Video Technology 29, 11 (2019), 3291–3301.
[151] Zhong Zhisheng, Akutsu Hiroaki, and Aizawa Kiyoharu. 2020. Channel-level variable quantization network for deep image compression. In IJCAI. 467–473.
[152] Zhu Xiaosu, Song Jingkuan, Gao Lianli, Zheng Feng, and Shen Heng Tao. 2022. Unified multivariate Gaussian mixture for efficient neural image compression. In CVPR. 17612–17621.
[153] Zhu Yinhao, Yang Yang, and Cohen Taco. 2022. Transformer-based transform coding. In ICLR. https://openreview.net/forum?id=IDwN6xjHnK8
[154] Zou Nannan, Zhang Honglei, Cricri Francesco, Tavakoli Hamed R., Lainema Jani, Hannuksela Miska, Aksu Emre, and Rahtu Esa. 2020. L2C – learning to learn to compress. In MMSP. 1–6.
[155] Zou Renjie, Song Chunfeng, and Zhang Zhaoxiang. 2022. The devil is in the details: Window-based attention for image compression. In CVPR. 17492–17501.

Index Terms

- Computing methodologies
  - Computer graphics
    - Image compression

Published in
ACM Computing Surveys Volume 56, Issue 9
September 2024
980 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/3613649
Editors:
David Atienza, Swiss Federal Institute of Technology Lausanne (EPFL), Switzerland
Michela Milano, University of Bologna, Italy
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 24 April 2024
- Online AM: 11 March 2024
- Accepted: 7 March 2024
- Revised: 19 February 2024
- Received: 29 August 2022
Author Tags
Hybrid optimization
numerical optimization
offline optimization
online optimization
rate-distortion optimization
search-based optimization
video coding
Qualifiers
- survey
