Deep Learning Technique for Human Parsing: A Survey and Outlook, International Journal of Computer Vision

Deep Learning Technique for Human Parsing: A Survey and Outlook
International Journal of Computer Vision (IF 19.5), Pub Date: 2024-03-09, DOI: 10.1007/s11263-024-02031-9
Lu Yang, Wenhe Jia, Shan Li, Qing Song

Human parsing aims to partition humans in images or videos into multiple pixel-level semantic parts. Over the last decade, it has attracted significantly increased interest in the computer vision community and has been used in a broad range of practical applications, including security monitoring, social media, and visual special effects. Although deep learning-based human parsing solutions have made remarkable achievements, many important concepts, existing challenges, and potential research directions remain unclear. In this survey, we comprehensively review three core sub-tasks: single human parsing, multiple human parsing, and video human parsing, by introducing their respective task settings, background concepts, relevant problems and applications, representative literature, and datasets. We also present quantitative performance comparisons of the reviewed methods on benchmark datasets. Additionally, to promote sustainable development of the community, we put forward a transformer-based human parsing framework, providing a high-performance baseline for follow-up research through universal, concise, and extensible solutions. Finally, we point out a set of under-investigated open issues in this field and suggest new directions for future study. We also provide a regularly updated project page to continuously track recent developments in this fast-advancing field: https://github.com/soeaver/awesome-human-parsing.

Updated: 2024-03-09
