Evaluating gesture generation in a large-scale open challenge: The GENEA Challenge 2022
ACM Transactions on Graphics (IF 6.2) Pub Date: 2024-04-27, DOI: 10.1145/3656374
Taras Kucherenko 1 , Pieter Wolfert 2, 3 , Youngwoo Yoon 4 , Carla Viegas 5, 6 , Teodor Nikolov 7, 8 , Mihail Tsakov 7 , Gustav Eje Henter 8, 9

This paper reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike comparisons across separate research papers, differences in results here arise only from differences between methods, which enables direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different people engaged in dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field.

The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall’s tau rank correlation of around \(-0.5\). Based on the challenge results we formulate numerous recommendations for system building and evaluation.
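FGD's agreement with the human-likeness ratings is summarised above as a Kendall's tau rank correlation. The following is a minimal sketch of how such a correlation could be computed between per-system metric values and per-system mean ratings. It assumes the tau-b variant (which corrects for ties). The abstract does not specify the exact variant, tie handling, or rating aggregation used in the paper. The function name `kendall_tau_b` and all numbers in the example are hypothetical toy values, not challenge results. A negative tau is the expected direction here: lower FGD is meant to indicate motion closer to the reference data, so systems with lower FGD should tend to receive higher human-likeness ratings.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equal-length sequences.

    Simple O(n^2) pairwise version, adequate for a few dozen systems.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0 and dy == 0:
            # Tied in both variables: counts toward both tie totals.
            ties_x += 1
            ties_y += 1
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1  # pair ordered the same way in x and y
        else:
            discordant += 1  # pair ordered oppositely in x and y
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    if denom == 0:
        raise ValueError("correlation undefined when a sequence is constant")
    return (concordant - discordant) / denom


# Hypothetical toy data: one FGD value and one mean human-likeness
# rating per system (lower FGD is better, higher rating is better).
fgd = [10.0, 20.0, 30.0, 40.0]
human_likeness = [60, 50, 55, 30]
print(round(kendall_tau_b(fgd, human_likeness), 3))  # -0.667
```

In practice, `scipy.stats.kendalltau` computes the same tau-b statistic by default and also returns a p-value. The pairwise version above only makes the concordant/discordant counting explicit.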



Updated: 2024-04-27