ShanghaiTech University Knowledge Management System
Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding | |
2022 | |
Proceedings title | LECTURE NOTES IN COMPUTER SCIENCE (INCLUDING SUBSERIES LECTURE NOTES IN ARTIFICIAL INTELLIGENCE AND LECTURE NOTES IN BIOINFORMATICS)
ISSN | 0302-9743 |
Volume | 13696 LNCS |
Pages | 201-218 |
Publication status | Published |
DOI | 10.1007/978-3-031-20059-5_12 |
Abstract | Embodied Reference Understanding studies the reference understanding in an embodied fashion, where a receiver requires to locate a target object referred to by both language and gesture of the sender in a shared physical environment. Its main challenge lies in how to make the receiver with the egocentric view access spatial and visual information relative to the sender to judge how objects are oriented around and seen from the sender, i.e., spatial and visual perspective-taking. In this paper, we propose a REasoning from your Perspective (REP) method to tackle the challenge by modeling relations between the receiver and the sender as well as the sender and the objects via the proposed novel view rotation and relation reasoning. Specifically, view rotation first rotates the receiver to the position of the sender by constructing an embodied 3D coordinate system with the position of the sender as the origin. Then, it changes the orientation of the receiver to the orientation of the sender by encoding the body orientation and gesture of the sender. Relation reasoning models both the nonverbal and verbal relations between the sender and the objects by multi-modal cooperative reasoning in gesture, language, visual content, and spatial position. Experiment results demonstrate the effectiveness of REP, which consistently surpasses all existing state-of-the-art algorithms by a large margin, i.e., +5.22% absolute accuracy in terms of Prec@0.5 on YouRefIt. Code is available (https://github.com/ChengShiest/REP-ERU). © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG. |
Keywords | Visual languages ; Embodied reference understanding ; Perspective taking ; Physical environments ; Referring expression comprehension ; Referring expressions ; Relation reasoning ; Spatial informations ; Target object ; View rotation ; Visual information |
Conference name | 17th European Conference on Computer Vision, ECCV 2022 |
Place of publication | GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND |
Conference location | Tel Aviv, Israel |
Conference date | October 23, 2022 - October 27, 2022 |
Indexed by | EI ; CPCI-S |
Language | English |
Funding | Shanghai Pujiang Program[21PJ1410900] |
WOS research area | Computer Science ; Imaging Science & Photographic Technology |
WOS category | Computer Science, Artificial Intelligence ; Imaging Science & Photographic Technology |
WOS accession number | WOS:000903751800012 |
Publisher | Springer Science and Business Media Deutschland GmbH |
EI accession number | 20224813182941 |
EI main heading | Rotation |
EISSN | 1611-3349 |
EI classification code | 723.1.1 Computer Programming Languages ; 931.1 Mechanics |
Original document type | Conference article (CA) |
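Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/272835 |
Collection | School of Information Science and Technology_Master's students |
Corresponding author | Yang, Sibei |
Author affiliations | 1. ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China ; 2. Shanghai Engn Res Ctr Intelligent Vis & Imaging, Shanghai, Peoples R China |
First author affiliation | School of Information Science and Technology |
Corresponding author affiliation | School of Information Science and Technology |
First author's primary affiliation | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Shi, Cheng, Yang, Sibei. Spatial and Visual Perspective-Taking via View Rotation and Relation Reasoning for Embodied Reference Understanding[C]. GEWERBESTRASSE 11, CHAM, CH-6330, SWITZERLAND:Springer Science and Business Media Deutschland GmbH,2022:201-218. |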
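
The abstract describes view rotation as building a 3D coordinate system with the sender at the origin and then adopting the sender's orientation. The geometric idea can be illustrated with a minimal sketch; this is not the authors' REP implementation (which, per the abstract, encodes body orientation and gesture rather than applying a fixed rotation). The function name `to_sender_frame`, the axis convention (y is vertical, heading measured about y), and the single yaw angle are all assumptions made for illustration.

```python
import math


def to_sender_frame(point, sender_pos, sender_yaw):
    """Re-express a 3D point from the receiver's frame in a sender-centred frame.

    Hypothetical convention (not taken from the paper):
    - point and sender_pos are (x, y, z) tuples in the receiver's frame;
    - y is the vertical axis;
    - sender_yaw is the sender's heading in radians about the y axis, so the
      sender faces direction (sin(yaw), 0, cos(yaw)) in the receiver's frame.
    """
    # Step 1: translate so that the sender's position becomes the origin.
    dx = point[0] - sender_pos[0]
    dy = point[1] - sender_pos[1]
    dz = point[2] - sender_pos[2]

    # Step 2: rotate by -yaw about the vertical axis so that the sender's
    # heading lines up with the +z axis of the new frame.
    c = math.cos(-sender_yaw)
    s = math.sin(-sender_yaw)
    x_new = c * dx + s * dz
    z_new = -s * dx + c * dz
    return (x_new, dy, z_new)


if __name__ == "__main__":
    # Sender stands at (1, 0, 2) facing the receiver's +x direction.
    # An object 2 units along +x from the sender should end up straight
    # ahead of the sender, i.e. on the +z axis of the sender-centred frame.
    print(to_sender_frame((3.0, 0.0, 2.0), (1.0, 0.0, 2.0), math.pi / 2))
```

In this sketch an object's position relative to the sender (in front, to the left, and so on) can be read directly from the sign of the transformed coordinates, which is the kind of sender-relative spatial information the abstract refers to as perspective-taking.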
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.