ShanghaiTech University Knowledge Management System
HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
2023-06-17
Proceedings Title | 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR)
ISSN | 1063-6919
Volume | 2023-June
Pages | 23507-23517
Publication Status | Published
DOI | 10.1109/CVPR52729.2023.02251
Abstract | Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction priors for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin in various settings, e.g. +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP. © 2023 IEEE.
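The interaction decoder and knowledge integration block described in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation (see the repository linked above), and all module and parameter names below (e.g. InteractionDecoderSketch, fuse) are assumptions for illustration only. The idea: learned HOI queries cross-attend to the CLIP visual feature map, and the attended features are fused with detection-backbone features.

```python
# Minimal sketch (assumed names and shapes; not the authors' code).
import torch
import torch.nn as nn

class InteractionDecoderSketch(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        # HOI queries attend to the CLIP visual feature map (cross-attention).
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Stand-in for the knowledge integration block: fuse attended CLIP
        # features with detection-backbone features.
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, hoi_queries, clip_feat_map, backbone_feat):
        # hoi_queries:   (B, Q, C) learned HOI queries
        # clip_feat_map: (B, N, C) flattened CLIP visual feature map
        # backbone_feat: (B, Q, C) per-query detection-backbone features
        attended, _ = self.cross_attn(hoi_queries, clip_feat_map, clip_feat_map)
        return self.fuse(torch.cat([attended, backbone_feat], dim=-1))

if __name__ == "__main__":
    dec = InteractionDecoderSketch()
    out = dec(torch.randn(2, 64, 256), torch.randn(2, 196, 256), torch.randn(2, 64, 256))
    print(out.shape)  # torch.Size([2, 64, 256])
```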
Proceedings Editors / Conference Sponsors | Amazon Science ; Ant Research ; Cruise ; et al. ; Google ; Lambda
Keywords | Recognition: categorization, detection, retrieval
Conference | 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Conference Location | Vancouver, BC, Canada
Conference Dates | 17-24 June 2023
Indexed By | EI
Language | English
Publisher | IEEE Computer Society
EI Accession Number | 20234114867253
Original Document Type | Conference article (CA)
Source Database | IEEE
Document Type | Conference paper
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/333407
Collections | School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Xuming He Group; School of Information Science and Technology_PhD Students
Author Affiliations | 1. ShanghaiTech University, Shanghai, China; 2. ByteDance Inc.
First Author Affiliation | ShanghaiTech University
First Author's First Affiliation | ShanghaiTech University
Recommended Citation (GB/T 7714) | Shan Ning, Longtian Qiu, Yongfei Liu, et al. HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models[C]//Amazon Science, Ant Research, Cruise, et al., Google, Lambda: IEEE Computer Society, 2023: 23507-23517.
Unless otherwise specified, all content in this system is protected by copyright, with all rights reserved.