HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models
2023-06-17
Proceedings title: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
ISSN: 1063-6919
Volume: 2023-June
Pages: 23507-23517
Publication status: Published
DOI: 10.1109/CVPR52729.2023.02251
Abstract: Human-Object Interaction (HOI) detection aims to localize human-object pairs and recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has shown great potential in providing interaction prior for HOI detectors via knowledge distillation. However, such approaches often rely on large-scale training data and suffer from inferior performance under few/zero-shot scenarios. In this paper, we propose a novel HOI detection framework that efficiently extracts prior knowledge from CLIP and achieves better generalization. In detail, we first introduce a novel interaction decoder to extract informative regions in the visual feature map of CLIP via a cross-attention mechanism, which is then fused with the detection backbone by a knowledge integration block for more accurate human-object pair detection. In addition, prior knowledge in the CLIP text encoder is leveraged to generate a classifier by embedding HOI descriptions. To distinguish fine-grained interactions, we build a verb classifier from training data via visual semantic arithmetic and a lightweight verb representation adapter. Furthermore, we propose a training-free enhancement to exploit global HOI predictions from CLIP. Extensive experiments demonstrate that our method outperforms the state of the art by a large margin on various settings, e.g., +4.04 mAP on HICO-Det. The source code is available at https://github.com/Artanic30/HOICLIP. © 2023 IEEE.
Proceedings editor / conference sponsor: Amazon Science; Ant Research; Cruise; et al.; Google; Lambda
Keywords: Recognition: Categorization, detection, retrieval
Conference name: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023
Conference location: Vancouver, BC, Canada
Conference dates: 17-24 June 2023
Indexed by: EI
Language: English
Publisher: IEEE Computer Society
EI accession number: 20234114867253
Original document type: Conference article (CA)
Source database: IEEE
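Document type: Conference paper
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/333407
Collections:
School of Information Science and Technology / Master's Students
School of Information Science and Technology / PI Research Groups / Xuming He Group
School of Information Science and Technology / PhD Students
Author affiliations:
1. ShanghaiTech University, Shanghai, China
2. ByteDance Inc.
First author affiliation: ShanghaiTech University
First affiliation of the first author: ShanghaiTech University
Recommended citation (GB/T 7714):
Shan Ning, Longtian Qiu, Yongfei Liu, et al. HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models[C]//Amazon Science, Ant Research, Cruise, et al., Google, Lambda: IEEE Computer Society, 2023: 23507-23517.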
 
