ShanghaiTech University Knowledge Management System
EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment | |
2023-09-03 | |
Proceedings title | 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)
ISSN | 1550-5499 |
Publication status | Published |
DOI | 10.1109/ICCV51070.2023.01441 |
Abstract | Vision-language models such as CLIP have boosted the performance of open-vocabulary object detection, where the detector is trained on base categories but is required to detect novel categories. Existing methods leverage CLIP's strong zero-shot recognition ability to align object-level embeddings with textual embeddings of categories. However, we observe that using CLIP for object-level alignment results in overfitting to base categories: novel categories that are most similar to base categories perform particularly poorly because they are recognized as those similar base categories. In this paper, we first identify that the loss of critical fine-grained local image semantics hinders existing methods from attaining strong base-to-novel generalization. Then, we propose Early Dense Alignment (EDA) to bridge the gap between generalizable local semantics and object-level prediction. In EDA, we use object-level supervision to learn dense-level rather than object-level alignment, which maintains the fine-grained local semantics. Extensive experiments demonstrate superior performance over competing approaches under the same strict setting and without using external training resources, improving novel box AP50 on COCO by 8.4% and rare mask AP on LVIS by 3.9%. |
Conference location | Paris, France |
Conference dates | 1-6 Oct. 2023 |
URL | View original |
Funding | National Natural Science Foundation of China[ |
WOS category | Computer Science, Software Engineering |
WOS accession number | PPRN:84763295 |
Source database | IEEE |
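Document type | Conference paper |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348024 |
Collection | School of Information Science and Technology; School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Sibei Yang Group |
Corresponding author | Yang, Sibei |
Author affiliation | ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China |
First author affiliation | School of Information Science and Technology |
Corresponding author affiliation | School of Information Science and Technology |
First author's primary affiliation | School of Information Science and Technology |
Recommended citation (GB/T 7714) | Shi, Cheng, Yang, Sibei. EdaDet: Open-Vocabulary Object Detection Using Early Dense Alignment[C],2023. |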
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.