ShanghaiTech University Knowledge Management System
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space | |
2024-12-19 | |
Status | Published |
Abstract | Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications due to their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), which is a practical and highly efficient solution to support real-time OSOD tasks in robotic systems. Specifically, DOSOD builds upon the YOLO-World pipeline by integrating a vision-language model (VLM) with a detector. A Multilayer Perceptron (MLP) adaptor is developed to transform text embeddings extracted by the VLM into a joint space, within which the detector learns the region representations of class-agnostic proposals. Cross-modality features are directly aligned in the joint space, avoiding complex feature interactions and thereby improving computational efficiency. DOSOD operates like a traditional closed-set detector during the testing phase, effectively bridging the gap between closed-set and open-set detection. Compared to the baseline YOLO-World, the proposed DOSOD significantly enhances real-time performance while maintaining comparable accuracy. The small DOSOD-S model achieves a Fixed AP of 26.7%, compared to 26.2% for YOLO-World-v1-S and 22.7% for YOLO-World-v2-S, using similar backbones on the LVIS minival dataset. Meanwhile, the FPS of DOSOD-S is 57.1% higher than that of YOLO-World-v1-S and 29.6% higher than that of YOLO-World-v2-S. We also demonstrate that the DOSOD model facilitates deployment on edge devices. The code and models are publicly available at https://github.com/D-Robotics-AI-Lab/DOSOD |
Language | English |
DOI | arXiv:2412.14680 |
Related URL | View original |
Source | arXiv |
Indexed by | PPRN.PPRN |
WOS accession number | PPRN:120064178 |
WOS category | Computer Science, Artificial Intelligence ; Computer Science, Software Engineering |
Document type | Preprint |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/483983 |
Collection | School of Information Science and Technology; School of Information Science and Technology_PI Research Groups_Liu Song Group |
Corresponding author | Su, Hu; Liu, Song |
Author affiliations | 1.D Robot, Shenzhen, Peoples R China 2.Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst MAIS, Beijing, Peoples R China 3.Soochow Univ, Sch Future Sci & Engn, BeeLab, Suzhou, Peoples R China 4.Shanghai Tech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China |
Recommended citation (GB/T 7714) | He, Yonghao,Su, Hu,Yu, Haiyong,et al. A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space. 2024. |
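
The decoupled alignment described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the two-layer ReLU adaptor, the cosine-similarity scoring, all dimensions, and the random stand-ins for VLM text embeddings and detector region features are assumptions made for illustration. The sketch shows text embeddings being mapped once into a joint space and then used as fixed class vectors against region features, which is one plausible reading of how the model can behave like a closed-set detector at test time.

```python
import numpy as np


def mlp_adaptor(text_emb, w1, b1, w2, b2):
    """Hypothetical two-layer MLP mapping text embeddings into the joint space."""
    hidden = np.maximum(text_emb @ w1 + b1, 0.0)  # ReLU activation
    return hidden @ w2 + b2


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against division by zero)."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def align(region_feats, class_vectors):
    """Cosine similarity between region features and joint-space class vectors."""
    return l2_normalize(region_feats) @ l2_normalize(class_vectors).T


rng = np.random.default_rng(0)
num_classes, text_dim, hidden_dim, joint_dim, num_regions = 3, 8, 16, 4, 5

# Random stand-ins: a real system would take these from a VLM text encoder.
text_emb = rng.normal(size=(num_classes, text_dim))

# Randomly initialised adaptor parameters (these would be learned in training).
w1 = rng.normal(size=(text_dim, hidden_dim))
b1 = np.zeros(hidden_dim)
w2 = rng.normal(size=(hidden_dim, joint_dim))
b2 = np.zeros(joint_dim)

# Computed once per vocabulary; afterwards no text branch is needed per image.
class_vectors = mlp_adaptor(text_emb, w1, b1, w2, b2)

# Random stand-ins for the detector's class-agnostic region representations.
region_feats = rng.normal(size=(num_regions, joint_dim))

scores = align(region_feats, class_vectors)  # shape: (num_regions, num_classes)
pred = scores.argmax(axis=1)                 # best-matching class per region
```

Because `class_vectors` depends only on the vocabulary, it can be cached and reused for every image, so per-image work reduces to one matrix product between region features and fixed class vectors.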
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.