A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
2024-12-19
Status: Published
Abstract: Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications because of their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), a practical and highly efficient solution that supports real-time OSOD tasks in robotic systems. Specifically, DOSOD builds on the YOLO-World pipeline by integrating a vision-language model (VLM) with a detector. A Multilayer Perceptron (MLP) adaptor transforms text embeddings extracted by the VLM into a joint space, within which the detector learns region representations of class-agnostic proposals. Cross-modality features are aligned directly in the joint space, which avoids complex feature interactions and thereby improves computational efficiency. During testing, DOSOD operates like a traditional closed-set detector, effectively bridging the gap between closed-set and open-set detection. Compared with the YOLO-World baseline, DOSOD significantly improves real-time performance while maintaining comparable accuracy. On the LVIS minival dataset, the small DOSOD-S model achieves a Fixed AP of 26.7%, compared with 26.2% for YOLO-World-v1-S and 22.7% for YOLO-World-v2-S, using similar backbones. Meanwhile, the FPS of DOSOD-S is 57.1% higher than that of YOLO-World-v1-S and 29.6% higher than that of YOLO-World-v2-S. We also demonstrate that DOSOD facilitates deployment on edge devices. The code and models are publicly available at https://github.com/D-Robotics-AI-Lab/DOSOD
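The abstract describes text embeddings passed through an MLP adaptor into a joint space, region embeddings learned directly in that space, and a detector that behaves like a closed-set detector at test time. The Python sketch below illustrates one plausible reading of that idea; it is not the DOSOD implementation. Assumptions: the adaptor is a hypothetical two-layer ReLU MLP with random, untrained weights; alignment is scored with cosine similarity; the VLM text embeddings and detector region embeddings are random stand-ins; the names `MLPAdaptor`, `build_class_head`, and `classify_regions` and the vocabulary are illustrative only.

```python
import math
import random

# --- small pure-Python linear-algebra helpers ---
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(W, v):
    # W is a list of rows (out_dim x in_dim); v has length in_dim.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(u, b):
    return [x + y for x, y in zip(u, b)]

def l2_normalize(v, eps=1e-12):
    n = math.sqrt(sum(x * x for x in v))
    return [x / max(n, eps) for x in v]

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0 / math.sqrt(cols)) for _ in range(cols)]
            for _ in range(rows)]

class MLPAdaptor:
    """Hypothetical 2-layer MLP mapping VLM text embeddings into the joint space.
    Weights are random here; in practice they would be trained."""
    def __init__(self, text_dim, hidden_dim, joint_dim, seed=0):
        rng = random.Random(seed)
        self.W1 = random_matrix(hidden_dim, text_dim, rng)
        self.b1 = [0.0] * hidden_dim
        self.W2 = random_matrix(joint_dim, hidden_dim, rng)
        self.b2 = [0.0] * joint_dim

    def __call__(self, t):
        h = relu(add(matvec(self.W1, t), self.b1))
        return l2_normalize(add(matvec(self.W2, h), self.b2))

def build_class_head(adaptor, text_embeddings):
    # Offline step: adapt each class's text embedding once.
    # The result is a fixed (num_classes x joint_dim) weight matrix.
    return [adaptor(t) for t in text_embeddings]

def classify_regions(class_head, region_embeddings):
    # Online step: cosine similarity between each L2-normalized region
    # embedding and every class row; no text-image interaction per image.
    return [matvec(class_head, l2_normalize(r)) for r in region_embeddings]

# --- usage with random stand-in embeddings ---
rng = random.Random(42)
TEXT_DIM, HIDDEN_DIM, JOINT_DIM = 8, 16, 4
vocab = ["cup", "bottle", "scissors"]  # illustrative vocabulary
text_emb = [[rng.gauss(0, 1) for _ in range(TEXT_DIM)] for _ in vocab]
adaptor = MLPAdaptor(TEXT_DIM, HIDDEN_DIM, JOINT_DIM)
head = build_class_head(adaptor, text_emb)  # computed once, then frozen
regions = [[rng.gauss(0, 1) for _ in range(JOINT_DIM)] for _ in range(5)]
scores = classify_regions(head, regions)  # 5 regions x 3 classes
labels = [max(range(len(vocab)), key=lambda c: s[c]) for s in scores]
```

Because `build_class_head` depends only on the vocabulary, it can run once before deployment. At test time only `classify_regions` runs, a matrix product against a frozen weight matrix, which has the same form as a closed-set classification head. Under these assumptions, this is one way to read the abstract's claim that DOSOD operates like a closed-set detector during testing.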
Language: English
DOI: arXiv:2412.14680
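Source: arXiv
Indexing category: PPRN.PPRN
WOS accession number: PPRN:120064178
WOS categories: Computer Science, Artificial Intelligence; Computer Science, Software Engineering
Document type: Preprint
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/483983
Collection: School of Information Science and Technology
School of Information Science and Technology_PI Research Groups_Liu Song Group
Corresponding authors: Su, Hu; Liu, Song
Author affiliations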
1.D Robot, Shenzhen, Peoples R China
2.Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence Syst MAIS, Beijing, Peoples R China
3.Soochow Univ, Sch Future Sci & Engn, BeeLab, Suzhou, Peoples R China
4.Shanghai Tech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
Recommended citation
GB/T 7714
He, Yonghao,Su, Hu,Yu, Haiyong,et al. A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space. 2024.
Files in this item
No files are associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.