THE DEVIL IS IN THE OBJECT BOUNDARY: TOWARDS ANNOTATION-FREE INSTANCE SEGMENTATION USING FOUNDATION MODELS

	Shi, Cheng; Yang, Sibei
	2024
Proceedings title	12TH INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, ICLR 2024
Abstract	Foundation models, pre-trained on a large amount of data, have demonstrated impressive zero-shot capabilities in various downstream tasks. However, in object detection and instance segmentation, two fundamental computer vision tasks heavily reliant on extensive human annotations, foundation models such as SAM and DINO struggle to achieve satisfactory performance. In this study, we reveal that the devil is in the object boundary, i.e., these foundation models fail to discern boundaries between individual objects. For the first time, we show that CLIP, which has never accessed any instance-level annotations, can provide a highly beneficial and strong instance-level boundary prior in the clustering results of its particular intermediate layer. Following this surprising observation, we propose Zip, which Zips up CLip and SAM in a novel classification-first-then-discovery pipeline, enabling annotation-free, complex-scene-capable, open-vocabulary object detection and instance segmentation. Our Zip significantly boosts SAM's mask AP on the COCO dataset by 12.5% and establishes state-of-the-art performance in various settings, including training-free, self-training, and label-efficient finetuning. Furthermore, annotation-free Zip even achieves comparable performance to the best-performing open-vocabulary object detectors using base annotations. Code is released at https://github.com/ChengShiest/Zip-Your-CLIP. © 2024 12th International Conference on Learning Representations, ICLR 2024. All rights reserved.
Keywords	Clustering algorithms; Foundations; Object recognition; Zero-shot learning; Clustering results; Down-stream; Foundation models; Human annotations; Individual objects; Intermediate layers; Large amounts of data; Object boundaries; Objects detection; Performance
Conference name	12th International Conference on Learning Representations, ICLR 2024
Conference location	Hybrid, Vienna, Austria
Conference date	May 7, 2024 - May 11, 2024
Indexed by	EI
Language	English
Publisher	International Conference on Learning Representations, ICLR
EI accession number	20243216835513
EI main heading	Object detection
EI classification codes	483.2 Foundations ; 723.2 Data Processing and Image Processing ; 903.1 Information Sources and Analysis
Original document type	Conference article (CA)
Document type	Conference paper
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/411256
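Collection	School of Information Science and Technology; School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Sibei Yang Group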
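Corresponding author	Yang, Sibei
Author affiliation	School of Information Science and Technology, ShanghaiTech University, China
First author affiliation	School of Information Science and Technology
Corresponding author affiliation	School of Information Science and Technology
First affiliation of first author	School of Information Science and Technology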
Recommended citation (GB/T 7714)	Shi, Cheng, Yang, Sibei. THE DEVIL IS IN THE OBJECT BOUNDARY: TOWARDS ANNOTATION-FREE INSTANCE SEGMENTATION USING FOUNDATION MODELS[C]: International Conference on Learning Representations, ICLR, 2024.