ShanghaiTech University Knowledge Management System
HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision | |
2024-11-11 | |
Status | Published |
Abstract | In camera-based 3D multi-object tracking (MOT), the prevailing methods follow the tracking-by-query-propagation paradigm, which employs track queries to manage the lifecycle of identity-consistent tracklets while object queries handle the detection of new-born tracklets. However, this intertwined paradigm forces the inter-temporal tracking task and the single-frame detection task to use the same model parameters, complicating training optimization. Drawing inspiration from studies on the roles of attention components in transformer-based decoders, we identify that the dispersing effect of self-attention requires object queries to match with new-born tracklets. This matching strategy diverges from the detection pre-training phase, where object queries align with all ground-truth targets, resulting in insufficient supervision signals. To address these issues, we present HSTrack, a novel plug-and-play method designed to co-facilitate multi-task learning for detection and tracking. HSTrack constructs a parallel weight-sharing decoder devoid of self-attention layers, circumventing competition between different types of queries. Considering the characteristics of the cross-attention layer and the distinct query types, our parallel decoder adopts one-to-one and one-to-many label assignment strategies for track queries and object queries, respectively. Leveraging the shared architecture, HSTrack further improves trackers in spatio-temporal modeling and quality candidate generation. Extensive experiments demonstrate that HSTrack consistently delivers improvements when integrated with various query-based 3D MOT trackers. For example, HSTrack improves the state-of-the-art PF-Track method by +2.3% AMOTA and +1.7% mAP on the nuScenes dataset. |
Language | English |
DOI | arXiv:2411.06780 |
Source | arXiv |
Indexed in | PPRN.PPRN |
WOS accession number | PPRN:119160869 |
WOS category | Computer Science, Software Engineering |
Document type | Preprint |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/464717 |
Collection | School of Information Science and Technology |
Corresponding author | Gao, Jin |
Author affiliations | 1.CASIA, State Key Lab Multimodal Artificial Intelligence Syst MAIS, Beijing, Peoples R China 2.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China 3.Shanghai Tech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China |
Recommended citation (GB/T 7714) | Lin, Shubo,Kou, Yutong,Li, Bing,et al. HSTrack: Bootstrap End-to-End Multi-Camera 3D Multi-object Tracking with Hybrid Supervision. 2024. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.