BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues
2024-03-11
Status: Published
Abstract

In this paper, we propose a new image-based visual place recognition (VPR) framework that exploits the structural cues in bird's-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about VPR: 1) For methods based on both camera and LiDAR sensors, integrating LiDAR into robotic systems increases cost, and aligning data across different sensors is itself a major challenge. 2) Other image-/camera-based methods, which integrate RGB images with their derived variants (e.g., pseudo depth images, pseudo 3D point clouds), exhibit several limitations, such as failing to effectively exploit the explicit spatial relationships between different objects. To tackle these issues, we design a new BEV-enhanced VPR framework, namely BEV2PR, which generates a composite descriptor with both visual cues and spatial awareness based solely on a single camera. For the visual cues, any popular aggregation module for RGB global features can be integrated into our framework. The key points are: 1) We use BEV segmentation features as an explicit source of structural knowledge in constructing global features. 2) The lower layers of the pre-trained backbone from BEV map generation are shared by the visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream. 3) The complementary visual and structural features jointly enhance VPR performance. Our BEV2PR framework yields consistent performance improvements over several popular camera-based VPR aggregation modules when they are integrated into it. Experiments on our collected VPR-NuScenes dataset demonstrate an absolute gain of 2.47% on Recall@1 for the strong Conv-AP baseline, achieving the best performance in our setting, and notably, an 18.06% gain on the hard set.
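
As a rough illustration of the two ideas the abstract relies on, a composite descriptor built from a visual and a structural stream and the Recall@K metric used to report results, the following minimal sketch may help. It is not the authors' implementation: fusing by concatenation, the function names `composite_descriptor` and `recall_at_k`, and the toy vectors are all assumptions made for this example, since the abstract does not specify how the two streams are combined.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def composite_descriptor(visual, structural):
    """Fuse a visual and a structural descriptor into one global descriptor.

    Assumption: the streams are fused by concatenating their L2-normalized
    descriptors and renormalizing. The abstract does not state the fusion rule.
    """
    return l2_normalize(l2_normalize(visual) + l2_normalize(structural))


def recall_at_k(queries, database, positives, k=1):
    """Fraction of queries whose top-k retrieved database items contain a true match.

    positives[i] is the set of database indices that count as correct for query i.
    """
    hits = 0
    for qi, q in enumerate(queries):
        # Rank database entries by descending dot product, which equals
        # cosine similarity because all descriptors are unit length.
        ranked = sorted(
            range(len(database)),
            key=lambda di: -sum(a * b for a, b in zip(q, database[di])),
        )
        if any(di in positives[qi] for di in ranked[:k]):
            hits += 1
    return hits / len(queries)


# Toy data (hypothetical): two places, each with a 2-D visual and a 2-D structural descriptor.
database = [
    composite_descriptor([1.0, 0.0], [1.0, 0.0]),
    composite_descriptor([0.0, 1.0], [0.0, 1.0]),
]
queries = [
    composite_descriptor([0.9, 0.1], [1.0, 0.0]),
    composite_descriptor([0.1, 0.9], [0.0, 1.0]),
]
positives = [{0}, {1}]
score = recall_at_k(queries, database, positives, k=1)
```

In this toy setup each query retrieves its own place first, so `score` is 1.0. An absolute gain such as the reported 2.47% corresponds to a difference between two such Recall@1 values expressed in percentage points.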

DOI: arXiv:2403.06600
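Source: arXiv
WOS accession number: PPRN:88099335
WOS category: Computer Science, Software Engineering
Document type: Preprint
Item identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/372974
Collection: School of Information Science and Technology
Author affiliations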
1.CASIA, State Key Lab Multimodal Artificial Intelligence Syst MAIS, Shanghai, Peoples R China
2.Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
3.ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai, Peoples R China
4.Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou, Peoples R China
5.Zhejiang Univ, State Key Lab Ind Control Technol, Hangzhou, Peoples R China
Recommended citation
GB/T 7714
Ge, Fudong,Zhang, Yiwei,Shen, Shuhan,et al. BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues. 2024.