ShanghaiTech University Knowledge Management System
Graph-guided and Deep Feature Fusion Hashing for Unsupervised Cross-modal Retrieval
2025
Proceedings title | INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE INNOVATIONS (IS-AII 2025)
Publication status | Formally accepted
Abstract | Unsupervised cross-modal hashing has attracted significant attention in large-scale multimedia retrieval because it maintains high retrieval efficiency and low storage cost without requiring labeled data. Despite the substantial progress of existing methods, two primary issues persist: (1) similarity matrices carry insufficient structural information, and inter-modal matrices are generated with fixed weight coefficients, which limits generalization; (2) the feature fusion process is weakly constrained, leading to suboptimal information integration. To overcome these limitations, this paper proposes GDFH, a novel unsupervised cross-modal hashing method based on graph-guided deep feature fusion. Specifically, an adaptive similarity matrix construction module dynamically builds similarity matrices through nonlinear transformations and learnable adaptive weight parameters, capturing latent modal associations and semantic relationships. To further enhance feature representation, GDFH incorporates a heterogeneous graph neural network (HGNN) that captures fine-grained semantic relationships between features, explicitly producing more representative joint-modal features that cooperate with the existing features to guide hash function learning. Experimental results on the MIRFlickr-25K and NUS-WIDE public datasets demonstrate that GDFH outperforms the baseline methods; in particular, for the text-to-image retrieval subtask on MIRFlickr-25K with 64-bit hash codes, retrieval accuracy improves by 2.81%.
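The abstract's central idea of replacing fixed fusion weights with learnable ones can be illustrated with a minimal sketch. The sketch below is an assumption-laden illustration, not the paper's actual GDFH formulation: the module name, the tanh nonlinearity, and the single learnable weight `alpha` are hypothetical choices used only to show how an inter-modal similarity matrix might be built from intra-modal cosine similarities with adaptive rather than hand-fixed coefficients.

```python
# Illustrative sketch only: the record does not give GDFH's equations, so the
# nonlinearity and the learnable fusion weight below are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveSimilarityFusion(nn.Module):
    """Fuse intra-modal cosine-similarity matrices with a learnable weight."""

    def __init__(self):
        super().__init__()
        # Unconstrained parameter mapped to (0, 1) by a sigmoid, so the
        # image/text mixing ratio is learned instead of fixed by hand.
        self.raw_alpha = nn.Parameter(torch.zeros(1))

    def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
        # Cosine-similarity matrices for each modality (batch x batch).
        img_n = F.normalize(img_feat, dim=1)
        txt_n = F.normalize(txt_feat, dim=1)
        img_sim = img_n @ img_n.t()
        txt_sim = txt_n @ txt_n.t()

        alpha = torch.sigmoid(self.raw_alpha)          # learnable fusion weight
        joint = alpha * img_sim + (1 - alpha) * txt_sim

        # A simple nonlinear transform to sharpen the joint similarity
        # structure (placeholder for the paper's construction).
        return torch.tanh(2.0 * joint)


if __name__ == "__main__":
    # Toy usage: 8 paired samples with 512-d image and 300-d text features.
    fusion = AdaptiveSimilarityFusion()
    joint_sim = fusion(torch.randn(8, 512), torch.randn(8, 300))
    print(joint_sim.shape)  # torch.Size([8, 8])
```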
Keywords | Cross-modal Retrieval; Unsupervised Cross-modal Hashing; Similarity Matrix; Heterogeneous Graph Neural Network
Indexed by | EI
Language | English
Document type | Conference paper
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/503626
Collections | School of Information Science and Technology_Distinguished Professor Group_Yingguan Wang Group; School of Information Science and Technology_Master's Students
Corresponding author | Yingguan Wang
Author affiliations | 1. Shanghai Institute of Microsystems and Information Technology, Shanghai, China; ShanghaiTech University, Shanghai, China; 2. ShanghaiTech University, Shanghai, China; 3. University of Chinese Academy of Sciences, Beijing, China
First author affiliation | ShanghaiTech University
Corresponding author affiliation | ShanghaiTech University
First affiliation of first author | ShanghaiTech University
Recommended citation (GB/T 7714) | Ning Kang, Li Ji, Xiaowei Feng, et al. Graph-guided and Deep Feature Fusion Hashing for Unsupervised Cross-modal Retrieval[C], 2025.
Files in this item |
File name/size | Document type | Version type | Access type | License
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.