ShanghaiTech University Knowledge Management System
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Wang, Jiaqi (1); Jiang, Hanqi (4); Liu, Yiheng (2); Ma, Chong (2); Zhang, Xu (5); Pan, Yi (4); Liu, Mengyuan (5); Gu, Peiran (5); Xia, Sichen (2); Li, Wenjun; Zhang, Yutong (3); Wu, Zihao (4); Liu, Zhengliang (4); Zhong, Tianyang (2); Ge, Bao (5); Zhang, Tuo (2); Qiang, Ning (5); Hu, Xintao (2); Jiang, Xi (6); Zhang, Xin (3); Zhang, Wei (7); Shen, Dinggang (8, 9, 10)
2024-08-02
Status | Published |
Abstract | In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically review the applications of MLLMs in multimodal tasks such as natural language, vision, and audio. We also provide a comparative analysis of the focus of different MLLMs across these tasks, offer insights into the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, this paper hopes to provide valuable insights for the further development and application of MLLMs. |
Keywords | MLLMs Tasks AI Applications Fusion Techniques |
DOI | arXiv:2408.01319 |
Related URL | View original |
Source | arXiv |
WOS Accession Number | PPRN:91230090 |
WOS Category | Computer Science, Artificial Intelligence |
Document Type | Preprint |
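Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408353 |
Collection | School of Biomedical Engineering; School of Biomedical Engineering_PI Research Groups_Dinggang Shen Group |
Corresponding Author | Liu, Tianming; Zhang, Shu |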
Author Affiliations | 1. Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China; 2. Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China; 3. Northwestern Polytech Univ, Inst Med Res, Xian 710072, Peoples R China; 4. Univ Georgia, Sch Comp, Athens, GA 30602, USA; 5. Shaanxi Normal Univ, Sch Phys & Informat Technol, Xian 710119, Peoples R China; 6. Univ Elect Sci & Technol China, Clin Hosp Chengdu Brain Sci Inst, Sch Life Sci & Technol, MOE, Key Lab Neuroinformat, Chengdu, Peoples R China; 7. Augusta Univ, Sch Comp & Cyber Sci, Augusta, GA 30912, USA; 8. ShanghaiTech Univ, Sch Biomed Engn, Shanghai 201210, Peoples R China; 9. Shanghai United Imaging Intelligence Co Ltd, Shanghai 200230, Peoples R China; 10. Shanghai Clin Res & Trial Ctr, Shanghai 201210, Peoples R China |
Recommended Citation (GB/T 7714) | Wang, Jiaqi,Jiang, Hanqi,Liu, Yiheng,et al. A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks. 2024. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.