A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
2024-08-02
Status: Published
Abstract

In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of single-modality systems. In this paper, we systematically review the applications of MLLMs in multimodal tasks spanning natural language, vision, and audio. We also provide a comparative analysis of the focus of different MLLMs across these tasks, offer insights into the shortcomings of current MLLMs, and suggest potential directions for future research. Through these discussions, we hope to inform the further development and application of MLLMs.

Keywords: MLLMs Tasks AI Applications Fusion Techniques
DOI: arXiv:2408.01319
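Source: arXiv
WOS Accession Number: PPRN:91230090
WOS Category: Computer Science, Artificial Intelligence
Document Type: Preprint
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408353
Collection: School of Biomedical Engineering
School of Biomedical Engineering_PI Research Groups_Dinggang Shen Group
Corresponding Authors: Liu, Tianming; Zhang, Shu
Author Affiliations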
1.Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
2.Northwestern Polytech Univ, Sch Automat, Xian 710072, Peoples R China
3.Northwestern Polytech Univ, Inst Med Res, Xian 710072, Peoples R China
4.Univ Georgia, Sch Comp, Athens, GA 30602, USA
5.Shaanxi Normal Univ, Sch Phys & Informat Technol, Xian 710119, Peoples R China
6.Univ Elect Sci & Technol China, Clin Hosp Chengdu Brain Sci Inst, Sch Life Sci & Technol, MOE,Key Lab Neuroinformat, Chengdu, Peoples R China
7.Augusta Univ, Sch Comp & Cyber Sci, Augusta, GA 30912, USA
8.ShanghaiTech Univ, Sch Biomed Engn, Shanghai 201210, Peoples R China
9.Shanghai United Imaging Intelligence Co Ltd, Shanghai 200230, Peoples R China
10.Shanghai Clin Res & Trial Ctr, Shanghai 201210, Peoples R China
Recommended Citation
GB/T 7714
Wang, Jiaqi,Jiang, Hanqi,Liu, Yiheng,et al. A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks. 2024.