ShanghaiTech University Knowledge Management System
Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks
2024-07-25
Status | Published |
Abstract | Large language models (LLMs) have demonstrated impressive versatility across numerous tasks, yet their generalization capabilities remain poorly understood. Arithmetic tasks serve as important venues for investigating these behaviors. Previous studies have left seemingly unrelated mysteries unresolved: (1) models with appropriate positional embeddings can correctly perform longer unseen arithmetic operations such as addition, but their effectiveness varies in more complex tasks like multiplication; (2) models perform well on longer unseen cases in modular addition under specific moduli (e.g., modulo 100) but struggle under very close moduli (e.g., modulo 101), regardless of the positional encoding used. We believe previous studies have been treating the symptoms rather than addressing the root cause: they have paid excessive attention to improving model components while overlooking the differences in task properties that may be the real drivers. This is confirmed by our unified theoretical framework for different arithmetic scenarios. For example, unlike multiplication, the digital addition task has the property of translation invariance, which naturally aligns with relative positional encoding, and this combination leads to successful generalization of addition to unseen longer domains. The discrepancy between operations modulo 100 and modulo 101 arises from the base: modulo 100, unlike 101, is compatible with the decimal system (base 10), so the unseen information in digits beyond the units and tens digits is not actually needed for the task. Extensive experiments with GPT-like models validate our theoretical predictions. These findings deepen our understanding of generalization mechanisms and facilitate more data-efficient model training and objective-oriented AI alignment. |
DOI | arXiv:2407.17963 |
Related URL | View Original |
Source | arXiv |
WOS Record No. | PPRN:91102707 |
WOS Category | Computer Science, Artificial Intelligence |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408363 |
Collection | School of Information Science and Technology_PI Research Groups_Zhang Haipeng Group |
Corresponding Authors | Zhang, Haipeng; Yang, Yanqing |
Author Affiliations | 1. Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China; 2. ShanghaiTech Univ, Shanghai, Peoples R China; 3. Fudan Univ, Shanghai, Peoples R China |
Recommended Citation (GB/T 7714) | Xu, Xingcheng, Zhao, Zibo, Zhang, Haipeng, et al. Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks. 2024. |
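The abstract's base-compatibility argument can be checked directly (a minimal sketch for illustration only, not code from the paper): because 10^2 is divisible by 100, the value of n mod 100 is determined by the last two decimal digits alone, whereas 101 shares no such relation with base 10, so truncating higher digits changes the answer.

```python
def mod_from_last_two_digits(n: int, m: int) -> int:
    """Compute n mod m using only the last two decimal digits of n."""
    return (n % 100) % m

n = 987654

# Modulo 100: digits beyond the units and tens places are irrelevant.
assert n % 100 == mod_from_last_two_digits(n, 100)  # both equal 54

# Modulo 101: discarding higher digits gives the wrong residue,
# so a model would need information from every digit position.
assert n % 101 != mod_from_last_two_digits(n, 101)
```

This mirrors the abstract's claim that under modulus 100 the unseen higher digits of longer operands are simply not needed, while under modulus 101 they are.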