Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks
2024-07-25
Status: Published
Abstract

Large language models (LLMs) have demonstrated impressive versatility across numerous tasks, yet their generalization capabilities remain poorly understood. Arithmetic tasks serve as important venues for investigating these behaviors. Previous studies have left seemingly unrelated mysteries unresolved: (1) models with appropriate positional embeddings can correctly perform longer unseen arithmetic operations such as addition, but their effectiveness varies in more complex tasks like multiplication; (2) models perform well on longer unseen cases in modular addition under specific moduli (e.g., modulo 100) but struggle under very close moduli (e.g., modulo 101), regardless of the positional encoding used. We believe previous studies have been treating the symptoms rather than addressing the root cause: they have paid excessive attention to improving model components while overlooking the differences in task properties that may be the real drivers. This is confirmed by our unified theoretical framework for different arithmetic scenarios. For example, unlike multiplication, the digit-wise addition task has the property of translation invariance, which naturally aligns with relative positional encoding, and this combination leads to successful generalization of addition to unseen longer domains. The discrepancy between operations modulo 100 and modulo 101 arises from the base: modulo 100, unlike 101, is compatible with the decimal system (base 10), so that unseen information in digits beyond the units and tens places is not actually needed for the task. Extensive experiments with GPT-like models validate our theoretical predictions. These findings deepen our understanding of the generalization mechanisms, and facilitate more data-efficient model training and objective-oriented AI alignment.
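The base-compatibility claim in the abstract can be checked directly: a sum modulo 100 is fully determined by the last two decimal digits of each operand, whereas a sum modulo 101 depends on all digits. A minimal sketch (the operand values are arbitrary illustrative choices, not from the paper):

```python
def last_two_digits(n: int) -> int:
    """Truncate a non-negative integer to its units and tens digits."""
    return n % 100

a, b = 987250, 13

# Modulo 100 is compatible with base 10: truncating each operand to its
# last two digits never changes the answer, so a model needs no
# information from higher, possibly unseen, digit positions.
assert (a + b) % 100 == (last_two_digits(a) + last_two_digits(b)) % 100

# Modulo 101 is not compatible with base 10: digits beyond the tens
# place still affect the result, so truncation gives a different answer.
full = (a + b) % 101
trunc = (last_two_digits(a) + last_two_digits(b)) % 101
print(full, trunc)  # → 89 63
```

The first assertion holds for any non-negative operands, which is exactly why length generalization is easy modulo 100 and hard modulo 101.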

DOI: arXiv:2407.17963
Source: arXiv
WOS Record No.: PPRN:91102707
WOS Category: Computer Science, Artificial Intelligence
Document Type: Preprint
Item Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408363
Collection: School of Information Science and Technology_PI Research Groups_Zhang Haipeng Group
Corresponding Authors: Zhang, Haipeng; Yang, Yanqing
Author Affiliations:
1. Shanghai Artificial Intelligence Laboratory, Shanghai, People's Republic of China
2. ShanghaiTech University, Shanghai, People's Republic of China
3. Fudan University, Shanghai, People's Republic of China
Recommended Citation:
GB/T 7714
Xu, Xingcheng, Zhao, Zibo, Zhang, Haipeng, et al. Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks. 2024.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.