ShanghaiTech University Knowledge Management System
Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks
2024-07-25
Status | Published |
Abstract | Large language models (LLMs) have demonstrated impressive versatility across numerous tasks, yet their generalization capabilities remain poorly understood. Arithmetic tasks serve as important venues for investigating these behaviors. Previous studies have left seemingly unrelated mysteries unresolved: (1) models with appropriate positional embeddings can correctly perform longer unseen arithmetic operations such as addition, but their effectiveness varies in more complex tasks like multiplication; (2) models perform well on longer unseen cases in modular addition under specific moduli (e.g., modulo 100) but struggle under very close moduli (e.g., modulo 101), regardless of the positional encoding used. We believe previous studies have been treating the symptoms rather than addressing the root cause: they have paid excessive attention to improving model components while overlooking the differences in task properties that may be the real drivers. This is confirmed by our unified theoretical framework for different arithmetic scenarios. For example, unlike multiplication, the digital addition task has the property of translation invariance, which naturally aligns with relative positional encoding, and this combination leads to successful generalization of addition to unseen longer domains. The discrepancy between operations modulo 100 and modulo 101 arises from the base: modulo 100, unlike 101, is compatible with the decimal system (base 10), so the unseen information in digits beyond the units and tens digits is not actually needed for the task. Extensive experiments with GPT-like models validate our theoretical predictions. These findings deepen our understanding of generalization mechanisms and facilitate more data-efficient model training and objective-oriented AI alignment. |
DOI | arXiv:2407.17963 |
Related URL | View Original |
Source | arXiv |
WOS Record No. | PPRN:91102707 |
WOS Category | Computer Science, Artificial Intelligence |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/408363 |
Collection | School of Information Science and Technology_PI Research Groups_Zhang Haipeng Group |
Corresponding Authors | Zhang, Haipeng; Yang, Yanqing |
Author Affiliations | 1. Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China; 2. ShanghaiTech Univ, Shanghai, Peoples R China; 3. Fudan Univ, Shanghai, Peoples R China |
Recommended Citation (GB/T 7714) | Xu, Xingcheng, Zhao, Zibo, Zhang, Haipeng, et al. Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks. 2024. |
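The abstract's base-compatibility argument can be checked directly (a minimal sketch for illustration only, not code from the paper): because 10^2 is divisible by 100, the value of n mod 100 is determined by the last two decimal digits alone, whereas 101 shares no such relation with base 10, so truncating higher digits changes the answer.

```python
def mod_from_last_two_digits(n: int, m: int) -> int:
    """Compute n mod m using only the last two decimal digits of n."""
    return (n % 100) % m

n = 987654

# Modulo 100: digits beyond the units and tens places are irrelevant.
assert n % 100 == mod_from_last_two_digits(n, 100)  # both equal 54

# Modulo 101: discarding higher digits gives the wrong residue,
# so a model would need information from every digit position.
assert n % 101 != mod_from_last_two_digits(n, 101)
```

This mirrors the abstract's claim that under modulus 100 the unseen higher digits of longer operands are simply not needed, while under modulus 101 they are.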