ShanghaiTech University Knowledge Management System
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
2023-07
Proceedings Title | Findings of the Association for Computational Linguistics: ACL 2023
ISSN | 0736-587X |
Pages | 7613–7636
Publication Status | Published
DOI | 10.18653/v1/2023.findings-acl.482 |
Abstract | Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One vastly successful class of neural models is transformers. When used as an encoder, a transformer produces contextual representations of words in the input sentence. In this work, we propose a new model of contextual word representation, not from a neural perspective, but from a purely syntactic and probabilistic perspective. Specifically, we design a conditional random field that models discrete latent representations of all words in a sentence as well as dependency arcs between them, and we use mean field variational inference for approximate inference. Strikingly, we find that the computation graph of our model resembles transformers, with correspondences between dependencies and self-attention and between distributions over latent representations and contextual embeddings of words. Experiments show that our model performs competitively with transformers on small to medium-sized datasets. We hope that our work can help bridge the gap between traditional syntactic and probabilistic approaches and cutting-edge neural approaches to NLP, and inspire more linguistically principled neural approaches in the future. [See the code sketch following this record.]
Proceedings Editors / Conference Sponsors | Association for Computational Linguistics; Bloomberg; Google Research; LIVEPERSON; Meta; Microsoft; et al.
Keywords | Deep learning; Natural language processing systems; Structural design; Contextual words; Dependency model; Language processing; Mean-field; Natural languages; Neural modelling; Probabilistics; Random fields; Syntactic structure; Word representations
Conference Name | ACL 2023
Place of Publication | Toronto, Canada
Conference Venue | Toronto, Canada
Conference Date | 2023-07
Discipline | Engineering :: Computer Science and Technology (may confer Engineering or Science degrees)
URL | View original
Indexed By | EI
Language | English
Publisher | Association for Computational Linguistics (ACL)
EI Accession Number | 20234515012242
EI Subject Terms | Syntactics
EI Classification Codes | 408.1 Structural Design, General; 461.4 Ergonomics and Human Factors Engineering; 723.2 Data Processing and Image Processing
Original Document Type | Conference article (CA)
Document Type | Conference paper
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/345942
Collection | School of Information Science and Technology_Master's Students; School of Information Science and Technology_PI Research Groups_Kewei Tu Group
Corresponding Author | Kewei Tu
Author Affiliations | 1. School of Information Science and Technology, ShanghaiTech University; 2. Shanghai Engineering Research Center of Intelligent Vision and Imaging
First Author's Affiliation | School of Information Science and Technology
Corresponding Author's Affiliation | School of Information Science and Technology
First Author's First Affiliation | School of Information Science and Technology
Recommended Citation (GB/T 7714) | Haoyi Wu, Kewei Tu. Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation[C]//Association for Computational Linguistics, Bloomberg, Google Research, LIVEPERSON, Meta, Microsoft, et al. Toronto, Canada: Association for Computational Linguistics (ACL), 2023: 7613–7636.
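
Code sketch. The abstract's analogy between mean-field inference and self-attention can be made concrete with a small illustration. The Python/NumPy sketch below is a hedged assumption, not the authors' released implementation or exact parameterization: the function name mean_field_updates, the single label-compatibility matrix psi, and the per-word head-selection softmax are hypothetical simplifications of the CRF the abstract describes (discrete latent labels per word plus dependency arcs, updated by mean field variational inference).

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax along the given axis.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def mean_field_updates(unary, psi, n_iters=8):
        """Mean-field marginals for a CRF over discrete word labels and
        dependency arcs (a hypothetical simplification of the model).

        unary : (n, L) unary scores for each word's latent label.
        psi   : (L, L) compatibility between a dependent's label (rows)
                and its head's label (cols) when the arc is present.
        Returns (q_z, q_a): label marginals (n, L) and arc marginals
        (n, n), where q_a[i, j] approximates P(head of word i is word j).
        """
        n, L = unary.shape
        q_z = softmax(unary)              # init label beliefs from unaries
        q_a = np.full((n, n), 1.0 / n)    # init arc beliefs uniformly

        for _ in range(n_iters):
            # Arc update: expected label compatibility under current
            # beliefs; s[i, j] = E_q[psi(z_i, z_j)] plays the role of
            # attention logits.
            s = q_z @ psi @ q_z.T
            np.fill_diagonal(s, -np.inf)  # a word cannot head itself
            q_a = softmax(s)              # each word softly picks a head

            # Label update: unary score plus arc-weighted messages from
            # the word's likely heads and from its likely dependents.
            msg_from_heads = q_a @ (q_z @ psi.T)
            msg_from_deps = q_a.T @ (q_z @ psi)
            q_z = softmax(unary + msg_from_heads + msg_from_deps)

        return q_z, q_a

    # Toy usage with random potentials: 6 words, 16 latent labels.
    rng = np.random.default_rng(0)
    q_z, q_a = mean_field_updates(rng.normal(size=(6, 16)),
                                  rng.normal(size=(16, 16)))
    print(q_z.shape, q_a.shape)           # (6, 16) (6, 6)

Each iteration first recomputes arc beliefs from expected label compatibility (the analogue of attention weights in the abstract's correspondence) and then refines every word's label distribution with messages weighted by those beliefs (the analogue of contextual word embeddings).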