Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation
2020-07
Source Publication: The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
Volume: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Pages: 3619–3629
Status: Published
DOI: 10.18653/v1/2020.acl-main.333
Abstract

Open-domain dialogue generation has gained increasing attention in Natural Language Processing. Its evaluation requires a holistic means. Human ratings are deemed as the gold standard. As human evaluation is inefficient and costly, an automated substitute is highly desirable. In this paper, we propose holistic evaluation metrics that capture different aspects of open-domain dialogues. Our metrics consist of (1) GPT-2 based context coherence between sentences in a dialogue, (2) GPT-2 based fluency in phrasing, (3) n-gram based diversity in responses to augmented queries, and (4) textual-entailment-inference based logical self-consistency. The empirical validity of our metrics is demonstrated by strong correlations with human judgments. We open source the code and relevant materials. © 2020 Association for Computational Linguistics
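
As an illustration of metric (3), the sketch below computes the Shannon entropy of the n-gram distribution over a set of responses. This is one common n-gram diversity measure, shown under stated assumptions: the abstract does not give the exact formula, and the function names `ngrams` and `ngram_entropy`, the whitespace tokenization, and the choice of log base 2 are all hypothetical choices made for this example, not the authors' released code.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_entropy(responses, n=2):
    """Shannon entropy (in bits) of the n-gram distribution pooled over responses.

    Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    Higher values mean the responses spread their mass over more distinct n-grams.
    """
    counts = Counter()
    for response in responses:
        counts.update(ngrams(response.lower().split(), n))
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no n-grams at all (empty input or responses shorter than n)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical responses a model might give to paraphrased versions of one query.
    repetitive = ["i do not know", "i do not know", "i do not know"]
    varied = ["i do not know", "maybe try the park", "that sounds fun today"]
    print(ngram_entropy(repetitive, n=1))
    print(ngram_entropy(varied, n=1))
```

A model that returns the same reply to every paraphrase of a query scores lower than one whose replies vary, which is the behavior a diversity metric over augmented queries is meant to capture.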

Author of Source: Amazon; Apple; Bloomberg Engineering; Google; IBM Research AI; et al.
Keyword: Computational linguistics; Open systems; Open source software; Automatic evaluation; Dialogue generations; Evaluation metrics; Gold standards; Holistic evaluations; Human evaluation; ITS evaluation; N-grams; Self-consistency; Textual entailment
Conference Name: The 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)
Conference Place: Virtual, Online, United States
Conference Date: July 5, 2020 - July 10, 2020
URL: View original
Indexed By: CPCI; CPCI-S; EI
Language: English
Publisher: Association for Computational Linguistics (ACL)
EI Accession Number: 20214411090492
EI Keywords: Natural language processing systems
EI Classification Number: 721.1 Computer Theory, Includes Formal Logic, Automata Theory, Switching Theory, Programming Theory; 723 Computer Software, Data Handling and Applications; 723.2 Data Processing and Image Processing
Original Document Type: Conference article (CA)
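Document Type: Conference paper
Identifier: https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/124035
Collection: School of Information Science and Technology_PhD Students
School of Information Science and Technology_PI Research Groups_Kewei Tu Group
Corresponding Author: Han, Wenjuan; Zhou, Linqi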
Affiliation
1.Department of Statistics, University of California, Los Angeles
2.School of Computing, National University of Singapore, Singapore
3.School of Information Science and Technology, ShanghaiTech University
Recommended Citation
GB/T 7714
Pang, Bo, Nijkamp, Erik, Han, Wenjuan, et al. Towards Holistic and Automatic Evaluation of Open-Domain Dialogue Generation[C]//Amazon, Apple, Bloomberg Engineering, et al., Google, IBM Research AI: Association for Computational Linguistics (ACL), 2020: 3619–3629.
File name: 10.18653@v1@2020.acl-main.333.pdf
Format: Adobe PDF

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.