ShanghaiTech University Knowledge Management System
An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification
Xinrui Zhou¹; Rui Yin¹; Jie Zheng²; Chee-Keong Kwoh¹
2019
Journal | IEEE ACCESS |
ISSN | 2169-3536 |
Volume | 7; Pages: 7348-7356 |
Publication status | Published |
DOI | 10.1109/ACCESS.2018.2890096 |
Abstract | Feature engineering aims to represent non-numeric data with numeric features that retain the essential information of the underlying problem, and it is a non-trivial step in building a predictive model. In bioinformatics, a vast quantity of DNA and protein sequences is available, yet far from fully utilized. Computational models can facilitate the analysis of large-scale data; however, most of them require a numeric representation as input. Expert knowledge can help design features that encode the raw symbolic data effectively, but such features generally vary from case to case and must be redesigned for each problem. Automated feature engineering, i.e., an encoding scheme that automates the construction of features, avoids this redesign and allows researchers to try different representations with minimal effort. This approach is better suited to the explosion of data and to the goal of building intelligent systems. In this paper, we introduce an encoding scheme for protein sequences that encodes a representative sequence dataset into a numeric matrix, which can then be fed into a downstream learning model. The method, Context-Free Encoding Scheme (CFreeEnS), was originally proposed for datasets in which labels are assigned to pairs of sequences. Here, we improve the method by making it applicable to a batch of protein sequences without requiring prior sequence alignment. The improved method is applied to protein classification at the functional level, including identifying antimicrobial peptides, screening tumor-homing peptides, and detecting hemolytic peptides and phage virion proteins. Compared with traditional methods that use task-specific, hand-designed features, CFreeEnS improves prediction accuracy, with gains ranging from 5.54% to 14.14%.
The results indicate that the improved CFreeEnS, which does not depend on carefully designed features, is promising for capturing generic priors and essential properties of amino acids, and can thus serve as an automated feature engineering method for protein sequences. |
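The abstract does not describe how CFreeEnS constructs its matrix, so the following is only a minimal illustration of the general idea it contrasts with: turning a batch of unaligned, variable-length protein sequences into a fixed-width numeric matrix that a learning model can consume. It uses plain amino acid composition, a traditional hand-designed feature, and is not the CFreeEnS method. The function names `aa_composition` and `encode_batch` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes); column order of the matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> list[float]:
    """Relative frequency of each standard amino acid in `seq`.

    Illustrative hand-designed feature, NOT CFreeEnS. Non-standard letters
    (e.g. X, B, Z) are skipped; a sequence with no standard residues maps
    to an all-zero vector.
    """
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def encode_batch(seqs: list[str]) -> list[list[float]]:
    """Encode unaligned sequences of any length into an n x 20 matrix.

    No alignment is needed because every row has the same width (20),
    regardless of the input sequence length.
    """
    return [aa_composition(s) for s in seqs]


if __name__ == "__main__":
    matrix = encode_batch(["ACDA", "GIGKFLHSAKKFGKAFVGEIMNS"])
    for row in matrix:
        print(len(row), [round(v, 3) for v in row[:3]])
```

Because each row has a fixed length, sequences of different lengths can be stacked directly into one matrix; the trade-off is that composition discards residue order, which is one reason task-specific features are usually richer than this.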
Keywords | Encoding scheme; feature engineering; information representation; machine learning |
Indexed in | SCI; SCIE; EI |
Language | English |
Funding | Singapore Ministry of Education [RG21/15]; Singapore Ministry of Education [2015-T1-001-169-11] |
WOS research areas | Computer Science; Engineering; Telecommunications |
WOS categories | Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications |
WOS accession number | WOS:000457073000001 |
Publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
EI accession number | 20190506450952 |
EI subject terms | Amino acids; Computation theory; Computational methods; Diagnosis; DNA sequences; Encoding (symbols); Intelligent systems; Learning systems; Numerical models; Peptides |
EI classification codes | Bioengineering and Biology: 461; Information Theory and Signal Processing: 716.1; Computer Theory, Includes Formal Logic, Automata Theory, Switching Theory, Programming Theory: 721.1; Data Processing and Image Processing: 723.2; Artificial Intelligence: 723.4; Organic Compounds: 804.1; Mathematics: 921 |
WOS keywords | PREDICTING ANTIGENIC VARIANTS; INFLUENZA-VIRUS; SEQUENCE; REPRESENTATION; FAMILIES |
Original document type | Article |
Source database | IEEE |
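Document type | Journal article |
Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/29883 |
Collections | School of Information Science and Technology; School of Information Science and Technology, PI Research Groups, Jie Zheng Group |
Author affiliations | 1. School of Computer Science and Engineering, Nanyang Technological University, Singapore; 2. School of Information Science and Technology, ShanghaiTech University, Shanghai, China |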
Recommended citation (GB/T 7714) | Xinrui Zhou, Rui Yin, Jie Zheng, et al. An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification[J]. IEEE ACCESS, 2019, 7: 7348-7356. |
APA | Xinrui Zhou, Rui Yin, Jie Zheng, & Chee-Keong Kwoh. (2019). An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification. IEEE ACCESS, 7, 7348-7356. |
MLA | Xinrui Zhou, et al. "An Encoding Scheme Capturing Generic Priors and Properties of Amino Acids Improves Protein Classification." IEEE ACCESS 7 (2019): 7348-7356. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.