Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data

doi:10.1016/j.ymeth.2020.10.001

	Xu, Fan (1); Wang, Shike (1); Dai, Xinnan (1); Mundra, Piyushkumar A. (2); Zheng, Jie (1)
	2021-05
Journal	METHODS (IF: 4.2 [JCR-2023], 3.8 [5-Year])
ISSN	1046-2023
EISSN	1095-9130
Volume	189; Pages: 65-73
Publication status	Published
DOI	10.1016/j.ymeth.2020.10.001
Abstract	Single-cell protein abundance is a fundamental type of information for characterizing cell states. Due to high cost and technical barriers, however, direct quantification of proteins is difficult. Single-cell RNA sequencing (scRNA-seq) data, which serve as a cost-effective substitute for single-cell proteomics, may not accurately reflect protein expression levels because of measurement error, noise, and post-transcriptional and translational regulation. Recently emerging single-cell multimodal omics technologies, e.g. CITE-seq and REAP-seq, can simultaneously profile RNA and protein abundances in single cells, providing labeled data for predictive modeling in a supervised learning framework. A deep neural network-based transfer learning method has been applied to the imputation of surface protein abundances from single-cell transcriptomic data. However, it is unclear whether an artificial neural network is the best model, and it is desirable to improve the prediction performance (e.g. accuracy, interpretability) of machine learning models. In this paper, we compared several tree-based ensemble learning methods with neural network models and found that ensemble learning often performed better than neural networks, with Random Forest (RF) performing best overall. Moreover, we used the feature importance scores from RF to interpret the biological mechanisms underlying the predictions. Our study demonstrates the effectiveness of ensemble learning for reliable prediction of protein abundance from single-cell multimodal omics data, and paves the way for knowledge discovery by mining single-cell multi-omics data at large scale.
Keywords	Single cell; Ensemble learning; Protein abundance; Transcriptomic; CITE-seq; REAP-seq
Indexed in	SCIE
Language	English
WOS research area	Biochemistry & Molecular Biology
WOS category	Biochemical Research Methods; Biochemistry & Molecular Biology
WOS accession number	WOS:000635650900008
Publisher	ACADEMIC PRESS INC ELSEVIER SCIENCE
Original document type	Article
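Document type	Journal article
Item identifier	https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/126217
Collections	School of Life Science and Technology_Master's students; School of Information Science and Technology_Master's students; School of Information Science and Technology_PI research groups_Zheng Jie group
Co-first author	Wang, Shike
Corresponding author	Zheng, Jie
Author affiliations	1. ShanghaiTech University, School of Information Science & Technology, 302-D, SIST Bldg 2, 393 Middle Huaxia Rd, Shanghai 201210, Peoples R China; 2. University of Manchester, Cancer Research UK Manchester Institute, Mol Oncol Grp, Manchester, Lancs, England
First author affiliation	ShanghaiTech University
Corresponding author affiliation	ShanghaiTech University
First affiliation of first author	ShanghaiTech University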
Recommended citation (GB/T 7714)	Xu, Fan, Wang, Shike, Dai, Xinnan, et al. Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data[J]. METHODS, 2021, 189: 65-73.
APA	Xu, Fan, Wang, Shike, Dai, Xinnan, Mundra, Piyushkumar A., & Zheng, Jie. (2021). Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data. METHODS, 189, 65-73.
MLA	Xu, Fan, et al. "Ensemble learning models that predict surface protein abundance from single-cell multimodal omics data". METHODS 189 (2021): 65-73.
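
The abstract describes predicting protein abundance from transcript levels with tree-based ensembles and reading feature importance scores to interpret the predictions. The sketch below illustrates that general idea only; it is not the authors' code or their Random Forest setup. It uses a simplified stand-in, bagged decision stumps (bootstrap aggregation without Random Forest's per-split feature subsampling), on synthetic data. All names (`fit_stump`, `fit_bagged_stumps`, `feature_importance`) and the toy "genes" are hypothetical.

```python
import random
from statistics import fmean


def fit_stump(X, y):
    """Find the one (feature, threshold) split that most reduces squared error."""
    y_mean = fmean(y)
    base_sse = sum((v - y_mean) ** 2 for v in y)
    best = None  # (gain, feature, threshold, left_mean, right_mean)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between neighbouring observed values
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            lm, rm = fmean(left), fmean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, t, lm, rm)
    return best


def fit_bagged_stumps(X, y, n_estimators=25, seed=0):
    """Fit each stump on a bootstrap resample of the cells (bagging)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict(stumps, row):
    """Average the predictions of all stumps."""
    return fmean([lm if row[j] <= t else rm for _, j, t, lm, rm in stumps])


def feature_importance(stumps, n_features):
    """Share of total squared-error reduction attributed to each feature."""
    gains = [0.0] * n_features
    for gain, j, *_ in stumps:
        gains[j] += gain
    total = sum(gains)
    return [g / total for g in gains]


# Synthetic data (an assumption for illustration): 80 "cells", 3 "genes";
# the "protein" level depends only on gene 0, plus Gaussian noise.
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(80)]
y = [2.0 * row[0] + data_rng.gauss(0, 0.1) for row in X]

stumps = fit_bagged_stumps(X, y)
importance = feature_importance(stumps, n_features=3)

# Compare training error against always predicting the mean abundance.
y_mean = fmean(y)
mse_model = fmean([(predict(stumps, r) - yi) ** 2 for r, yi in zip(X, y)])
mse_mean = fmean([(y_mean - yi) ** 2 for yi in y])
print("importance per gene:", importance)
print("ensemble MSE:", mse_model, "mean-baseline MSE:", mse_mean)
```

In this toy setting the importance scores should concentrate on gene 0, mirroring how importance scores can point to the transcripts that drive a protein's predicted abundance. A real analysis would use held-out cells for evaluation and a full Random Forest implementation.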