ShanghaiTech University Knowledge Management System
Temporal Segment Transformer for Action Segmentation
2023-02-25
Status | Published |
Abstract | Recognizing human actions from untrimmed videos is an important task in activity understanding, and poses unique challenges in modeling long-range temporal relations. Recent works adopt a predict-and-refine strategy which converts an initial prediction to action segments for global context modeling. However, the generated segment representations are often noisy and exhibit inaccurate segment boundaries, over-segmentation and other problems. To deal with these issues, we propose an attention-based approach which we call temporal segment transformer, for joint segment relation modeling and denoising. The main idea is to denoise segment representations using attention between segment and frame representations, and also use inter-segment attention to capture temporal correlations between segments. The refined segment representations are used to predict action labels and adjust segment boundaries, and a final action segmentation is produced based on voting from segment masks. We show that this novel architecture achieves state-of-the-art accuracy on the popular 50Salads, GTEA and Breakfast benchmarks. We also conduct extensive ablations to demonstrate the effectiveness of different components of our design. |
DOI | arXiv:2302.13074 |
Related URL | View original |
Source | arXiv |
WOS Accession Number | PPRN:46131491 |
WOS Category | Computer Science, Software Engineering |
Document Type | Preprint |
Item Identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/348277 |
Collection | School of Information Science and Technology_Master's Students; School of Information Science and Technology_PhD Students |
Author Affiliations | 1. ShanghaiTech Univ, Shanghai, Peoples R China; 2. Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China; 3. Durham Univ, Dept Comp Sci, Durham, England; 4. Shanghai AI Lab, Shanghai, Peoples R China |
Recommended Citation (GB/T 7714) | Liu, Zhichao, Wang, Leshan, Zhou, Desen, et al. Temporal Segment Transformer for Action Segmentation. 2023. |
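The abstract mentions converting an initial frame-wise prediction into action segments but does not say how. One common way to do this, shown here only as a minimal sketch and not as the authors' implementation, is to merge consecutive frames that share a label. The function name `frames_to_segments` and the example labels are hypothetical.

```python
from itertools import groupby


def frames_to_segments(frame_labels):
    """Merge runs of identical frame labels into (label, start, end) tuples.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    Illustrative helper only; the paper's actual procedure may differ.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)  # number of frames in this run
        segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical initial prediction for a 7-frame clip.
initial_prediction = ["cut", "cut", "peel", "cut", "cut", "mix", "mix"]
print(frames_to_segments(initial_prediction))
```

In this made-up input, the single-frame "peel" run splits what is plausibly one "cut" action into three segments, which is the kind of over-segmentation the abstract says the refinement stage is meant to address.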
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.