ShanghaiTech University Knowledge Management System
AS-MLP: AN AXIAL SHIFTED MLP ARCHITECTURE FOR VISION
2022
Proceedings title | ICLR 2022 - 10TH INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS |
Publication status | Published |
Abstract | An Axial Shifted MLP architecture (AS-MLP) is proposed in this paper. Different from MLP-Mixer, where the global spatial feature is encoded for information flow through matrix transposition and one token-mixing MLP, we pay more attention to the local features interaction. By axially shifting channels of the feature map, AS-MLP is able to obtain the information flow from different axial directions, which captures the local dependencies. Such an operation enables us to utilize a pure MLP architecture to achieve the same local receptive field as CNN-like architecture. We can also design the receptive field size and dilation of blocks of AS-MLP, etc., in the same spirit of convolutional neural networks. With the proposed AS-MLP architecture, our model obtains 83.3% Top-1 accuracy with 88M parameters and 15.2 GFLOPs on the ImageNet-1K dataset. Such a simple yet effective architecture outperforms all MLP-based architectures and achieves competitive performance compared to the transformer-based architectures (e.g., Swin Transformer) even with slightly lower FLOPs. In addition, AS-MLP is also the first MLP-based architecture to be applied to the downstream tasks (e.g., object detection and semantic segmentation). The experimental results are also impressive. Our proposed AS-MLP obtains 51.5 mAP on the COCO validation set and 49.5 MS mIoU on the ADE20K dataset, which is competitive compared to the transformer-based architectures. Our AS-MLP establishes a strong baseline of MLP-based architecture. Code is available at https://github.com/svip-lab/AS-MLP. © 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved. |
Proceedings editor/conference sponsor | ByteDance ; et al. ; Meta AI ; Microsoft ; Qualcomm ; Sea AI Lab |
Keywords | Architecture ; Convolutional neural networks ; Network architecture ; Object detection ; Semantic Segmentation ; Axial direction ; Feature interactions ; Feature map ; Flowthrough ; Information flows ; Local feature ; Matrix transposition ; Receptive field sizes ; Receptive fields ; Spatial features |
Conference name | 10th International Conference on Learning Representations, ICLR 2022 |
Conference location | Virtual, Online |
Conference dates | April 25, 2022 - April 29, 2022 |
Indexed by | EI |
Language | English |
Publisher | International Conference on Learning Representations, ICLR |
EI accession number | 20231213775870 |
EI main heading | Semantics |
EI classification code | 402 Buildings and Towers - 723.2 Data Processing and Image Processing - 723.4 Artificial Intelligence |
Original document type | Conference article (CA) |
Document type | Conference paper |
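Item identifier | https://kms.shanghaitech.edu.cn/handle/2MSLDSTB/294852 |
Collection | School of Information Science and Technology_PhD Students ; School of Information Science and Technology_PI Research Groups_Shenghua Gao Group ; School of Information Science and Technology_Master's Students |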
Author affiliations | 1. ShanghaiTech University, China; 2. Youtu Lab, Tencent; 3. Shanghai Engineering Research Center of Intelligent Vision and Imaging, China; 4. Shanghai Engineering Research Center of Energy Efficient and Custom AI IC, China |
First author affiliation | ShanghaiTech University |
First affiliation of the first author | ShanghaiTech University |
Recommended citation (GB/T 7714) | Lian, Dongze, Yu, Zehao, Sun, Xing, et al. AS-MLP: AN AXIAL SHIFTED MLP ARCHITECTURE FOR VISION[C]//ByteDance, et al., Meta AI, Microsoft, Qualcomm, Sea AI Lab: International Conference on Learning Representations, ICLR, 2022. |
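
The axial shift described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the repository linked in the abstract for that). The function names `shift_row` and `axial_shift`, the equal-sized channel groups, the zero padding at the borders, and the all-ones channel mix in the usage lines are assumptions made here for illustration only.

```python
def shift_row(row, offset, pad=0):
    """Shift a 1-D list by `offset` steps (positive = toward higher indices).

    Positions that would read from outside the list are filled with `pad`.
    """
    n = len(row)
    return [row[i - offset] if 0 <= i - offset < n else pad for i in range(n)]


def axial_shift(feature_map, shift_size=3, axis="horizontal"):
    """Shift channel groups of a C x H x W nested list along one axis.

    Assumption of this sketch: channels are split into `shift_size`
    contiguous groups, and group k is shifted by offset k - shift_size // 2.
    """
    if shift_size % 2 != 1:
        raise ValueError("this sketch assumes an odd shift_size")
    if axis not in ("horizontal", "vertical"):
        raise ValueError("axis must be 'horizontal' or 'vertical'")

    offsets = list(range(-(shift_size // 2), shift_size // 2 + 1))
    group = -(-len(feature_map) // shift_size)  # ceiling division

    shifted = []
    for c, channel in enumerate(feature_map):
        offset = offsets[c // group]
        if axis == "horizontal":
            # Shift every row of this channel along the width.
            shifted.append([shift_row(row, offset) for row in channel])
        else:
            # Transpose, shift along what is now the row, transpose back.
            cols = [shift_row(list(col), offset) for col in zip(*channel)]
            shifted.append([list(row) for row in zip(*cols)])
    return shifted


# Three channels, each a 1 x 4 map holding the same row [1, 2, 3, 4].
x = [[[1, 2, 3, 4]] for _ in range(3)]
shifted = axial_shift(x, shift_size=3, axis="horizontal")
# shifted == [[[2, 3, 4, 0]], [[1, 2, 3, 4]], [[0, 1, 2, 3]]]

# A per-pixel mix across channels (all-ones weights here) now combines
# each position with its left and right neighbours.
mixed = [sum(ch[0][j] for ch in shifted) for j in range(4)]
# mixed == [3, 6, 9, 7]
```

After the shift, a purely per-pixel channel projection sees values from neighbouring positions, which is how an MLP-only block can obtain a local receptive field. In this sketch `shift_size` plays the role of the receptive field size along the chosen axis.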
Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.