An attention mechanism module with spatial perception and channel information interaction
Complex & Intelligent Systems (IF 5.8) Pub Date: 2024-05-06, DOI: 10.1007/s40747-024-01445-9
Yifan Wang, Wu Wang, Yang Li, Yaodong Jia, Yu Xu, Yu Ling, Jiaqi Ma

In the field of deep learning, the attention mechanism, a technique that mimics human perception and attention, has achieved remarkable results. Current methods combine a channel attention mechanism and a spatial attention mechanism in parallel or in cascade to enhance a model's representational capacity, but they do not fully consider the interaction between spatial and channel information. This paper proposes a method that cascades a space-embedded channel module and a channel-embedded space module to enhance the model's representational capacity. First, in the space-embedded channel module, to enhance the representation of regions of interest along different spatial dimensions, the input tensor is split into horizontal and vertical branches according to its spatial dimensions, alleviating the loss of positional information incurred by 2D pooling. To smooth the features while highlighting local ones, global maximum and global average pooling are applied to both branches, yielding four branches; the features are then aggregated per pooling method to obtain two feature tensors, one per pooling type. To let the output horizontal and vertical feature tensors attend to both pooling results simultaneously, the two feature tensors are segmented and transposed along the spatial dimensions, and the features are subsequently aggregated along the spatial direction. Then, in the channel-embedded space module, to address the lack of cross-channel connections between groups in grouped convolution and its large parameter count, this paper uses adaptive grouped banded matrices.
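The directional split-and-pool step described above can be illustrated with a minimal sketch. The exact aggregation and transposition the authors use is not specified in the abstract, so the function below only shows the four pooled branches (max/avg over each spatial axis) that the text mentions; the function name and shapes are illustrative assumptions.

```python
import numpy as np

def directional_pooling(x):
    """Pool a (C, H, W) feature map along each spatial axis separately.

    Sketch of the idea of splitting the input into horizontal and vertical
    branches before pooling, so that 1D positional information is retained
    instead of being collapsed by a single 2D global pooling.
    """
    # Horizontal branch: reduce over the width axis -> shape (C, H)
    h_avg = x.mean(axis=2)
    h_max = x.max(axis=2)
    # Vertical branch: reduce over the height axis -> shape (C, W)
    v_avg = x.mean(axis=1)
    v_max = x.max(axis=1)
    # Four branches: two pooling methods per spatial direction
    return h_avg, h_max, v_avg, v_max

x = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
h_avg, h_max, v_avg, v_max = directional_pooling(x)
print(h_avg.shape, v_avg.shape)  # (2, 3) (2, 4)
```

Each horizontal-branch vector keeps one value per row (and the vertical branch one per column), which is what preserves position along that axis compared with a single global 2D pool.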
Based on the banded matrices, and exploiting the mapping between the number of channels and the convolution kernel size, the kernel size is computed adaptively to achieve adaptive cross-channel interaction, strengthening correlations across the channel dimension while leaving the spatial dimensions unchanged. Finally, the output horizontal and vertical weights serve as attention weights. In the experiments, the proposed attention module is embedded into MobileNetV2 and ResNet networks at different depths, and extensive experiments are conducted on the CIFAR-10, CIFAR-100, and STL-10 datasets. The results show that the proposed method captures and exploits the features of the input data more effectively than competing methods, significantly improving classification accuracy. Although the module introduces an additional computational burden (0.5 M), the model still achieves the best overall performance when computational overhead is taken into account.
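The abstract states only that the 1D kernel size is derived adaptively from the channel count; a common concrete form of such a mapping (borrowed from ECA-Net-style designs, and assumed here rather than taken from the paper) is a log-linear function of the channel count, rounded to the nearest odd integer:

```python
import math

def adaptive_kernel_size(channels, gamma=2, b=1):
    """Map a channel count C to an odd 1D convolution kernel size.

    Assumed log-linear mapping k = |log2(C)/gamma + b/gamma|, forced odd;
    gamma and b are illustrative hyperparameters, not the paper's values.
    """
    k = int(abs(math.log2(channels) / gamma + b / gamma))
    return k if k % 2 else k + 1  # odd size keeps the kernel centered

for c in (64, 128, 256, 512):
    print(c, adaptive_kernel_size(c))
```

A larger channel count yields a wider kernel, so deeper layers interact across more neighboring channels without the fixed group boundaries of ordinary grouped convolution.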




Updated: 2024-05-09