It is well known that attention plays an important role in human perception [31, 57, 10]. One important property of the human visual system is that it does not attempt to process a whole scene at once. Instead, humans exploit a sequence of partial glimpses and selectively focus on salient parts in order to better capture visual structure. There have been only a few attempts to incorporate attention mechanisms to improve the performance of convolutional neural networks (CNNs) on recognition tasks. In this dissertation, we focus on how to utilize attention mechanisms in the context of deep CNN design for object recognition. We make the following hypothesis: assuming a CNN serves as an approximator of the human visual system, adding attention mechanisms within the CNN will facilitate effective feature learning.
We propose two types of attention-integrated deep CNNs: attention in the network backbone and attention in a task-specific head. Specifically, we design a simple yet effective attention module, called the Convolutional Block Attention Module (CBAM), and apply it to both the backbone and the task-specific head of a deep CNN. We conduct extensive subjective and objective evaluations and show the efficacy of the proposed method in both settings.
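To make the idea of a block-level attention module concrete, the sketch below illustrates the two sub-modules that CBAM composes sequentially: channel attention (squeeze spatial dimensions with average- and max-pooling, then produce a per-channel weight) followed by spatial attention (pool across channels at each location, then produce a per-position weight). This is a minimal pure-Python illustration, not the actual implementation: the shared MLP in the channel branch and the 7x7 convolution in the spatial branch are replaced by a simple sum for brevity, and tensors are plain nested lists.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(x):
    """x: C feature maps, each an H x W list of lists.
    Returns one attention weight in (0, 1) per channel."""
    weights = []
    for fmap in x:
        flat = [v for row in fmap for v in row]
        avg = sum(flat) / len(flat)      # global average pooling
        mx = max(flat)                   # global max pooling
        # The real CBAM feeds avg and mx through a shared MLP;
        # here we simply sum them as a stand-in.
        weights.append(sigmoid(avg + mx))
    return weights

def spatial_attention(x):
    """Returns an H x W attention map with values in (0, 1)."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    att = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            vals = [x[c][i][j] for c in range(C)]
            avg = sum(vals) / C          # average-pool across channels
            mx = max(vals)               # max-pool across channels
            # The real CBAM applies a 7x7 convolution to the stacked
            # [avg; max] maps; a sum stands in for it here.
            att[i][j] = sigmoid(avg + mx)
    return att

def cbam(x):
    """Apply channel attention first, then spatial attention
    (the sequential arrangement used by CBAM)."""
    cw = channel_attention(x)
    x = [[[v * cw[c] for v in row] for row in x[c]] for c in range(len(x))]
    sw = spatial_attention(x)
    return [[[x[c][i][j] * sw[i][j] for j in range(len(sw[0]))]
             for i in range(len(sw))] for c in range(len(x))]
```

Because both branches end in a sigmoid, the module refines features multiplicatively without changing their shape, which is what lets CBAM be dropped into any convolutional block of a backbone or head.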