Multi-scale Pyramid Pooling for Deep Convolutional Representation

Compared to image representations based on low-level local descriptors, deep neural activations of Convolutional Neural Networks (CNNs) are richer in mid-level representation but poorer in geometric invariance properties. In this paper, we present a straightforward framework for better image representation that combines the two approaches. To take advantage of both representations, we extract a fair number of multi-scale dense local activations from a pre-trained CNN. We then aggregate the activations using the Fisher kernel framework, modified with a simple scale-wise normalization that is essential to make it suitable for CNN activations. Our representation achieves new state-of-the-art results on three public datasets: 80.78% (Acc.) on MIT Indoor 67, 83.20% (mAP) on PASCAL VOC 2007 and 91.28% (Acc.) on Oxford 102 Flowers. The results suggest that our proposal can be used as a primary image representation for better performance in a wide range of visual recognition tasks.
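The aggregation step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract says only that a scale-wise normalization is added to the Fisher kernel framework, so the concrete choices here are assumptions. The sketch uses a simplified Fisher vector (gradients with respect to the GMM means only), L2-normalizes the vector of each scale, averages across scales, and then applies power and L2 normalization. The function names (`fisher_vector_means`, `multiscale_pool`), the toy GMM parameters, and the random stand-in activations are all hypothetical.

```python
import numpy as np

def fisher_vector_means(x, weights, means, sigmas):
    """Simplified Fisher vector: gradient w.r.t. the means of a diagonal GMM.
    x: (N, D) local activations; weights: (K,); means, sigmas: (K, D)."""
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]  # (N, K, D)
    # Log-likelihood per component; the 2*pi constant is dropped because it
    # is identical for every component and cancels in the posterior.
    log_p = (-0.5 * np.sum(diff ** 2, axis=2)
             - np.sum(np.log(sigmas), axis=1)[None, :]
             + np.log(weights)[None, :])                              # (N, K)
    log_p -= log_p.max(axis=1, keepdims=True)                         # stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)                         # soft assignments
    fv = np.einsum('nk,nkd->kd', gamma, diff)
    fv /= x.shape[0] * np.sqrt(weights)[:, None]
    return fv.ravel()                                                 # (K * D,)

def l2_normalize(v, eps=1e-12):
    return v / (np.linalg.norm(v) + eps)

def multiscale_pool(activations_per_scale, weights, means, sigmas):
    """Assumed pooling order: per-scale L2 norm, average, power norm, L2 norm."""
    per_scale = [l2_normalize(fisher_vector_means(x, weights, means, sigmas))
                 for x in activations_per_scale]
    pooled = np.mean(per_scale, axis=0)
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))                # power normalization
    return l2_normalize(pooled)

# Toy data: random vectors stand in for CNN activations at three scales,
# with more local activations at finer scales (1, 9, and 49 here).
rng = np.random.default_rng(0)
K, D = 4, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, D))
sigmas = np.ones((K, D))
scales = [rng.normal(size=(n, D)) for n in (1, 9, 49)]
rep = multiscale_pool(scales, weights, means, sigmas)
print(rep.shape)  # (32,)
```

Normalizing each scale before averaging keeps a scale with many local activations from dominating a scale with few, which is one plausible reading of why a scale-wise step would matter for multi-scale CNN activations.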
IEEE Computer Society and the Computer Vision Foundation (CVF)
Issue Date

CVPR2015 IEEE Conference on Computer Vision and Pattern Recognition

Appears in Collection
EE-Conference Papers (Academic Conference Papers)