Omni-directional images are becoming more prevalent for understanding scenes in all directions around a camera, as they provide a much wider field of view (FoV) than conventional images. In this work, we present a novel representation for omni-directional images and show how to apply CNNs to it. The proposed representation uses a spherical polyhedron to reduce the distortion that inevitably arises when sampling pixels on a non-Euclidean spherical surface around the camera center. To perform convolution on our representation, we stack the neighboring pixels on top of each pixel and multiply them with trainable parameters. This enables us to apply the same CNN architectures used on conventional Euclidean 2D images to our representation in a straightforward manner. Going beyond previous work, we additionally compare different kernel designs applicable to our representation. We also show that our method outperforms other state-of-the-art representations of omni-directional images on the monocular depth estimation task. In addition, we propose a novel method for fitting bounding ellipses of arbitrary orientation using object detection networks and apply it to a real-world omni-directional human detection dataset.
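The neighbor-stacking convolution described above can be sketched as follows. This is a minimal, hypothetical NumPy illustration, not the paper's implementation: the neighbor index table, feature shapes, and kernel size are assumptions made for the example.

```python
import numpy as np

def neighbor_conv(features, neighbors, weight, bias):
    """Hypothetical sketch of convolution on an irregular (polyhedral) pixel layout.

    features:  (N, C_in)  per-pixel feature vectors
    neighbors: (N, K)     integer indices of each pixel's K neighbors (self included)
    weight:    (K*C_in, C_out)  shared trainable kernel parameters
    bias:      (C_out,)
    returns:   (N, C_out)
    """
    n, c_in = features.shape
    k = neighbors.shape[1]
    # Gather each pixel's neighbor features: (N, K, C_in), then flatten to (N, K*C_in)
    stacked = features[neighbors].reshape(n, k * c_in)
    # A single matrix multiply applies the same kernel at every pixel location
    return stacked @ weight + bias

# Toy example: 5 pixels, each with 3 neighbors (itself plus two adjacent pixels)
rng = np.random.default_rng(0)
feats = rng.standard_normal((5, 4))                      # 4 input channels
nbrs = np.array([[0, 1, 2],
                 [1, 0, 3],
                 [2, 0, 4],
                 [3, 1, 4],
                 [4, 2, 3]])                             # assumed adjacency
w = rng.standard_normal((3 * 4, 8))                      # 8 output channels
b = np.zeros(8)
out = neighbor_conv(feats, nbrs, w, b)
print(out.shape)  # (5, 8)
```

Because the stacking reduces the irregular neighborhood to a flat per-pixel vector, the same weight matrix plays the role of a shared convolution kernel, which is what allows standard CNN architectures to be reused on the polyhedral layout.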