Self-supervised monocular depth estimation methods have been proposed to train a depth network without ground-truth depth, since collecting depth annotations requires tremendous effort. These self-supervised methods rely on the photometric loss as the main supervision signal for optimizing the depth network. However, learning is hindered because the photometric loss is ambiguous at pixels belonging to moving objects and in occluded or texture-less regions. To address this problem, we propose a self-distillation method that provides depth consistency as an additional supervision signal, which regularizes the depth network. We observe that existing depth networks are not robust to distorted input images. Motivated by this observation, we train the depth network with a depth-consistency constraint so that it becomes robust to such distortions. Depth networks trained with our method show meaningful improvements over their counterparts trained without it. In addition, we show that our method outperforms state-of-the-art methods on the KITTI dataset.
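To make the idea concrete, the sketch below illustrates one plausible form of the depth-consistency self-distillation term: the depth predicted on a distorted copy of an image is encouraged to match the (detached) depth predicted on the original. This is a minimal illustration, not the paper's exact formulation; the `DepthNet` placeholder, the choice of color jitter as the distortion, and the loss form are all assumptions.

```python
# A minimal sketch of a depth-consistency self-distillation loss.
# Assumptions (not from the paper): DepthNet architecture, color jitter
# as the distortion, L1 as the consistency measure. Color jitter leaves
# pixel geometry unchanged, so a pixel-wise comparison of the two depth
# maps is valid without any spatial realignment.
import torch
import torch.nn as nn
import torchvision.transforms as T

class DepthNet(nn.Module):
    """Placeholder dense depth predictor (stands in for the real network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1), nn.Softplus(),  # positive depth
        )

    def forward(self, x):
        return self.net(x)

def depth_consistency_loss(depth_net, image, distort):
    """Self-distillation term: the prediction on the clean image acts as a
    detached teacher for the prediction on the distorted image."""
    with torch.no_grad():
        teacher_depth = depth_net(image)       # pseudo-label, no gradient
    student_depth = depth_net(distort(image))  # prediction on distorted view
    return (student_depth - teacher_depth).abs().mean()

# Usage: this term would be added, with some weight, to the usual
# photometric objective during training.
depth_net = DepthNet()
distort = T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4)
image = torch.rand(2, 3, 192, 640)             # e.g. KITTI-sized crops
loss = depth_consistency_loss(depth_net, image, distort)
loss.backward()
```

Detaching the teacher prediction is the standard self-distillation design choice: gradients flow only through the distorted branch, so the network is pulled toward producing distortion-invariant depth rather than collapsing both predictions together.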