Reducing the bit-width of weights is an attractive way to shrink the large CNN models embedded in IoT devices. In the most extreme case, binary weight CNNs use only 1 bit per weight. However, such networks cannot be compressed further: they lack sparsity, because the weight distribution of a well-trained model is not biased toward either +1 or -1. Sparse ternary weight CNNs, on the other hand, can be compressed to less than 1 bit per weight while maintaining higher accuracy than binary weight CNNs. We therefore propose a weight compression methodology for sparse ternary weight CNNs that minimizes model size. It consists of (1) an encoding scheme that exploits high sparsity, and (2) two compression techniques based on encoding direction exploration and layer-wise optimization. To verify the efficiency of hardware acceleration, we design an accelerator that fully exploits our compression scheme. We also present a layer rearrangement technique that addresses the load imbalance problem arising during hardware acceleration. As a result, we reduce the effective bit-width to 0.67-0.80 bits per weight and achieve 4.52-7.70x and 1.52-2.21x improvements in performance and energy efficiency, respectively, with higher accuracy than previous binary weight CNN work.
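
As a rough illustration of why sparsity makes sub-1-bit storage possible, the sketch below computes the Shannon entropy per weight, which lower-bounds the average code length of any lossless encoding. It is not the encoding scheme proposed here. It assumes independent weights, equally likely +1 and -1 among the nonzero weights, and hypothetical sparsity values chosen only for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def ternary_entropy(sparsity):
    """Entropy per weight for ternary weights {-1, 0, +1}.

    Assumption: P(0) = sparsity, and the remaining mass is split
    evenly between +1 and -1.
    """
    nonzero = (1.0 - sparsity) / 2.0
    return entropy_bits([sparsity, nonzero, nonzero])


# Unbiased binary weights: the bound is exactly 1 bit, so no room to compress.
print(f"binary (p = 0.5): {entropy_bits([0.5, 0.5]):.3f} bits/weight")

# Hypothetical sparsity levels; the bound falls below 1 bit only at high sparsity.
for sparsity in (0.5, 0.8, 0.9):
    print(f"ternary (sparsity = {sparsity}): "
          f"{ternary_entropy(sparsity):.3f} bits/weight")
```

Under these assumptions, ternary weights at low sparsity need more than 1 bit each, and the bound drops below 1 bit only once most weights are zero, which is why the encoding has to exploit high sparsity.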