Compressing Sparse Ternary Weight Convolutional Neural Networks for Efficient Hardware Acceleration

Reducing the bit-width of weights is an attractive way to shrink the large CNN models embedded in IoT devices. In the most extreme case, binary weight CNNs reduce the bit-width to 1 bit per weight. However, such a network cannot be compressed further because it lacks sparsity: the weight distribution of a well-trained model is not biased toward either +1 or -1. Sparse ternary weight CNNs, on the other hand, can be compressed to less than 1 bit per weight while maintaining higher accuracy than binary weight CNNs. We therefore propose a weight compression methodology for sparse ternary weight CNNs that minimizes the model size, consisting of (1) an encoding scheme that exploits high sparsity and (2) two elaborate compression techniques based on encoding direction exploration and layer-wise optimization. To verify the efficiency of hardware acceleration, we design an accelerator that fully exploits our compression scheme. Moreover, we present a layer rearrangement technique to address a load imbalance problem that occurs during hardware acceleration. As a result, we reduce the effective bit-width to 0.67-0.80 bit per weight and achieve 4.52-7.70x and 1.52-2.21x improvements in performance and energy efficiency, respectively, with higher accuracy than previous binary weight CNN work.
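The claim that balanced binary weights cannot go below 1 bit per weight, while sufficiently sparse ternary weights can, follows from Shannon entropy. The sketch below is only an illustration of that bound, not the encoding scheme proposed in the paper. It assumes independent weights with +1 and -1 equally likely, and the function names and sparsity levels are hypothetical choices made for this example.

```python
import math

def entropy_bits(probs):
    """Shannon entropy (bits per symbol) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def ternary_entropy(sparsity):
    """Entropy of ternary weights, assuming P(0) = sparsity and
    the remaining mass split equally between +1 and -1."""
    nonzero = (1.0 - sparsity) / 2.0
    return entropy_bits([sparsity, nonzero, nonzero])

# Balanced binary weights: no redundancy left to exploit.
binary_bound = entropy_bits([0.5, 0.5])

if __name__ == "__main__":
    print(f"binary (+1/-1 balanced): {binary_bound:.3f} bit/weight")
    # Illustrative sparsity levels, not values taken from the paper.
    for s in (0.5, 0.8, 0.9):
        print(f"ternary, sparsity {s:.1f}: {ternary_entropy(s):.3f} bit/weight")
```

Under these assumptions the lower bound for ternary weights drops below 1 bit per weight only once the fraction of zeros is high enough, which is why the encoding has to exploit sparsity.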
Publisher
Association for Computing Machinery / IEEE
Issue Date
2019-07-30
Language
English
Citation

2019 ACM/IEEE International Symposium on Low Power Electronics and Design

URI
http://hdl.handle.net/10203/264258
Appears in Collection
EE-Conference Papers
