An energy-efficient deep-learning processor called DNPU is proposed for the embedded processing of convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in mobile platforms. DNPU uses a heterogeneous multi-core architecture to maximize energy efficiency in both CNNs and RNNs. In each core, a memory architecture, data paths, and processing elements are optimized depending on the characteristics of each network. Also, a mixed workload division method is proposed to minimize off-chip memory access in CNNs, and a quantization table-based matrix multiplier is proposed to remove duplicated multiplications in RNNs.