An ultra-low-power face frontalization processor (FFP) is proposed for accurate face recognition in wearable devices. 3D face frontalization is essential for maintaining human-level recognition accuracy on rotated or tilted faces. To reduce external memory access (EMA), a major source of power consumption, the regression weights are quantized with K-means clustering, which reduces EMA by 81.25%. In addition, pipelined memory-level zero-skipping regression reduces EMA by an additional 98.43% without latency overhead. Moreover, an energy-efficient shared PE array architecture is proposed to accelerate the heterogeneous workload at low power. A large number of processing elements (PEs) are allocated to computation-intensive stages to exploit data-level parallelism, while unused PEs are clock-gated during computationally light stages to avoid needless power consumption. The proposed workload adaptation with clock gating reduces power by 37.14%. The FFP was implemented in a 65 nm CMOS process and consumes 0.53 mW at a throughput of 4.73 fps, both of which satisfy the requirements for always-on face recognition in wearable devices.
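The two EMA-reduction ideas can be illustrated with a minimal software sketch. It is not the processor's implementation: the function names (`kmeans_1d`, `ema_reduction`, `zero_skip_dot`), the cluster count, and the bit widths are assumptions made for illustration. The text does not state the bit widths; replacing 32-bit weights with 6-bit cluster indices is simply one combination that yields an 81.25% reduction in fetched bits. Zero-skipping is modeled here as not fetching a weight whenever the matching input entry is zero.

```python
import random


def kmeans_1d(values, k, iters=20, seed=0):
    """Plain Lloyd's algorithm on scalars; returns (codebook, indices)."""
    rng = random.Random(seed)
    codebook = rng.sample(values, k)  # initial centroids drawn from the data

    def assign():
        # Nearest-centroid index for every weight.
        return [min(range(k), key=lambda c: abs(v - codebook[c])) for v in values]

    for _ in range(iters):
        indices = assign()
        # Move each centroid to the mean of the weights assigned to it.
        for c in range(k):
            members = [v for v, i in zip(values, indices) if i == c]
            if members:
                codebook[c] = sum(members) / len(members)
    return codebook, assign()


def ema_reduction(full_bits, index_bits):
    """Fraction of weight traffic saved by fetching indices instead of weights."""
    return 1.0 - index_bits / full_bits


def zero_skip_dot(features, codebook, indices):
    """Dot product that fetches a weight index only for nonzero inputs."""
    acc = 0.0
    fetches = 0
    for f, idx in zip(features, indices):
        if f == 0:
            continue  # skipped: this weight would never leave external memory
        acc += f * codebook[idx]  # dequantize through the small on-chip codebook
        fetches += 1
    return acc, fetches


if __name__ == "__main__":
    # Hypothetical weights and a sparse input vector, for demonstration only.
    weights = [((i * 37) % 101) / 50.0 - 1.0 for i in range(256)]
    codebook, indices = kmeans_1d(weights, k=8)
    features = [0.0] * 256
    features[3], features[100] = 1.0, 2.0
    result, fetches = zero_skip_dot(features, codebook, indices)
    print("assumed 32-bit -> 6-bit reduction:", ema_reduction(32, 6))  # 0.8125
    print("weight fetches:", fetches, "of", len(features))
```

In this sketch the codebook is small enough to stay on chip, so only the short indices cross the external memory interface, and the sparser the input, the fewer of those indices are fetched at all.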