The 3D hand gesture interface (HGI) for virtual reality and mixed reality on smart mobile devices is strongly dependent upon the robust depth-estimation with low latency and power consumption. However, the conventional depth-estimation hardware such as active depth sensors and stereo matching accelerators cannot realize the always-on and natural 3D HGI on mobile platform due to their large power consumption from active depth sensors and computations as well as the massive external memory bandwidth, respectively. To resolve the limit, we propose a depth-estimation processor that realizes the always-on and natural 3D HGI with algorithm and hardware co-optimization. The processor features: 1) shifter-based adaptive support weight aggregation that replaces complex floating-point operations with integer operations to reduce power and bandwidth by 92.2% and 69.1%; 2) line streaming 7-stage pipeline architecture with aggregation pipeline reordering optimization to realize 94% utilization and 43.9% memory reduction; and 3) shifting register-based pipeline buffer optimization to reduce 29.8% area. The proposed depth-estimation processor realizes a real-time 3D HGI with 9.52 ms of latency under QVGA stereo inputs. It achieves external memory bandwidth reduction to 18.93 MB/s with 15.56 mW power and 2.8 mm(2) area, which are 4.1x and 6.9x more efficient than state-of-the-arts [9, 10], respectively.