A 99.4 frames-per-second (fps) optical flow estimation (OFE) processor with image tiling is proposed for action recognition on mobile devices. OFE is essential for achieving high action recognition accuracy. However, it is difficult to run under real-time constraints in a mobile computing environment because it requires a large number of external memory accesses (EMAs) and heavy matrix computation. To mitigate the external memory bandwidth requirement, this paper proposes tile-based hierarchical OFE. It divides each input image into several tiles and reuses intermediate data on chip, requiring only 326.4 KB of on-chip memory and 175.8 MB/s of external memory bandwidth. Moreover, a background decision unit with early termination is proposed to reduce the computation workload. It eliminates unnecessary matrix computation by terminating the computation early in regions with zero optical flow. As a result, the two proposed features reduce the external memory bandwidth by 99.3% and increase the throughput by 50.7%, respectively. The proposed OFE processor, implemented in 65 nm CMOS technology, occupies $12.8\,\mathrm{mm}^2$ and achieves real-time OFE with 99.4 fps throughput at QVGA resolution (320 × 240).
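The abstract does not specify the flow algorithm, so the following is only a minimal software sketch of the two ideas it names (image tiling and a background decision with early termination) under stated assumptions: a Lucas-Kanade-style least-squares formulation that solves one 2×2 system per tile, non-overlapping tiles of a fixed size, and a background test that declares a tile static when every absolute temporal difference is at most a threshold. The names `tile_flow`, `estimate_flow_tiled`, `tile`, and `bg_threshold` are hypothetical, and the hierarchical (multi-level) part and on-chip memory behavior are not modeled.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]  # grayscale frame, indexed [y][x]


def tile_flow(prev: Image, curr: Image, y0: int, x0: int, size: int,
              bg_threshold: float) -> Tuple[float, float, bool]:
    """Return (u, v, was_background) for one tile (hypothetical LK-style sketch)."""
    h, w = len(prev), len(prev[0])
    ys = range(y0, min(y0 + size, h))
    xs = range(x0, min(x0 + size, w))
    # Assumed background decision: if the temporal difference is negligible
    # everywhere in the tile, stop early and report zero flow without
    # accumulating or solving the 2x2 matrix.
    if all(abs(curr[y][x] - prev[y][x]) <= bg_threshold for y in ys for x in xs):
        return 0.0, 0.0, True
    sxx = sxy = syy = sxt = syt = 0.0
    for y in ys:
        for x in xs:
            # Central differences on the previous frame, one-sided at borders.
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            ix = (prev[y][xr] - prev[y][xl]) / max(xr - xl, 1)
            iy = (prev[yd][x] - prev[yu][x]) / max(yd - yu, 1)
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return 0.0, 0.0, False  # ill-conditioned tile (aperture problem)
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = -[sxt, syt]^T by Cramer's rule.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v, False


def estimate_flow_tiled(prev: Image, curr: Image, tile: int = 4,
                        bg_threshold: float = 0.5
                        ) -> Tuple[Dict[Tuple[int, int], Tuple[float, float]], int]:
    """Process the frame tile by tile; return per-tile flow and the skip count."""
    h, w = len(prev), len(prev[0])
    flows: Dict[Tuple[int, int], Tuple[float, float]] = {}
    skipped = 0
    for y0 in range(0, h, tile):
        for x0 in range(0, w, tile):
            u, v, was_bg = tile_flow(prev, curr, y0, x0, tile, bg_threshold)
            flows[(y0, x0)] = (u, v)
            skipped += int(was_bg)
    return flows, skipped


if __name__ == "__main__":
    # Bilinear texture x*y shifted right by one pixel: true flow is (1, 0).
    prev = [[float(x * y) for x in range(8)] for y in range(8)]
    curr = [[float((x - 1) * y) for x in range(8)] for y in range(8)]
    flows, skipped = estimate_flow_tiled(prev, curr)
    print(skipped, flows[(0, 0)])  # prints: 0 (1.0, 0.0)
```

In this sketch, static tiles return before any gradient accumulation or matrix solve, which mirrors the intent of the background decision unit; how much work this saves in practice depends on how much of each frame is static.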