Recent state-of-the-art systems for human action recognition are computationally intensive because they process full-length videos with complex network structures. This study develops sampling strategies and a simple network structure to reduce the inference time of recognition. In particular, the auto-correlation sequence, which measures the similarity between a video and a lagged version of itself, is used to extract the most essential segment of the video without information loss. The proposed method considerably reduces inference time while maintaining comparable recognition accuracy.
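
To make the idea of an auto-correlation sequence concrete, the following is a minimal sketch, not the method used in this study. It assumes each frame is flattened into a vector, that similarity at lag k is the normalized inner product between the mean-centered video and its copy shifted by k frames, and that the "essential segment" is one dominant period taken from the start of the video. The function names (`autocorrelation_sequence`, `dominant_period`, `select_segment`) and the peak-picking rule are hypothetical choices made for illustration only.

```python
import numpy as np


def autocorrelation_sequence(frames, max_lag=None):
    """Return r[0..max_lag], the similarity of a video to its lagged copy.

    frames: array of shape (T, ...), one entry per frame (assumed layout).
    r[0] is 1 by construction; larger r[k] means frame t resembles frame t+k.
    """
    x = np.asarray(frames, dtype=np.float64).reshape(len(frames), -1)
    x = x - x.mean(axis=0, keepdims=True)  # remove the static background
    T = x.shape[0]
    if max_lag is None:
        max_lag = T // 2
    r = np.zeros(max_lag + 1)
    r[0] = 1.0
    energy = np.sum(x * x)
    if energy == 0:  # constant video: no motion to correlate
        return r
    for k in range(1, max_lag + 1):
        # Inner product between the video and its copy shifted by k frames.
        r[k] = np.sum(x[: T - k] * x[k:]) / energy
    return r


def dominant_period(r):
    """Return the first lag k >= 1 at which r has a local maximum.

    Falls back to the largest available lag if no peak is found.
    """
    for k in range(1, len(r) - 1):
        if r[k] > r[k - 1] and r[k] >= r[k + 1]:
            return k
    return len(r) - 1


def select_segment(frames, max_lag=None):
    """Keep one dominant period of frames from the start (illustrative rule)."""
    r = autocorrelation_sequence(frames, max_lag)
    period = dominant_period(r)
    return np.asarray(frames)[:period]
```

Under these assumptions, a video whose motion repeats every few frames produces a peak in the sequence at that lag, so only one repetition needs to be passed to the recognition network. Videos without periodic motion would need a different selection rule, which this sketch does not attempt to cover.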