We report lip-reading recognition using lip-motion features extracted from multiple consecutive video frames by three unsupervised learning algorithms: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Non-negative Matrix Factorization (NMF). Human face perception involves two different pathways: the lateral fusiform gyrus handles the invariant aspects of faces, and the superior temporal sulcus handles the changeable aspects. To capture the latter, we extracted dynamic video features from multiple consecutive frames. For a sequence of the same length, the multiple-frame features require fewer coefficients than single-frame static features. The ICA-based features are the sparsest, whereas their coefficients for representing the video are the least sparse. The PCA-based features show the opposite characteristics, and the NMF-based features lie in between. The ICA-based features also yield much better recognition performance than the PCA- and NMF-based features.
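The sketch below illustrates how multiple-frame features and their coefficients might be obtained and compared for sparseness. It is only a sketch under stated assumptions. We assume each training vector is formed by concatenating several consecutive grayscale lip-region frames. We assume NMF is computed with the Lee-Seung multiplicative updates for the Euclidean cost. We also assume that sparseness is measured with Hoyer's measure, which the abstract does not specify. The function names `stack_frames`, `pca_basis`, `nmf_basis`, and `hoyer_sparseness` are hypothetical. Columns of each basis play the role of "features" and the rows of the coefficient matrix play the role of the video representation. ICA is omitted for brevity; an implementation such as scikit-learn's `FastICA` could be substituted. The data are random numbers, not real lip images.

```python
import numpy as np


def stack_frames(frames: np.ndarray, n_frames: int) -> np.ndarray:
    """Concatenate n_frames consecutive frames into one column vector each.

    frames: array of shape (T, H, W), a grayscale lip-region sequence (assumed layout).
    Returns an array of shape (H * W * n_frames, T - n_frames + 1).
    """
    t, h, w = frames.shape
    flat = frames.reshape(t, h * w)
    windows = [flat[i:i + n_frames].ravel() for i in range(t - n_frames + 1)]
    return np.stack(windows, axis=1)


def pca_basis(x: np.ndarray, k: int):
    """Top-k principal directions (columns of the basis) and their coefficients."""
    xc = x - x.mean(axis=1, keepdims=True)        # center each pixel dimension
    u, _, _ = np.linalg.svd(xc, full_matrices=False)
    basis = u[:, :k]                                # orthonormal feature vectors
    return basis, basis.T @ xc                      # project data onto the features


def nmf_basis(x: np.ndarray, k: int, n_iter: int = 200, seed: int = 0):
    """Lee-Seung multiplicative-update NMF: x ~= w @ h with w, h >= 0."""
    rng = np.random.default_rng(seed)
    m, n = x.shape
    w = rng.random((m, k)) + 1e-3                   # strictly positive initialization
    h = rng.random((k, n)) + 1e-3
    eps = 1e-9                                      # avoids division by zero
    for _ in range(n_iter):
        h *= (w.T @ x) / (w.T @ w @ h + eps)
        w *= (x @ h.T) / (w @ h @ h.T + eps)
    return w, h


def hoyer_sparseness(v: np.ndarray) -> float:
    """Hoyer's sparseness in [0, 1]: 1 for a single nonzero entry, 0 for a uniform vector."""
    v = np.abs(np.ravel(v))
    n = v.size
    l1, l2 = v.sum(), np.sqrt((v ** 2).sum())
    if l2 == 0:
        return 0.0
    return float((np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1))


# Usage on synthetic data (a stand-in for real lip-region video).
rng = np.random.default_rng(1)
frames = rng.random((20, 8, 10))          # 20 frames of 8 x 10 pixels
x = stack_frames(frames, n_frames=3)      # 3-frame vectors, shape (240, 18)
pca_w, pca_h = pca_basis(x, k=5)
nmf_w, nmf_h = nmf_basis(x, k=5)

for name, w, h in [("PCA", pca_w, pca_h), ("NMF", nmf_w, nmf_h)]:
    feat_s = np.mean([hoyer_sparseness(col) for col in w.T])   # per feature vector
    coef_s = np.mean([hoyer_sparseness(col) for col in h.T])   # per video window
    print(f"{name}: feature sparseness {feat_s:.3f}, coefficient sparseness {coef_s:.3f}")
```

On random data the printed numbers carry no meaning about lip reading; the point is only to show where feature sparseness and coefficient sparseness are measured. Stacking frames makes each basis vector span several frames. This is one plausible way to obtain the "dynamic" features the abstract describes.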