This paper introduces a novel dynamic neural network model which can recognize dynamic visual image patterns of human actions based on learning. The proposed model is characterized by its capability of extracting the spatio-temporal feature hierarchy latent in the training visual image streams. The model achieves this property by integratingtwo essential ideas: (1) multiple spatial-scales processing and(2) multiple timescales processing, which have been introduced the convolutional neural network (CNN) and the multiple timescale recurrent neural network (MTRNN), respectively. The evaluation of the model performance conducted by utilizing the Weizmann dataset showed that the proposed model outperforms other neural network models in recognition of a set of prototypical human movement patterns. Furthermore, additional evaluation testing for recognition of concatenated sequences of these prototypical movement patterns indicates that the model is endowed with a remarkable capability for contextual recognition of long-range dynamic visual patterns.