This paper proposes a novel model for the integrative learning of proactive visual attention and sensory-motor control, inspired by the premotor theory of visual attention. The model couples a slow dynamics network with a fast dynamics network and builds on our previously proposed multiple timescales recurrent neural network (MTRNN) model, which may correspond to the fronto-parietal networks of the cortex. Neuro-robotics experiments using the proposed model in a task of manipulating multiple objects demonstrated that a degree of generalization over variations in object position and size can be achieved. This generalization arises because proactive object-related visual attention and the associated sensory-motor control are seamlessly integrated into a set of action primitives, which are represented in the distributed neural activities of the fast dynamics network. It was also shown that such action primitives can be combined compositionally in the slow dynamics network to acquire novel actions. The experimental results support the premotor theory of visual attention.
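As a rough illustration of how a slow dynamics network and a fast dynamics network can be coupled, the sketch below implements the standard leaky-integrator update commonly used in MTRNN-style models, u_i(t+1) = (1 - 1/tau_i) u_i(t) + (1/tau_i) sum_j w_ij x_j(t). It is a minimal sketch under stated assumptions, not the implementation used in the experiments: the function names `mtrnn_step` and `make_network`, the unit counts, the time constants (2 and 50), and the random weights are all hypothetical choices made for illustration.

```python
import math
import random


def mtrnn_step(u, w, tau, inputs):
    """Apply one leaky-integrator update to every unit.

    u      : list of internal states, one per unit
    w      : weight rows; row i has len(u) + len(inputs) entries
    tau    : list of time constants, one per unit (larger means slower)
    inputs : list of external input values for this time step
    """
    # Unit activations, followed by the external inputs, form the source vector.
    x = [math.tanh(v) for v in u] + list(inputs)
    new_u = []
    for i in range(len(u)):
        drive = sum(w_ij * x_j for w_ij, x_j in zip(w[i], x))
        # A large tau keeps most of the old state, so the unit changes slowly.
        new_u.append((1.0 - 1.0 / tau[i]) * u[i] + drive / tau[i])
    return new_u


def make_network(n_fast=4, n_slow=2, n_in=1, tau_fast=2.0, tau_slow=50.0, seed=0):
    """Build illustrative weights and time constants (all values are assumptions)."""
    rng = random.Random(seed)
    n = n_fast + n_slow
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n + n_in)] for _ in range(n)]
    # Assumption: only fast units receive the external input directly.
    for i in range(n_fast, n):
        for j in range(n, n + n_in):
            w[i][j] = 0.0
    tau = [tau_fast] * n_fast + [tau_slow] * n_slow
    return w, tau


if __name__ == "__main__":
    w, tau = make_network()
    u = [0.0] * len(tau)
    for t in range(100):
        u = mtrnn_step(u, w, tau, [math.sin(0.3 * t)])
    print("fast units:", u[:4])
    print("slow units:", u[4:])
```

In this sketch the only difference between the two groups of units is the time constant, so the slow units integrate over many steps while the fast units track the input closely. How the two groups are wired and trained in the actual model is not specified here.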