To build an autonomous robot, we propose a novel scheme for goal-oriented behavior sequence generation in tasks involving multiple objects. The scheme comprises three major functions: (1) visual attention for target object localization; (2) automatic initial state correction based on experience, using simple reinforcement learning; and (3) behavior sequence generation based on a multiple timescale recurrent neural network (MTRNN). The scheme systematically combines these three functions so that an autonomous biped robot can execute tasks involving multiple objects from high-level semantic commands given by a human supervisor. The selective attention model continuously captures the visual environment to determine the current state of the robot and to perceive the relationship between that state and the environment (depth perception and localization of a target object). If the current state differs from the initial state (depth perception and localization of a target object), the robot automatically adjusts its current state to the initial state by integrating visual attention and simple reinforcement learning. Once the initial state has been corrected, the behavior sequence generation function generates suitable behavior timing signals by integrating visual attention and the MTRNN, based on the high-level semantic commands given by the human supervisor. Experimental results show that the proposed scheme generates suitable behavior timing for a robot to autonomously accomplish tasks involving multiple objects, such as searching for, approaching, and hitting a target object with its arm. © 2013 Elsevier B.V. All rights reserved.