Recent evidence in neuroscience and psychology suggests that a single reinforcement learning (RL) algorithm only accounts for less than 60% of the variance of human choice behavior in an uncertain and dynamic environment, where the amount of uncertainty in state-action-state transitions drift over time. The prediction performance further decreases when the size of the state space increases. We proposed a hierarchical context-dependent RL control framework that dynamically exerted control weights on model-based (MB) and multiple model-free (MF) RL strategies associated with different task goals. To properly assess the validity of the proposed method, we considered a two-stage Markov decision task (MDT) in which the three different types of context changed over time. We trained 57 different RL control models on a Caltech MDT data set; then, we assessed their prediction performance using a Bayesian model comparison. This large-scale computer simulation analysis revealed that the model providing the most accurate prediction was the version that implemented the competition between the MB and multiple goal-dependent MF RL strategies. The present study demonstrates the applicability of the goal-driven RL control to a variety of real-world human-robot interaction scenarios.