This paper considers the situation in which data-driven optimization must be performed but data sampling is limited by high cost and long acquisition time. Such situations demand highly efficient data sampling and utilization, and Bayesian optimization (BO) is the most commonly used method because it allows users to balance exploration and exploitation when deciding where to sample next in the design space. However, the standard acquisition functions used in BO, such as expected improvement, have been criticized as greedy and myopic in many situations. To address this near-sightedness, this paper proposes a novel reinforcement-learning-based method that enables multi-step lookahead Bayesian optimization. Several benchmark functions are used to compare the performance of the proposed RL-based method against traditional BO with expected improvement and its rollout-based extensions. The proposed method outperforms these popular Bayesian optimization methods in the case studies.
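For context, the one-step expected-improvement (EI) criterion that the abstract describes as greedy and myopic can be sketched as follows. This is a generic illustration, not the paper's method: the posterior mean and standard deviation values below are invented toy numbers standing in for a Gaussian-process posterior over candidate points, and the helper name `expected_improvement` is ours.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form one-step EI for minimization.

    EI(x) = (f_best - mu) * Phi(z) + sigma * phi(z),
    with z = (f_best - mu) / sigma.  It scores only the immediate
    expected gain of the next sample, which is why it is myopic:
    it ignores how that sample will inform later decisions.
    """
    sigma = np.maximum(sigma, 1e-12)  # guard against zero variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Illustrative posterior over 5 candidate points (toy numbers).
mu = np.array([0.0, 0.5, -0.2, 1.0, 0.1])      # posterior means
sigma = np.array([0.3, 0.8, 0.1, 0.5, 0.9])    # posterior stds
f_best = 0.2                                   # best observed value so far

ei = expected_improvement(mu, sigma, f_best)
next_idx = int(np.argmax(ei))  # greedily sample where one-step EI is largest
```

Multi-step lookahead methods, such as the rollout extensions and the RL-based approach compared in the paper, instead evaluate candidate points by their effect on the whole remaining sampling budget rather than this single-step score.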