Pairs trading is a statistical arbitrage strategy that exploits mean reversion in the spread, defined as the price difference between two highly correlated stocks, to seek a market-neutral profit. Recently, the profitability of traditional pairs trading strategies has been decreasing because correlation relationships are fragile and arbitrage opportunities are declining, and many investors are trying to optimize the strategy. In this study, we propose a two-step optimization model for pairs trading. In the first step, we use deep learning to build a model that predicts the movement of the spread. In the second step, we optimize the pairs trading strategy with Proximal Policy Optimization (PPO), a reinforcement learning algorithm, using the predicted spread movement as input. The reinforcement learning agent is trained to select the trading and stop-loss thresholds that maximize cumulative returns during the training period. Pairs are selected from stocks in the KOSPI 100 Index by means of a cointegration test. Our results confirm that the proposed model shows superior profitability and risk management and outperforms the traditional pairs trading strategy.
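To illustrate what the trading and stop-loss thresholds control, the following minimal Python sketch applies a fixed-threshold rule to a standardized spread. The function names (`zscores`, `backtest_thresholds`), the z-score normalization over the full sample, the exit at the mean, and the toy series are all assumptions made for illustration and are not taken from the study. In the proposed model the thresholds would be selected by the PPO agent rather than fixed by hand.

```python
from statistics import mean, pstdev


def zscores(spread):
    """Standardize a spread series with its own mean and population std.

    Assumption: full-sample statistics are used for simplicity, which
    introduces look-ahead; a real backtest would use a rolling window.
    """
    mu = mean(spread)
    sigma = pstdev(spread)
    if sigma == 0:
        raise ValueError("spread has zero variance")
    return [(s - mu) / sigma for s in spread]


def backtest_thresholds(spread, entry, stop_loss):
    """Cumulative P&L (in spread units) of a fixed-threshold rule.

    position = -1: short the spread, opened when entry < z < stop_loss
    position = +1: long the spread, opened when -stop_loss < z < -entry
    A position is closed when z reverts through 0 (take profit) or
    |z| reaches stop_loss (stop-loss).
    """
    if not 0 < entry < stop_loss:
        raise ValueError("require 0 < entry < stop_loss")
    position = 0
    entry_price = 0.0
    pnl = 0.0
    for price, score in zip(spread, zscores(spread)):
        if position == 0:
            if entry < score < stop_loss:
                position, entry_price = -1, price
            elif -stop_loss < score < -entry:
                position, entry_price = 1, price
        else:
            reverted = score <= 0 if position == -1 else score >= 0
            stopped = abs(score) >= stop_loss
            if reverted or stopped:
                pnl += position * (price - entry_price)
                position = 0
    # Mark any position still open at the last observed spread value.
    if position != 0:
        pnl += position * (spread[-1] - entry_price)
    return pnl


if __name__ == "__main__":
    toy_spread = [0, 0, 2, 0, -2, 0, 0, 0]  # hypothetical data
    print(backtest_thresholds(toy_spread, entry=1.5, stop_loss=3.0))
```

A reinforcement learning agent of the kind described above could treat `entry` and `stop_loss` as its action and the returned P&L as its reward; that mapping is likewise an assumption of this sketch.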