Model-based optical proximity correction (MB-OPC) consists of fragmentation, which decomposes each polygon into a number of line segments, followed by iterative segment correction and lithography simulation. At the core of MB-OPC is a PID controller, which determines mask bias values through a feedback loop. The coefficients of the P, I, and D terms are usually set heuristically and remain fixed across segments and iterations. We apply reinforcement learning (RL) to adaptively adjust these coefficients, in an effort to better determine mask bias values and to converge to the final OPC solution more quickly. The RL formulation defines a state, action, and reward: the agent receives segment features and outputs the coefficients of the P, I, and D terms, which are then used to derive a mask bias value. Experimental results show that the proposed MB-OPC with RL is about 2.8 times faster, with a 24% smaller maximum EPE, compared to standard MB-OPC.
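To make the feedback loop concrete, the following is a minimal, self-contained sketch of the correction scheme the abstract describes: a PID controller maps each segment's edge placement error (EPE) history to a mask bias, and a stand-in `agent` function plays the role of the RL policy that outputs per-segment PID coefficients. All function names, coefficient values, and the toy "lithography" model are illustrative assumptions, not the paper's actual implementation.

```python
def pid_bias(epe_history, kp, ki, kd):
    """Mask bias for one segment from its EPE history (feedback loop)."""
    e = epe_history[-1]                              # P: current EPE
    integral = sum(epe_history)                      # I: accumulated EPE
    derivative = (epe_history[-1] - epe_history[-2]  # D: EPE trend
                  if len(epe_history) > 1 else 0.0)
    return kp * e + ki * integral + kd * derivative


def agent(segment_features):
    """Stand-in for the RL policy: segment features -> (kp, ki, kd).

    In the paper's scheme the coefficients adapt per segment and per
    iteration; here we return fixed values to keep the sketch runnable.
    """
    return 0.5, 0.0, 0.1


# Toy correction loop for a single segment: the "lithography simulation"
# is just the printed edge moving by the applied bias; real MB-OPC would
# re-simulate the full mask on every iteration.
target, printed = 0.0, 5.0
history = []
for _ in range(10):
    history.append(printed - target)                 # EPE of this segment
    kp, ki, kd = agent(segment_features=None)
    printed -= pid_bias(history, kp, ki, kd)

print(abs(printed) < 0.1)                            # EPE shrinks toward zero
```

The D term damps the overshoot that a pure proportional update would produce; in the RL version, the policy could raise `kd` for segments whose EPE oscillates and lower it for segments converging smoothly.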