Offline reinforcement learning (RL) aims to learn policies solely from previously collected datasets. This requires offline RL methods to address the distribution shift between the learned policy and the data-collection policy underlying the dataset. Many offline RL methods therefore regularize the policy or value function during training to discourage the selection of out-of-distribution actions. Despite these efforts, the learned policies often suffer from state distribution shift at deployment time. Since there is no direct learning signal for out-of-distribution states, this shift can lead to generalization failures. In this paper, we propose a post-hoc policy adjustment method
applied during deployment to enhance the policy. Specifically, we focus on offline actor-critic methods that employ conservatism, such as conservative Q-learning (CQL). The main idea rests on two observations: first, for out-of-distribution states, the actor may not be sufficiently optimized with respect to the critic, and second, the conservatively trained critic can help locate a nearby in-distribution state. We evaluate our method on the D4RL benchmark and show that it notably improves the performance of current state-of-the-art offline actor-critic methods.
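The two observations suggest a simple deployment-time procedure: refine the actor's proposed action by ascending the conservative Q-value, and, when the current state appears out-of-distribution, search for a nearby in-distribution state under the critic's value estimate before acting. The sketch below is only an illustration of this idea, not the paper's actual algorithm; it assumes PyTorch actor and critic modules with signatures `actor(state)` and `critic(state, action)`, and the function names, step counts, and learning rates are hypothetical placeholders.

```python
import torch


def adjust_action(actor, critic, state, n_steps=10, lr=1e-2):
    """Observation 1: the actor may be under-optimized w.r.t. the critic on
    out-of-distribution states, so take a few extra gradient ascent steps on
    the conservative Q-value starting from the actor's proposed action."""
    with torch.no_grad():
        action = actor(state)
    action = action.clone().requires_grad_(True)
    opt = torch.optim.Adam([action], lr=lr)
    for _ in range(n_steps):
        loss = -critic(state, action).mean()  # ascend on the conservative Q
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Assumes actions live in [-1, 1]; clip back to the valid range.
    return action.detach().clamp(-1.0, 1.0)


def project_state(actor, critic, state, n_steps=10, lr=1e-2):
    """Observation 2: the conservative critic assigns lower values to
    out-of-distribution inputs, so ascend Q(s, pi(s)) w.r.t. the state to
    move toward a nearby in-distribution state, then act from it."""
    s = state.clone().requires_grad_(True)
    opt = torch.optim.Adam([s], lr=lr)
    for _ in range(n_steps):
        loss = -critic(s, actor(s)).mean()  # gradient flows into the state
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return actor(s.detach())
```

At deployment one could call either routine in place of a plain `actor(state)` query; both leave the trained networks untouched, which is what makes the adjustment post-hoc.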