Among the challenges in recent research on end-to-end (E2E) driving, interpretability and distribution shift in simulation-to-real (Sim2Real) transfer have drawn considerable attention. Because of low interpretability, we cannot clearly explain the causal relationship between the input image and the control actions produced by the network. Moreover, the distribution shift in Sim2Real transfer degrades the driving performance of the policy in real-world deployment. In this paper, we propose a segmentation-based class-wise disentangled latent encoding algorithm to cope with these two challenges. In the proposed algorithm, multi-class segmentation maps RGB images from both simulated and real environments to the same domain, while preserving the information about objects of the primary classes, such as pedestrians, roads, and cars, that is necessary for driving decisions. In addition, in the class-wise disentangled latent encoding, segmented images are encoded into a latent vector, which improves interpretability significantly, since the state input has a structured format. The improvement in interpretability is verified through t-distributed stochastic neighbor embedding (t-SNE), image reconstruction, and the causal relationship between the real images and the control actions. We deploy the driving policy trained in simulation directly on an autonomous vehicle platform and show, to the best of our knowledge, the first demonstration of reinforcement learning (RL)-based E2E autonomous driving in various real environments.