Semantic segmentation and depth estimation lie at the heart of scene understanding and play crucial roles especially for autonomous driving. In particular, it is desirable for an intelligent self-driving agent to discern unexpected obstacles on the road ahead reliably in real-time. While existing semantic segmentation studies for small road hazard detection have incorporated fusion of multiple modalities, they require additional sensor inputs and are often limited by a heavyweight network for real-time processing. In this light, we propose an end-to-end Real-time Obstacle Detection via Simultaneous refinement, coined RODSNet (https://github.com/SAMMiCA/RODSNet) which jointly learns semantic segmentation and disparity maps from a stereo RGB pair and refines them simultaneously in a single module. RODSNet exploits two efficient single-task network architectures and a simple refinement module in a multi-task learning scheme to recognize unexpected small obstacles on the road. We validate our method by fusing Cityscapes and Lost and Found datasets and show that our method outperforms previous approaches on the obstacle detection task, even recognizing the unannotated obstacles at 14.5 FPS on our fused dataset (2048x 1024 resolution) using RODSNet-2x. In addition, extensive ablation studies demonstrate that our simultaneous refinement effectively facilitates contextual learning between semantic and depth information.