Videos are often captured with unwanted camera shake, e.g., during hand-held filming by non-expert users, resulting in unstable videos that inevitably contain motion blur induced by the camera shake. However, when such unstable videos are stabilized using existing video stabilization methods, varying amounts of motion blur remain in the predicted stable videos, which looks highly unnatural. Similarly, existing video deblurring methods can effectively remove the motion blur from captured unstable videos, but the predicted deblurred videos remain unstable. Thus, video stabilization and deblurring must be performed jointly to produce visually pleasing videos. In this paper, I propose a novel end-to-end deep learning-based framework that jointly performs video stabilization and deblurring, generating stabilized and deblurred video outputs from unstable and blurry video inputs. The proposed framework is designed to leverage the mutual dependency between the two tasks, and is trained on synthesized data pairs generated using Perlin noise. Furthermore, the proposed joint model outperforms cascaded state-of-the-art video stabilization and deblurring methods both quantitatively and qualitatively.
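The abstract mentions that training pairs are synthesized using Perlin noise. As a minimal illustrative sketch (not the paper's actual data-generation pipeline, whose details are not given here), classic 1D Perlin noise can produce a smooth, shake-like per-frame camera translation; the amplitude, frame count, and noise scale below are hypothetical:

```python
import numpy as np

def perlin_1d(x, seed=0):
    """Classic 1D Perlin (gradient) noise evaluated at positions x."""
    rng = np.random.default_rng(seed)
    # Random gradients at integer lattice points covering the input range.
    n = int(np.floor(x.max())) + 2
    grads = rng.uniform(-1.0, 1.0, n)
    xi = np.floor(x).astype(int)      # lattice cell index of each sample
    xf = x - xi                       # fractional position within the cell
    # Perlin's fade curve 6t^5 - 15t^4 + 10t^3 for smooth interpolation.
    t = xf**3 * (xf * (xf * 6 - 15) + 10)
    # Contributions of the two surrounding gradients, then interpolate.
    g0 = grads[xi] * xf
    g1 = grads[xi + 1] * (xf - 1.0)
    return g0 + t * (g1 - g0)

# Hypothetical shake trajectory: per-frame 2D camera translation (pixels)
# for a 120-frame clip, sampled at a noise scale of 0.1 per frame.
coords = np.arange(120) * 0.1
shake_x = 20.0 * perlin_1d(coords, seed=1)
shake_y = 20.0 * perlin_1d(coords, seed=2)
```

Applying such a trajectory to sharp, stable source frames (and averaging along it to simulate blur) is one plausible way to obtain unstable/blurry inputs paired with stable/sharp ground truth.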