As new display technologies with variable aspect ratios (e.g., foldable phones and modular displays) emerge, content-aware video retargeting has attracted much attention from both academia and industry. Content-aware video retargeting aims to adjust the aspect ratio of a video sequence while preserving both its content and its spatio-temporal consistency. This is a particularly challenging task because these two objectives may differ drastically and even conflict, depending on the characteristics of the video. In this paper, we explore this conflict in the context of video retargeting and propose a solution that alleviates it using a deep recurrent convolutional neural network architecture. First, we present a method to generate multiple ground-truth labels under various aspect ratios. Using this dataset, our network is trained to predict multiple retargeted video candidates from a single input sequence. The resulting candidates have different properties: some place more emphasis on content preservation, while others focus on spatio-temporal consistency. Among the generated candidates, the one that offers the best compromise is selected as the final result. A large set of qualitative and quantitative experiments demonstrates the ability of our method to perform content-aware video retargeting.