Video thumbnails enable users to see quick snapshots of video collections. To display the video thumbnails, the first frame or a frame selected by using simple low level features in each video clip has been set to the default thumbnail for the sake of computational efficiency and implementation simplicity. However, such methods often fail to represent the gist of the clip. To overcome this limitation, we present a new framework for both static and dynamic video thumbnail extraction. First, we formulate energy functions using the features which incorporate mid-level information to obtain superior thumbnailing. Since it is considered that frames whose layouts are similar to others in the clip are relevant in video thumbnail extraction, scene layouts are also considered in computing overall energy. For dynamic thumbnail generation, a time slot is determined by finding the duration showing the minimum energy. Experimental results show that the proposed method achieves comparable performance on a variety of challenging videos, and the subjective evaluation demonstrates the effectiveness of our method.