A video summary condenses a video to its gist while preserving the essential content of the original, and it facilitates efficient content-based access to the desired material. In this article, we propose a novel method for summarizing a news video based on multimodal analysis of its content. The proposed method uses the closed caption (CC) data to locate semantically meaningful highlights in a news video, and uses the speech signals in the audio stream to align the CC data with the video on the timeline. The extracted highlights are then described in a multilevel structure using the MPEG-7 Summarization Description Scheme (DS). Specifically, we use the Hierarchical Summary DS, which allows efficient access to the content through functionalities such as multilevel abstracts and hierarchical navigation guidance. Extensive experiments with our prototype systems are presented to demonstrate the validity and reliability of the proposed method in real applications. (C) 2004 Wiley Periodicals, Inc.
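
To make the idea of a multilevel summary concrete, the following is a minimal sketch of a hierarchical summary as a tree of highlight segments. It is not the MPEG-7 Hierarchical Summary DS schema and not the implementation described in the article; the class names `Highlight` and `SummaryNode`, their fields, and the sample times and captions are all hypothetical and chosen only for illustration. The coarse level holds a few highlights for the whole programme, and each child node holds the finer-grained highlights of one news story.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Highlight:
    """A highlight segment on the video timeline (hypothetical structure)."""
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds
    caption: str    # closed-caption text aligned to this segment


@dataclass
class SummaryNode:
    """One node of a hierarchical summary: its own highlights plus finer children."""
    name: str
    highlights: List[Highlight] = field(default_factory=list)
    children: List["SummaryNode"] = field(default_factory=list)

    def duration(self) -> float:
        """Total playback time of the highlights stored at this node."""
        return sum(h.end_s - h.start_s for h in self.highlights)

    def at_level(self, depth: int) -> List["SummaryNode"]:
        """Return the nodes `depth` steps below this one (depth 0 is this node)."""
        if depth == 0:
            return [self]
        nodes: List["SummaryNode"] = []
        for child in self.children:
            nodes.extend(child.at_level(depth - 1))
        return nodes


# Illustrative data only: the times and captions below are invented.
intro_a = Highlight(12.0, 20.0, "anchor introduces story A")
report_a = Highlight(45.0, 60.0, "reporter on location for story A")
intro_b = Highlight(130.0, 140.0, "anchor introduces story B")

story_a = SummaryNode("story A", [intro_a, report_a])
story_b = SummaryNode("story B", [intro_b])

# The coarse level keeps only the anchor introductions; the children add detail.
root = SummaryNode("news programme", [intro_a, intro_b], [story_a, story_b])
```

In this sketch, `root.duration()` gives the length of the coarse abstract, while `root.at_level(1)` returns the per-story nodes that a viewer could navigate into for a longer, more detailed abstract.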