Disclosed is a method for providing content, the method including: extracting at least one still image from a video included in the content; extracting audio that corresponds to the still image and generating a script corresponding to the audio; adding a caption to the still image based on the generated script; and, in response to a request to consume the content, providing the content and providing the caption-added still image for the video that is being streamed in real time.
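
The steps recited above can be illustrated with a minimal Python sketch. It is not the disclosed implementation. It assumes the script has already been generated from the audio as timed text segments, and it assumes one still image is taken at the midpoint of each fixed-length window of the video; neither choice is stated in the abstract. The names ScriptSegment, CaptionedStill, and build_captioned_stills are hypothetical and are introduced only for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ScriptSegment:
    """One timed piece of the script generated from the audio (assumed input)."""
    start: float  # seconds from the beginning of the video
    end: float
    text: str


@dataclass
class CaptionedStill:
    """A still image, identified by its timestamp, plus the caption added to it."""
    timestamp: float
    caption: str


def build_captioned_stills(
    duration: float, interval: float, script: List[ScriptSegment]
) -> List[CaptionedStill]:
    """Pick one still per window and caption it with the script for that window.

    Assumption: a script segment belongs to the window in which it starts,
    so no segment is repeated across two stills.
    """
    if duration <= 0 or interval <= 0:
        raise ValueError("duration and interval must be positive")

    stills = []
    for i in range(math.ceil(duration / interval)):
        window_start = i * interval
        window_end = min(window_start + interval, duration)
        # Audio corresponding to the still: segments that start in this window.
        lines = [s.text for s in script if window_start <= s.start < window_end]
        stills.append(
            CaptionedStill(
                timestamp=(window_start + window_end) / 2,
                caption=" ".join(lines),
            )
        )
    return stills
```

For a 10-second video split into 5-second windows, this sketch yields two stills, at 2.5 s and 7.5 s, each captioned with the script segments that begin in its window. A serving component could then return these captioned stills when the content is requested.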