Automatic face indexing is an important technique for realizing actor-based video services in an Internet Protocol television (IPTV) environment. This paper proposes a novel face indexing system that takes advantage of the Internet connection of a set-top box (STB) to construct a face recognition (FR) engine equipped with a large number of training face images. In addition, we use a face clustering technique to obtain multiple face images of the same subject from a sequence of video frames. The clustered face images are combined using a weighted feature fusion scheme, which considerably improves face indexing accuracy. The effectiveness of the proposed system is validated on more than 300,000 video frames collected from five video clips containing drama or movie content. The experimental results show that the proposed system achieves a face annotation accuracy that is feasible for practical applications.