Many recent studies have proposed methods for classifying dynamic textures (DTs). A method based on local binary patterns on three orthogonal planes (LBP-TOP) has shown promising results and attracted considerable interest. However, LBP-TOP and most of its variants suffer from drawbacks caused by the accumulation process in the TOP technique. This process accumulates features from all frames in the DT sequence, including irrelevant frames, and thus disregards the distinct characteristics of each frame. To overcome this problem, we propose a codebook-based DT descriptor that aggregates salient features on three orthogonal planes. Given a DT sequence, only those frame features that are highly correlated with each cluster are selected and aggregated from the perspective of visual words. The proposed DT descriptor removes features from outlier frames that appear suddenly or rarely in a particular context, thereby emphasizing the salient features. Experimental results on public DT and dynamic scene datasets demonstrate the superiority of the proposed method over comparative approaches. The proposed method also yields outstanding results compared with state-of-the-art DT representations.
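The selection-and-aggregation idea can be illustrated with a minimal sketch for a single plane. It is not the paper's implementation. Several details are assumptions made only for illustration: per-frame features are taken to be precomputed histograms, the codebook centroids are given, "highly correlated" is interpreted as a Pearson correlation above a hypothetical `threshold`, and aggregation is a per-word mean. The function names `pearson_to_centroids` and `aggregate_salient` are likewise hypothetical.

```python
import numpy as np

def pearson_to_centroids(features, centroids):
    """Pearson correlation between every frame feature and every centroid.

    features:  (num_frames, dim) array, one feature vector per frame
    centroids: (num_words, dim) array, one visual word per row
    returns:   (num_frames, num_words) correlation matrix
    """
    # Center each row, then scale to unit length; the dot product of two
    # such rows is their Pearson correlation.
    f = features - features.mean(axis=1, keepdims=True)
    c = centroids - centroids.mean(axis=1, keepdims=True)
    f = f / (np.linalg.norm(f, axis=1, keepdims=True) + 1e-12)
    c = c / (np.linalg.norm(c, axis=1, keepdims=True) + 1e-12)
    return f @ c.T

def aggregate_salient(features, centroids, threshold=0.9):
    """Aggregate, per visual word, only the frames correlated with it.

    Assumption: frames whose correlation with a word falls below
    `threshold` are treated as outliers for that word and skipped.
    """
    corr = pearson_to_centroids(features, centroids)
    parts = []
    for k in range(centroids.shape[0]):
        selected = features[corr[:, k] >= threshold]
        if selected.shape[0] == 0:
            # No frame matches this word: contribute a zero block.
            parts.append(np.zeros(features.shape[1]))
        else:
            parts.append(selected.mean(axis=0))
    # Concatenate the per-word aggregates into one descriptor.
    return np.concatenate(parts)
```

Under these assumptions, a full descriptor would be formed by applying the same procedure to features from each of the three orthogonal planes and concatenating the results; how the codebook is learned and how the selection criterion is defined are left to the method itself.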