Understanding facial expressions is one of the basic, universal constructs of nonverbal human communication. The ability to classify facial expressions is crucial for better human-machine interaction. In this thesis, we study the emotion classification problem using the Capsule Network architecture, which is known for its ability to generalize learned characteristics across various datasets. To the best of our knowledge, this is the first approach to learning an encoding of the emotional variance of the human face using deep neural networks. The proposed model includes a facial keypoint detection unit, which encourages the emotion classifier to learn critical facial attributes. Using the proposed method, we were able to disentangle universal human expressions, and we showed that the neural network could learn several expression action units without any supervision.