Currently, text emotion recognition generally outperforms speech emotion recognition. This gap is attributed to the fact that text provides linguistic context, which plays an important role in classifying emotion. For example, a person crying might be classified as sad; however, given the linguistic context and the situation behind it, the person might be crying tears of joy. Motivated by this, we improve speech emotion recognition by leveraging linguistic context from past utterances, obtained with the help of an Automatic Speech Recognition (ASR) system and a language model. We also utilize prosody features of the speech, such as pitch and energy, which are absent from the text modality, to complement the linguistic features and further boost performance. Our implementation achieves 6.9% higher weighted accuracy than the current state-of-the-art model.