Previous work on wake-up word detection (WWD) has focused on finding a word representation that expresses the characteristics of a word well. However, the real-world environments in which WWD operates present obstacles such as noise and reverberation, which make detection difficult. To tackle this, we propose a novel architecture called the interlayer selective attention network (ISAN), which generates a more robust word representation by introducing the concept of selective attention. Experiments in real-world scenarios demonstrate that the proposed ISAN outperforms several baseline methods as well as other attention methods. In addition, we analyze the effectiveness of ISAN with visualizations.