A discrete event model is used to improve the reliability of a discrete event dynamical system (DEDS). We propose a systematic way to analyse DEDSs, to classify faults and failures quantitatively, and to find the tolerable fault event sequences (TFESs) embedded in the system. We present an automated failure diagnosis scheme defined with respect to the nominal (normal) operating event sequences, together with supervisory control for the TFESs. Supervisor failure diagnosis with respect to the TFESs is also considered. Finally, an analytical framework for fault-tolerant supervisory control systems is proposed. A case study of a thermal oxidation system is described.
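
To make the distinction between normal operation, tolerable faults, and failures concrete, the following is a minimal sketch and not the scheme developed in the paper. It assumes that the nominal operating event sequences and the TFESs are available as finite lists of events, and it classifies an observed event string by prefix matching. The event names (`load`, `heat`, `temp_drift`, and so on), the `Verdict` type, and the `classify` function are hypothetical and chosen only for illustration; the quantitative classification proposed in the paper is not reproduced here.

```python
from enum import Enum


class Verdict(Enum):
    NORMAL = "normal"
    TOLERABLE_FAULT = "tolerable fault"
    FAILURE = "failure"


def is_prefix_of_any(observed, sequences):
    """Return True if `observed` is a prefix of at least one sequence."""
    n = len(observed)
    return any(tuple(seq[:n]) == tuple(observed) for seq in sequences)


def classify(observed, nominal, tolerable):
    """Classify an observed event string (illustrative assumption only).

    - NORMAL: the string is still consistent with a nominal sequence.
    - TOLERABLE_FAULT: it has left the nominal behaviour but is still
      consistent with a tolerable fault event sequence.
    - FAILURE: it matches neither set.
    """
    if is_prefix_of_any(observed, nominal):
        return Verdict.NORMAL
    if is_prefix_of_any(observed, tolerable):
        return Verdict.TOLERABLE_FAULT
    return Verdict.FAILURE


# Hypothetical sequences, loosely themed on a thermal oxidation process.
NOMINAL = [("load", "heat", "oxidize", "cool", "unload")]
TOLERABLE = [
    ("load", "heat", "temp_drift", "reheat", "oxidize", "cool", "unload"),
]

if __name__ == "__main__":
    for trace in [
        ("load", "heat"),
        ("load", "heat", "temp_drift"),
        ("load", "oxidize"),
    ]:
        print(trace, "->", classify(trace, NOMINAL, TOLERABLE).value)
```

In this sketch a supervisor would only need to intervene once a trace is classified as a tolerable fault, steering the system along the remaining events of the matching TFES; a trace classified as a failure lies outside both sets and cannot be recovered by such steering.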