Digital circuits are expected to suffer increasingly from hard faults as technology scales. In particular, a single hard fault in the ALU can cause total failure of an embedded system. Energy efficiency is also critical in these systems. To address both problems in the ALU, we propose Lizard, a novel energy-efficient, fault-tolerant ALU design. Lizard uses two 16-bit ALUs to perform 32-bit computations with fault detection and diagnosis. By exploiting predictable operations, it performs fault detection in a single cycle. Each 16-bit ALU can be partitioned into two 8-bit ALUs. When a fault occurs in one of the four 8-bit ALUs, Lizard diagnoses and isolates the faulty 8-bit ALU by itself. It then continues operating on the remaining three 8-bit ALUs, which can detect and isolate one further fault. In this way, Lizard survives faults in up to two sub-ALUs, which extends its lifetime and improves its fault tolerance. We compare Lizard with an unprotected ALU, a triple modular redundancy ALU, and a quadruple time redundancy ALU in terms of area, energy consumption, performance, and reliability. The results show that Lizard outperforms the other ALU designs in most cases, especially in energy efficiency.
28th IEEE International Conference on Computer Design, ICCD 2010, pp. 342-349
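As a rough illustration of the partitioning idea in the abstract, the sketch below builds a 32-bit addition from byte-wide additions chained by carry, and records which 8-bit sub-ALU would handle each byte. It is a behavioral model under stated assumptions, not the Lizard design: the abstract does not describe how faults are detected or how work is rescheduled after isolation, so the names `slice_add`, `add32`, and `healthy_slices`, and the round-robin reuse of the remaining sub-ALUs, are all hypothetical.

```python
MASK8 = 0xFF  # one 8-bit sub-ALU operates on a single byte


def slice_add(a8, b8, carry_in):
    """Model a fault-free 8-bit sub-ALU add; returns (sum8, carry_out)."""
    total = (a8 & MASK8) + (b8 & MASK8) + carry_in
    return total & MASK8, total >> 8


def add32(a, b, healthy_slices=(0, 1, 2, 3)):
    """Add two 32-bit values byte by byte, least significant byte first.

    healthy_slices lists the sub-ALU ids still in service. Assigning bytes
    to them round-robin is an assumption made for illustration only.
    Returns (sum32, carry_out, schedule), where schedule[i] is the id of
    the sub-ALU assigned to byte i.
    """
    if not healthy_slices:
        raise ValueError("no healthy sub-ALU left")
    result, carry, schedule = 0, 0, []
    for byte in range(4):
        unit = healthy_slices[byte % len(healthy_slices)]
        shift = 8 * byte
        s, carry = slice_add((a >> shift) & MASK8, (b >> shift) & MASK8, carry)
        result |= s << shift
        schedule.append(unit)
    return result, carry, schedule
```

With all four sub-ALUs in service, each byte maps to its own unit. Calling `add32(a, b, healthy_slices=(0, 2, 3))` models the state after sub-ALU 1 has been isolated: the arithmetic result is unchanged, but one unit is assigned two bytes, which in hardware would presumably cost extra time.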