Toward reliable microprocessors in nanometer-scale technologies (나노스케일 공정에서의 고신뢰성 마이크로프로세서 설계 기법)

For the last four decades, the continued scaling of CMOS process technology has been one of the major driving forces of the semiconductor industry. As technology advances to the next generation, transistors become smaller, faster, and cheaper. Thus, advances in process technology give chip architects opportunities to develop high-performance microprocessors. However, as microprocessors integrate more transistors, which are smaller and weaker, and become more complex, they are expected to experience more hardware faults. Cost-effective fault-tolerance techniques are therefore required to continue the success of the semiconductor industry. This dissertation proposes cost-effective techniques to enhance the reliability of microprocessors in nanometer-scale technologies. Among the many components of a microprocessor, this dissertation focuses on fault-tolerance techniques for the execution units and on-chip cache memories, which are the most frequently used and the most vulnerable to hardware faults. To detect transient faults in the execution units, a simple fault detection technique called TECA is proposed, which exploits the frequent small operand values of instructions and frequently used shift operations. The conditions under which the proposed technique can be applied to an instruction are explored. Applicable instructions are protected by duplicating operands directly in the ALU, while other instructions are protected using time redundancy. To tolerate permanent faults in the arithmetic/logic unit, a novel fault detection, diagnosis, and isolation technique called LIZARD is proposed. In the proposed technique, two half-word ALUs are employed instead of a single full-word ALU to perform computations with concurrent fault detection. When a fault is detected, the two ALUs are partitioned into four quarter-word ALUs. 
After diagnosing and isolating a faulty quarter-word ALU, LIZARD continues its operation using the remaining ones, which can detect and isolate another fault. Even though LIZARD uses narrow ALUs for computations, it adds negligible performance overhead by exploiting the predictability of results in arithmetic computations. In addition, the architectural modifications required to employ LIZARD in scalar as well as superscalar processors are presented. This dissertation also addresses process variation-induced permanent faults in on-chip caches. Process variations cause large fluctuations in the access times of SRAM cells. Caches made of such SRAM cells cannot be accessed within the target clock cycle time, which reduces the yield of processors. Many schemes have been proposed to combat these access time failures in caches, but they are limited in their coverage and do not scale well at high failure rates. In this dissertation, a new L1 (first-level) cache architecture employing multi-cycle cell access and subarray-level parallel access is proposed. Multi-cycle cell access eliminates all access time failures in L1 caches. Subarray-level parallel access minimizes the performance impact of the multi-cycle cell access. Architectural techniques are also proposed for further performance improvement. Finally, a simple yet efficient technique is proposed to enhance the reliability of on-chip cache memories based on multi-level cell STT-RAM. STT-RAM (spin-transfer torque random access memory) is an emerging non-volatile memory technology that provides fast access time and low standby power with a small feature size. Recently, MLC (multi-level cell) STT-RAM has been proposed to enhance the data density of STT-RAM. However, the read stability and writability of MLC STT-RAM can be significantly reduced at nanometer-scale technology nodes due to process variations and random thermal fluctuations. 
To enhance the reliability of read operations in MLC STT-RAM, three-valued MLC STT-RAM is proposed. By reducing the number of data representation levels of an MLC cell, its read stability is significantly enhanced. In addition, to enhance writability, a reliable write mechanism is proposed. In this mechanism, a write operation is performed with a high current and terminated as soon as the data is written. Altogether, the hardware fault-tolerance techniques introduced in this dissertation enhance the reliability of microprocessors at low cost. These cost-effective fault-tolerance techniques make it possible to develop reliable microprocessors and increase their yield at unreliable process technology nodes. Since process technology is expected to become more unreliable, this is a key requirement for continued advances in process technology and microprocessors.
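The abstract says that TECA protects applicable instructions by duplicating small operands directly in the ALU. The dissertation's actual conditions and hardware are not given in this record, so the following is only a minimal sketch of the general idea under stated assumptions: a 32-bit ALU, unsigned addition, and operands narrower than 15 bits so that the sum still fits in a half word. The names `is_small`, `duplicate`, and `checked_add` are hypothetical.

```python
WORD_BITS = 32                      # assumed ALU width
HALF_BITS = WORD_BITS // 2
HALF_MASK = (1 << HALF_BITS) - 1
WORD_MASK = (1 << WORD_BITS) - 1


def is_small(value: int) -> bool:
    """True if the value leaves one spare bit in a half word, so the sum
    of two such values cannot carry out of the lower half."""
    return 0 <= value < (1 << (HALF_BITS - 1))


def duplicate(value: int) -> int:
    """Place the same small operand in both halves of a full word."""
    return (value << HALF_BITS) | value


def fault_free_add(x: int, y: int) -> int:
    """Model of a correct full-word adder."""
    return (x + y) & WORD_MASK


def checked_add(a: int, b: int, alu=fault_free_add):
    """Add two small operands on a possibly faulty full-word ALU.

    Returns (result, fault_detected). Wide operands are outside this
    sketch; per the abstract, such instructions use time redundancy.
    """
    if not (is_small(a) and is_small(b)):
        raise ValueError("operands too wide for in-ALU duplication")
    raw = alu(duplicate(a), duplicate(b))
    low = raw & HALF_MASK
    high = (raw >> HALF_BITS) & HALF_MASK
    # Both halves computed the same sum, so any mismatch signals a fault.
    return low, low != high


def stuck_bit_alu(x: int, y: int) -> int:
    """Hypothetical faulty adder that flips bit 2 of every result."""
    return fault_free_add(x, y) ^ 0b100
```

With a fault-free adder both halves of the result agree. With `stuck_bit_alu`, the flipped bit corrupts only the lower half, so the comparison flags the fault. A fault that corrupts both halves identically would go undetected in this simplified model.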
Advisors
Kim, Soontae (김순태)
Description
Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2015
Identifier
325007
Language
eng
Description

Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST): Department of Computer Science, 2015.2, [xi, 135 p.]

Keywords

Fault tolerance; Reliability; Microprocessor; Cache; ALU; High reliability; Cache memory; Arithmetic logic unit

URI
http://hdl.handle.net/10203/222392
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=657599&flag=dissertation
Appears in Collection
CS-Theses_Ph.D.(박사논문)
Files in This Item
There are no files associated with this item.
