Log-based rollback recovery without checkpoints of shared memory in software DSM

A common approach to fault-tolerant software DSM is to take checkpoints combined with message logging. Our remote logging has low overhead because each node saves its coherence-related data into the memory of a remote node through a high-speed system area network. To make fault-tolerant DSM more lightweight, this paper focuses mainly on eliminating shared-memory checkpointing during failure-free execution. Each node independently checkpoints only its execution state and non-shared data. When a node fails, it regenerates its pages from the remote copies held by live nodes. To reconstruct pages efficiently, we also introduce an XOR-diffing technique. The diff logs, created by XOR operations during failure-free execution, can be applied to any version of a remote copy, either backward or forward, during recovery. Our scheme reduces checkpointing overhead and also alleviates the imbalance in execution times among nodes caused by independent checkpointing.
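The key property behind the XOR-diffing technique is that XOR is its own inverse: if a diff is the XOR of a page before and after a write interval, XORing it into either version yields the other. A recovering node can therefore start from whatever version a live node happens to hold, whether older or newer than the one it needs, and roll that copy forward or backward. The sketch below illustrates only this property. It assumes pages are fixed-size byte buffers and that a diff is the full-page XOR of a pre-write copy (a "twin") and the modified page. The names `make_xor_diff`, `apply_diffs`, and `PAGE_SIZE` are hypothetical and are not taken from the paper, which may encode diffs more compactly (for example, storing only changed words).

```python
from functools import reduce
from typing import Sequence

PAGE_SIZE = 16  # hypothetical tiny page size, chosen only for illustration


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the byte-wise XOR of two equal-length buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_xor_diff(twin: bytes, modified: bytes) -> bytes:
    """XOR diff between the pre-write copy (twin) and the modified page.

    Unchanged bytes become zero; changed bytes hold old ^ new.
    """
    return xor_bytes(twin, modified)


def apply_diffs(page: bytes, diffs: Sequence[bytes]) -> bytes:
    """Apply a sequence of XOR diffs to a page version.

    Because XOR is self-inverse, commutative, and associative, the same
    call rolls a page forward (older -> newer) or backward (newer -> older),
    and the order in which the diffs are applied does not matter.
    """
    return reduce(xor_bytes, diffs, page)


if __name__ == "__main__":
    # Three successive versions of one shared page.
    v0 = bytes(PAGE_SIZE)
    v1 = bytearray(v0)
    v1[0] = 7
    v1 = bytes(v1)
    v2 = bytearray(v1)
    v2[0] = 9
    v2[5] = 42
    v2 = bytes(v2)

    # Diff logs recorded during failure-free execution.
    d1 = make_xor_diff(v0, v1)
    d2 = make_xor_diff(v1, v2)

    # Forward: a live node holds the old copy v0; recovery needs v2.
    assert apply_diffs(v0, [d1, d2]) == v2
    # Backward: a live node holds the newer copy v2; recovery needs v0.
    assert apply_diffs(v2, [d2, d1]) == v0
    print("forward and backward reconstruction both succeed")
```

A conventional diff that records only the new values of modified words can roll a page forward but not backward; keeping the XOR of old and new values costs the same space per changed word while making the log usable in both directions.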
Publisher
SPRINGER
Issue Date
2006-02
Language
ENG
Citation

JOURNAL OF SUPERCOMPUTING, v.35, no.2, pp.141 - 154

ISSN
0920-8542
DOI
10.1007/s11227-006-1667-7
URI
http://hdl.handle.net/10203/4684
Appears in Collection
CS-Journal Papers (Journal Papers)