An efficient algorithm-based fault detection and recovery on multiprocessor systems
By: Ali, S.A.; Mahdy, Y.B.; Hassan, H.A.
1999 / IEEE / 0-7803-5682-9
This item was taken from the IEEE Conference 'An efficient algorithm-based fault detection and recovery on multiprocessor systems'. Algorithm-Based Fault Tolerance (ABFT) schemes have been proposed as a means of low-cost error protection for parallel algorithms. This paper presents a modified fault-tolerant scheme for matrix multiplication on multiprocessor systems. The proposed scheme increases detectability through a new partitioning scheme for the system's processors. The time overhead of the modified recovery algorithm is reduced by a new weighted checksum code based only on shifting, not multiplication. A Triple Modular Redundancy (TMR) host, which is itself part of the multiprocessor system, is used to avoid the need for an expensive separate host. Thus, the proposed system possesses higher reliability at a lower time overhead and cost.
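The abstract does not spell out the checksum construction, so the following is only a minimal sketch of the general idea, not the authors' scheme. It assumes integer matrices, a plain column checksum plus a weighted column checksum whose weights are powers of two (so each weight is applied with a left shift instead of a multiplication), and at most one corrupted data element per column of the product. The function names `encode_columns`, `matmul`, and `check_and_correct` are hypothetical.

```python
def encode_columns(a):
    """Return a copy of A with two extra rows: plain and shift-weighted column sums."""
    n_cols = len(a[0])
    plain = [sum(row[j] for row in a) for j in range(n_cols)]
    # Assumed weights: row i is weighted by 2**i, applied as a left shift.
    weighted = [sum(row[j] << i for i, row in enumerate(a)) for j in range(n_cols)]
    return [list(row) for row in a] + [plain, weighted]


def matmul(a, b):
    """Plain triple-loop matrix product; checksum rows of A carry over to the result."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def check_and_correct(c):
    """Verify each column of C against its two checksum rows.

    Corrects a single bad data element per column in place and returns the
    list of (row, column) positions that were repaired.
    """
    n = len(c) - 2  # number of data rows
    corrected = []
    for j in range(len(c[0])):
        s1 = sum(c[i][j] for i in range(n)) - c[n][j]            # equals the error e
        s2 = sum(c[i][j] << i for i in range(n)) - c[n + 1][j]   # equals e * 2**i
        if s1 == 0 and s2 == 0:
            continue  # column is consistent
        if s1 == 0 or s2 % s1 != 0:
            raise ValueError(f"column {j}: not a single data-element error")
        ratio = s2 // s1
        i = ratio.bit_length() - 1  # candidate row index, since ratio should be 2**i
        if ratio <= 0 or (1 << i) != ratio or i >= n:
            raise ValueError(f"column {j}: not a single data-element error")
        c[i][j] -= s1
        corrected.append((i, j))
    return corrected


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = matmul(encode_columns(a), b)  # data rows followed by two checksum rows
c[1][0] += 7                      # inject a fault into one data element
repaired = check_and_correct(c)   # locates row 1, column 0 and restores 43
```

Because the weights are powers of two, the ratio of the two syndromes directly gives the row index of the faulty element, and both encoding and checking need only additions and shifts beyond the product itself.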
Efficient Algorithm-based Fault Detection
Algorithm-based Fault Tolerance
Low-cost Error Protection
Modified Fault Tolerant Scheme
Modified Recovery Algorithm
Triple Modular Redundancy
Fault Tolerant Systems
Electrical Fault Detection
Fault Tolerant Computing