An efficient algorithm-based fault detection and recovery on multiprocessor systems

By: Ali, S.A.; Mahdy, Y.B.; Hassan, H.A.

1999 / IEEE / 0-7803-5682-9

Description

This item was taken from the IEEE Conference 'An efficient algorithm-based fault detection and recovery on multiprocessor systems'.

Algorithm-Based Fault Tolerance (ABFT) schemes have been proposed as a means of low-cost error protection for parallel algorithms. This paper presents a modified fault-tolerant scheme for matrix multiplication on multiprocessor systems. The proposed scheme increases error detectability through a new partition scheme for the system's processors. The time overhead of the modified recovery algorithm is reduced by a new weighted checksum code that relies only on shifting, not multiplication. A Triple Modular Redundancy (TMR) host, which is itself part of the multiprocessor system, is used to avoid the need for an expensive separate host. The proposed system thus possesses higher reliability at lower time overhead and cost.
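The abstract does not spell out the paper's partition scheme or its exact checksum code, so the following is only a minimal sketch of the general idea it describes: checksum-protected matrix multiplication in which the weighted checksum uses power-of-two weights, so that encoding needs shifts rather than multiplications. The function names (`matmul`, `add_column_checksums`, `locate_and_correct`), the choice of weight 2^i for row i, the restriction to integer matrices, and the single-error-per-column assumption are all illustrative assumptions, not details taken from the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists of integers."""
    inner = len(b)
    return [[sum(row[k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for row in a]


def add_column_checksums(a):
    """Append two checksum rows to `a` (assumed encoding, not the paper's).

    Row n   : plain column sums.
    Row n+1 : column sums weighted by 2**i, computed with a left shift.
    """
    n = len(a)
    cols = range(len(a[0]))
    plain = [sum(a[i][j] for i in range(n)) for j in cols]
    weighted = [sum(a[i][j] << i for i in range(n)) for j in cols]
    return a + [plain, weighted]


def locate_and_correct(c_full):
    """Check each column of a checksum-extended product and repair it in place.

    Assumes at most one corrupted data element per column and intact
    checksum rows. Returns the list of (row, column) positions repaired.
    """
    n = len(c_full) - 2
    data, plain, weighted = c_full[:n], c_full[n], c_full[n + 1]
    fixed = []
    for j in range(len(plain)):
        # s1 is the error magnitude e; s2 is e * 2**row for a single error.
        s1 = sum(data[i][j] for i in range(n)) - plain[j]
        s2 = sum(data[i][j] << i for i in range(n)) - weighted[j]
        if s1 == 0 and s2 == 0:
            continue  # column is consistent with both checksums
        if s1 == 0 or s2 % s1 != 0:
            raise ValueError(f"column {j}: error pattern not correctable")
        ratio = s2 // s1
        row = ratio.bit_length() - 1
        if ratio <= 0 or ratio != 1 << row or row >= n:
            raise ValueError(f"column {j}: error pattern not correctable")
        data[row][j] -= s1  # rows are shared with c_full, so this repairs it
        fixed.append((row, j))
    return fixed


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    # Multiplying the checksum-extended A by B yields C with valid checksums.
    c_full = matmul(add_column_checksums(a), b)
    c_full[2][1] += 5  # simulate a faulty processor corrupting one element
    print(locate_and_correct(c_full))
    print(c_full[:3] == matmul(a, b))
```

The sketch relies on the fact that the checksum rows of A carry through the multiplication: the last two rows of the extended product are the plain and weighted column checksums of A·B. Comparing the two syndromes for a column gives the error magnitude and, through the power-of-two ratio, the row of the faulty element.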