Proactive Recovery in a Byzantine-Fault-Tolerant System

Miguel Castro and Barbara Liskov

Laboratory for Computer Science,
Massachusetts Institute of Technology,
545 Technology Square, Cambridge, MA 02139
{castro,liskov}@lcs.mit.edu


Abstract:

This paper describes an asynchronous state-machine replication system that tolerates Byzantine faults, which can be caused by malicious attacks or software errors. Our system is the first to recover Byzantine-faulty replicas proactively, and it performs well because it uses symmetric rather than public-key cryptography for authentication. The recovery mechanism allows us to tolerate any number of faults over the lifetime of the system, provided fewer than 1/3 of the replicas become faulty within a window of vulnerability that is small under normal conditions. The window may increase under a denial-of-service attack, but we can detect and respond to such attacks. The paper presents results of experiments showing that overall performance is good and that even a small window of vulnerability has little impact on service latency.
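
The fault bound stated above (fewer than 1/3 of the replicas faulty within one window of vulnerability) can be made concrete with a little arithmetic. The following is a minimal sketch, not code from the paper; the function names `max_faulty`, `min_replicas`, and `within_bound` are hypothetical and chosen only for illustration.

```python
def max_faulty(n: int) -> int:
    """Largest f such that f < n/3, i.e. n >= 3f + 1 (hypothetical helper)."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def min_replicas(f: int) -> int:
    """Smallest group size n for which f faulty replicas is still fewer than n/3."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


def within_bound(n: int, faulty_in_window: int) -> bool:
    """True if the replicas that became faulty in one window number fewer than n/3.

    Integer arithmetic (3 * f < n) avoids floating-point comparison.
    """
    return 3 * faulty_in_window < n


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(n, "replicas tolerate up to", max_faulty(n), "faulty per window")
```

Note that the count applies per window of vulnerability rather than over the lifetime of the system: with proactive recovery, a replica that was faulty in an earlier window and has since been recovered no longer counts against the bound.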
 


Published in the Proceedings of the Fourth Symposium on Operating Systems Design and Implementation, San Diego, USA, October 2000.

This research was supported by DARPA under contract F30602-98-1-0237 monitored by the Air Force Research Laboratory.