Fault tolerance in computing
Prof. Lorenzo Strigini
Centre for Software Reliability, City University London
16 hours, 4 credits
October 5 - October 8, 2010
Dipartimento di Ingegneria dell'Informazione: Elettronica, Informatica, Telecomunicazioni, Largo Lucio Lazzarino, meeting room
Contacts: Prof. Cinzia Bernardeschi
Aims
Fault tolerance, that is, clever use of redundancy, is one of the organising principles for achieving dependability and resilience in all systems. Fault tolerance techniques are well established in some areas of computing, and many off-the-shelf building blocks routinely include fault tolerance mechanisms. Yet the philosophy of fault tolerance and the knowledge of its design patterns and tricks are not widespread among those who could take advantage of them, especially in the design of applications and of complex hardware-software-human systems. Specific technical communities (e.g., in various safety-critical applications of embedded computers) have consolidated techniques and practices for redundant design, diversity and so on. However, attempts to improve these practices, or to apply the same principles outside these specialised communities, often lead to controversy (e.g., in the security community), which arises from the lack of a common language for dealing with the basic issues in fault tolerance.
These lectures aim to: