Abstract
Experience suggests that it is edifying to talk about software crises at NATO workshops. This position paper argues that the proper engineering of fault tolerance software has not received the attention it deserves. The paper outlines the difficulties in building fault tolerant systems and describes the challenges that software fault tolerance is facing. The solution advocated is to place special emphasis on fault tolerance software engineering, which would provide a set of methods, techniques, models and tools that exactly fit application domains, fault assumptions and system requirements, and that support disciplined and rigorous fault tolerance throughout all phases of the life cycle. The paper finishes with an outline of some directions of work that require focused effort from the R&D community.