A looming fault tolerance software crisis?

Published: 1 March 2007

Abstract

Experience suggests that it is edifying to talk about software crises at NATO workshops. This position paper argues that the proper engineering of fault tolerance software has not been getting the attention it deserves. The paper outlines the difficulties of building fault-tolerant systems and describes the challenges that software fault tolerance faces. The solution advocated is to place special emphasis on fault tolerance software engineering, which would provide a set of methods, techniques, models and tools that exactly fit application domains, fault assumptions and system requirements, and that support disciplined and rigorous fault tolerance throughout all phases of the life cycle. The paper concludes with an outline of several directions of work that require focused effort from the R&D community.

Published in: ACM SIGSOFT Software Engineering Notes, Volume 32, Issue 2, March 2007 (118 pages)
ISSN: 0163-5948
DOI: 10.1145/1234741
Copyright © 2007 Author
Publisher: Association for Computing Machinery, New York, NY, United States
