A topic that often raises more questions than it answers is the prediction of software failure rates. In our blog post “Don’t be Sheepish about RAMS” we introduced the four terms reliability, availability, maintainability and safety. The first three are very much focussed on the area of reliability engineering. Safety, however, is to a certain extent at odds with the concepts of reliability: a system that shuts down into a safe state has avoided a hazard but is no longer available. This will not always be the case, e.g. in battery management and similar systems, where sustained reliable operation goes hand in hand with safety.
We have previously written blogs on the demands of safety standards in both the automotive and medical device domains (see “The continuum of safety-related delivery”). However, neither ISO 26262 nor IEC 62304 gives any real guidance on software reliability or failure analysis. The drive in such standards tends to be towards fail-safe operation, and hence avoiding a hazardous situation, rather than the continued operation of the functionality. That said, the automotive industry has placed more focus on fail-operational systems for some years now.
Reliability is predominantly a focus for systems engineering, but as software plays a pivotal role in most systems, it is also a key topic in reliability activities. Hardware analysis focuses on the failure in time (FIT) rates of components, where 1 FIT is one failure per billion device-hours; assuming a constant failure rate, the inverse of the total failure rate yields the reliability quantity Mean Time to Failure (MTTF). Software failures, however, are systematic by their very nature, so the random failure models used for hardware are not applicable in software analysis.
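As a worked illustration of the hardware side, the sketch below sums component FIT rates and converts the total into an MTTF. It assumes a constant (exponential) failure rate, components in series, and the standard definition of 1 FIT as one failure per 10^9 device-hours; the component values are made up for illustration.

```python
# Minimal sketch: hardware MTTF from component FIT rates.
# Assumes a constant failure rate and components in series
# (any component failure fails the unit). 1 FIT = 1 failure per 1e9 hours.
FIT_HOURS = 1e9


def mttf_hours(component_fits):
    """Return the MTTF in hours for a list of component FIT rates."""
    total_fit = sum(component_fits)
    if total_fit <= 0:
        raise ValueError("total FIT must be positive")
    # Failure rate per hour is total_fit / 1e9; MTTF is its inverse.
    return FIT_HOURS / total_fit


# Illustrative (made-up) FIT values for three components: 500 FIT in total.
print(mttf_hours([100, 250, 150]))  # 2000000.0 hours
```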
Software Reliability Growth Models
Numerous methods have been defined for software reliability based on software reliability growth models (SRGMs). One of these is the Neufelder model, described in IEEE Std 1633-2016. Let’s take a look at how this model is used to assess software reliability:
The model is based on 155 questions and has seven outcomes, ranging from ‘world class’ to ‘seriously distressed’. The latter conjures up a rather unusual image, probably more of the developer than of the code!
There are three parts to the software reliability prediction:
- effective size
- fault density
- reliability growth
The outputs of the Neufelder model are then defect density, failure rate and MTTF. Table 1 indicates the parameters utilised.
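The three parts can be chained together roughly as in the sketch below. This is not the IEEE 1633-2016 calculation: the effective size, fault density and growth rate are made-up inputs, and the reliability growth step swaps in a Goel-Okumoto-style exponential intensity λ(t) = N·b·e^(−bt), where N is the predicted number of defects and b is a hypothetical per-hour growth rate. The MTTF at time t is approximated as 1/λ(t).

```python
import math


def predicted_defects(effective_ksloc, defects_per_ksloc):
    """Parts 1 and 2: effective size (KSLOC) times fault density."""
    return effective_ksloc * defects_per_ksloc


def failure_rate(n_defects, growth_rate, t_hours):
    """Part 3, assumed exponential growth: failures per hour at time t.

    Goel-Okumoto-style intensity lambda(t) = N * b * exp(-b * t);
    growth_rate (b) is a hypothetical per-hour parameter.
    """
    return n_defects * growth_rate * math.exp(-growth_rate * t_hours)


def mttf_at(n_defects, growth_rate, t_hours):
    """Instantaneous MTTF approximation: 1 / lambda(t)."""
    return 1.0 / failure_rate(n_defects, growth_rate, t_hours)


# Illustrative inputs only: 50 KSLOC at 0.5 defects per KSLOC.
n = predicted_defects(50, 0.5)  # 25.0 predicted defects
for t in (0, 1000, 5000):
    print(t, failure_rate(n, 1e-3, t), mttf_at(n, 1e-3, t))
```

As the loop shows, the failure rate falls and the MTTF rises as operating time accumulates, which is the essence of reliability growth.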
The factors and calculations defined in the Neufelder model produce a score between 0.0219 and 2.402, which then determines the fault density category, with 2.402 representing ‘seriously distressed’.
For system reliability evaluations, relying purely on hardware analysis is limiting, because the quality of the software also plays its part in the calculation. Using an SRGM such as Neufelder can help in this area, particularly if a system is located in a remote corner of the world where it is difficult or expensive for service personnel to reach it.
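One simple way to bring the software’s contribution into a system calculation is to treat hardware and software as a series system with constant, independent failure rates, so that the rates add. The sketch below assumes exactly that; the series assumption and both example rates are illustrative and not taken from Neufelder.

```python
def system_mttf(hw_rate_per_hour, sw_rate_per_hour):
    """MTTF in hours for hardware and software in series.

    Assumes constant, independent failure rates, so the rates simply add.
    """
    total_rate = hw_rate_per_hour + sw_rate_per_hour
    if total_rate <= 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 / total_rate


hw_rate = 500 / 1e9  # 500 FIT, i.e. 5e-7 failures per hour
sw_rate = 1e-5       # hypothetical software failure rate, e.g. from an SRGM

print(f"hardware only:       {system_mttf(hw_rate, 0.0):,.0f} h")
print(f"hardware + software: {system_mttf(hw_rate, sw_rate):,.0f} h")
```

With these illustrative rates the software term dominates, reducing the estimated MTTF from roughly 2,000,000 hours to roughly 95,000 hours.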
How practical a model such as Neufelder would be for medical device or automotive software development is another story. Many medical device companies use a table of severity versus probability of failure, and the probability column is often based on gut feeling rather than on anything calculated, e.g. from an SRGM. The software safety classification in IEC 62304 proceeds on the assumption that the probability of software failure is 100%, but the software failure rate could be quantified over time using an SRGM.
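As an illustration of replacing the gut feeling with a calculation, the probability column could be derived from an SRGM failure rate. A minimal sketch, assuming a constant failure rate over the mission time so that P(failure) = 1 − e^(−λt); the probability bands and their labels are hypothetical, since each manufacturer defines its own scale.

```python
import math

# Hypothetical probability bands as (upper bound, label); real scales are
# defined in each manufacturer's own risk management procedure.
PROBABILITY_BANDS = [
    (1e-6, "improbable"),
    (1e-4, "remote"),
    (1e-2, "occasional"),
    (1.0, "frequent"),
]


def failure_probability(rate_per_hour, mission_hours):
    """P(at least one failure) during the mission, assuming a constant rate."""
    # -expm1(-x) computes 1 - exp(-x) accurately for small x.
    return -math.expm1(-rate_per_hour * mission_hours)


def probability_band(p):
    """Map a probability onto the hypothetical band scale above."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    for upper, label in PROBABILITY_BANDS:
        if p <= upper:
            return label


p = failure_probability(1e-6, 1000)  # about 1e-3 over 1000 hours
print(f"{p:.2e}", probability_band(p))
```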
The automotive sector takes a more pragmatic approach, generally basing the analysis of software on guide words, using a process akin to a Hazard and Operability Study (HAZOP). Ultimately, an alternative view is that software quality and reliability result from good requirements management and from demonstrating, through static and dynamic analysis of the code, that those requirements are fulfilled.
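A guide-word analysis of this kind can be bootstrapped by crossing each software output with a set of guide words to produce candidate deviations for the team to assess. The sketch below uses a commonly cited set of HAZOP-style guide words (the exact set varies by organisation); the signal names are hypothetical, and the result is only a checklist of prompts, not the analysis itself.

```python
from itertools import product

# Commonly used HAZOP-style guide words, including timing words often
# applied to software; the exact set varies by organisation.
GUIDE_WORDS = ["no", "more", "less", "reverse", "early", "late", "other than"]


def deviation_prompts(outputs):
    """Pair every software output with every guide word."""
    return [
        f"{word.upper()}: {signal}"
        for signal, word in product(outputs, GUIDE_WORDS)
    ]


# Hypothetical outputs of a motor-control software component.
for prompt in deviation_prompts(["torque request", "brake status"]):
    print(prompt)
```

Each prompt, such as “LATE: brake status”, is then discussed in terms of its causes, its effects and any existing safeguards.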
There is a simplified variant of Neufelder called the Shortcut model, which could be a more practical approach. Ultimately, though, the motivation for using an SRGM would come from the need for longer device operating times.
By Alastair Walker, Consultant & Owner