Safety Management Manual (SMM)

2.3.6 It is important to recognize that some of the defences, or breaches, can be influenced by an interfacing organization. It is therefore vitally important that service providers assess and manage these interfaces.

2.3.7 Swiss Cheese applications for safety management

2.3.7.1 The “Swiss Cheese” Model can be used as an analysis guide by both States and service providers by looking past the individuals involved in an incident or identified hazard, into the organizational circumstances which may have allowed the situation to manifest. It can be applied during SRM, safety surveillance, internal auditing, change management and safety investigation. In each case, the model can be used to consider which of the organization’s defences are effective, which can be or have been breached, and where the system could benefit from additional defences. Once identified, any weaknesses in the defences can be reinforced against future accidents and incidents.

2.3.7.2 In practice, the event will breach the defences in the direction of the arrow (hazards to losses) as displayed in the rendering of Figure 3. The assessment of the situation is conducted in the opposite direction, in this case from losses to hazards. Actual aviation accidents will usually include a degree of additional complexity. There are more sophisticated models which can help States and service providers to understand how and why accidents happen.

2.3.8 Practical drift

2.3.8.1 Scott A. Snook's theory of practical drift is used to understand how the performance of any system “drifts away” from its original design. Tasks, procedures, and equipment are often initially designed and planned in a theoretical environment, under ideal conditions, with an implicit assumption that nearly everything can be predicted and controlled, and where everything functions as expected.
This is usually based on three fundamental assumptions:

a) the technology needed to achieve the system production goals is available;
b) personnel are trained, competent and motivated to properly operate the technology as intended; and
c) policy and procedures will dictate system and human behaviour.

These assumptions underlie the baseline (or ideal) system performance, which can be graphically presented as a straight line from the start of operational deployment as shown in Figure 4.

Figure 4. Concept of practical drift

Chapter 2. Safety Management Fundamentals

2.3.8.2 Once operationally deployed, the system should ideally perform as designed, following baseline performance (orange line) most of the time. In reality, the operational performance often differs from the assumed baseline performance as a consequence of real-life operations in a complex, ever-changing and usually demanding environment (red line). Since the drift is a consequence of daily practice, it is referred to as “practical drift”. The term “drift” is used in this context to mean the gradual departure from an intended course due to external influences.

2.3.8.3 Snook contends that practical drift is inevitable in any system, no matter how careful and well thought out its design. Some of the reasons for practical drift include:

a) technology that does not operate as predicted;
b) procedures that cannot be executed as planned under certain operational conditions;
c) changes to the system, including additional components;
d) interactions with other systems;
e) safety culture;
f) adequacy (or inadequacy) of resources (e.g. support equipment);
g) learning from successes and failures to improve the operations, and so forth.

2.3.8.4 In reality, people will generally make the system work on a daily basis despite the system’s shortcomings, applying local adaptations (or workarounds) and personal strategies.
These workarounds may bypass the protection of existing safety risk controls and defences.

2.3.8.5 Safety assurance activities such as audits, observations and monitoring of SPIs can help to expose activities that are “practically drifting”. Analysing the safety information to find out why the drift is happening helps to mitigate the safety risks. The closer to the beginning of the operational deployment that practical drift is identified, the easier it is for the organization to intervene. More information on safety assurance for States and service providers may be found in Chapters 8 and 9, respectively.

2.4 MANAGEMENT DILEMMA

2.4.1 In any organization engaged in the delivery of services, production/profitability and safety risks are linked. An organization must maintain profitability to stay in business by balancing output with acceptable safety risks (and the costs involved in implementing safety risk controls). Typical safety risk controls include technology, training, processes and procedures. For the State, the safety risk controls are similar, i.e. training of personnel, the appropriate use of technology, effective oversight and the internal processes and procedures supporting oversight. Implementing safety risk controls comes at a price – money, time, resources – and the aim of safety risk controls is usually to improve safety performance, not production performance. However, some investments in “protection” can also improve “production” by reducing accidents and incidents and thereby their associated costs.

2.4.2 The safety space is a metaphor for the zone where an organization balances desired production/profitability while maintaining required safety protection through safety risk controls. For example, a service provider may wish to invest in new equipment. The new equipment may simultaneously provide the necessary efficiency improvements as well as improved reliability and safety performance.
Such decision-making involves an assessment of both the benefits to the organization and the safety risks involved. The allocation of excessive resources to safety risk controls may result in the activity becoming unprofitable, thus jeopardizing the viability of the organization.

2.4.3 On the other hand, excess allocation of resources for production at the expense of protection can have an impact on the product or service and can ultimately lead to an accident. It is therefore essential that a safety boundary be defined that provides early warning that an unbalanced allocation of resources exists, or is developing. Organizations use financial management systems to recognize when they are getting too close to bankruptcy; safety management applies the same logic and tools to monitor safety performance. This enables the organization to operate profitably and safely within the safety space. Figure 5 illustrates the boundaries of an organization’s safety space. Organizations need to continuously monitor and manage their safety space as safety risks and external influences change over time.

2.4.4 The need to balance profitability and safety (or production and protection) has become a readily understood and accepted requirement from a service provider perspective. This balance is equally applicable to the State’s management of safety, given the requirement to balance resources required for State protective functions that include certification and surveillance.

2.5 SAFETY RISK MANAGEMENT

Safety Risk Management (SRM) is a key component of safety management and includes hazard identification, safety risk assessment, safety risk mitigation and risk acceptance. SRM is a continuous activity because the aviation system is constantly changing, new hazards can be introduced and some hazards and associated safety risks may change over time.
In addition, the effectiveness of implemented safety risk mitigation strategies needs to be monitored to determine if further action is required.

2.5.1 Introduction to hazards

2.5.1.1 In aviation, a hazard can be considered as a dormant potential for harm which is present in one form or another within the system or its environment. This potential for harm may appear in different forms, for example as a natural condition (e.g. terrain) or a technical status (e.g. runway markings).

2.5.1.2 Hazards are an inevitable part of aviation activities; however, their manifestation and possible adverse consequences can be addressed through mitigation strategies which aim to contain the potential for the hazard to result in an unsafe condition. Aviation can coexist with hazards so long as they are controlled. Hazard identification is the first step in the SRM process. It precedes a safety risk assessment and requires a clear understanding of hazards and their related consequences.

2.5.2 Understanding hazards and their consequences

2.5.2.1 Hazard identification focuses on conditions or objects that could cause or contribute to the unsafe operation of aircraft or aviation safety-related equipment, products and services (guidance on distinguishing hazards that are directly pertinent to aviation safety from other general/industrial hazards is addressed in subsequent paragraphs).

Figure 5. Concept of a safety space

2.5.2.2 Consider, for example, a fifteen-knot wind. Fifteen knots of wind is not necessarily a hazardous condition. In fact, a fifteen-knot wind blowing directly down the runway improves aircraft take-off and landing performance. But if the fifteen-knot wind is blowing across the runway, a crosswind condition is created which may be hazardous to operations. This is due to its potential to contribute to aircraft instability. The resulting reduction in control could lead to an occurrence, such as a lateral runway excursion.
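The fifteen-knot example can be made concrete by resolving the wind into components along and across the runway. The following is a minimal sketch and not part of the manual's guidance. It assumes the standard trigonometric decomposition (headwind = speed × cos θ, crosswind = speed × sin θ, where θ is the angle between the wind direction and the runway heading). The function name `wind_components` and the runway heading of 090 are hypothetical, chosen for illustration only.

```python
import math
from typing import Tuple


def wind_components(wind_speed_kt: float,
                    wind_dir_deg: float,
                    runway_heading_deg: float) -> Tuple[float, float]:
    """Return (headwind, crosswind) in knots, rounded to one decimal.

    A negative headwind value indicates a tailwind. The crosswind is
    returned as a magnitude (side not indicated).
    """
    angle = math.radians(wind_dir_deg - runway_heading_deg)
    headwind = round(wind_speed_kt * math.cos(angle), 1)
    crosswind = round(abs(wind_speed_kt * math.sin(angle)), 1)
    return headwind, crosswind


# Hypothetical runway heading 090.
# Wind from 090: blowing directly down the runway.
print(wind_components(15, 90, 90))   # (15.0, 0.0)
# Wind from 180: blowing directly across the runway.
print(wind_components(15, 180, 90))  # (0.0, 15.0)
```

The same fifteen knots yields either a pure headwind or a pure crosswind depending on direction. Whether a given crosswind is actually hazardous depends on factors outside this sketch, such as aircraft type and runway condition.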

2.5.2.3 It is not uncommon for people to confuse hazards with their consequences. A consequence is an outcome that can be triggered by a hazard. For example, a runway excursion (overrun) is a potential consequence related to the hazard of a contaminated runway. By clearly defining the hazard first, one can more readily identify possible consequences.

2.5.2.4 In the crosswind example above, an immediate outcome of the hazard could be loss of lateral control followed by a consequent runway excursion. The ultimate consequence could be an accident. The damaging potential of a hazard can materialize through one or many consequences. It is important that safety risk assessments identify all of the possible consequences. The most extreme consequence - loss of human life - should be differentiated from those that involve lesser consequences, such as: aircraft incidents; increased flight crew workload; or passenger discomfort. The description of the consequences will inform the risk assessment and subsequent development and implementation of mitigations through prioritization and allocation of resources. Detailed and thorough hazard identification will lead to more accurate assessment of safety risks.

Hazard identification and prioritization

2.5.2.5 Hazards exist at all levels in the organization and are detectable through many sources including reporting systems, inspections, audits, brainstorming sessions and expert judgement. The goal is to proactively identify hazards before they lead to accidents, incidents or other safety-related occurrences. An important mechanism for proactive hazard identification is a voluntary safety reporting system. Additional guidance on voluntary safety reporting systems can be found in Chapter 5. Information collected through such reporting systems may be supplemented by observations or findings recorded during routine site inspections or organizational audits.

2.5.2.6 Hazards can also be identified in the review or study of internal and external investigation reports. A consideration of hazards when reviewing accident or incident investigation reports is a good way to enhance the organization’s hazard identification system. This is particularly important when the organization’s safety culture is not yet mature enough to support effective voluntary safety reporting, or in small organizations with limited events or reports. External sources such as ICAO, trade associations or other international bodies are also an important source of information on specific hazards linked to the organization’s operations and activities.

2.5.2.7 Hazard identification may also consider hazards that are generated outside of the organization and hazards that are outside the direct control of the organization, such as extreme weather or volcanic ash. Identifying hazards related to emerging safety risks is also an important way for organizations to prepare for situations that may eventually occur.

2.5.2.8 The following should be considered when identifying hazards:

a) system description;
b) design factors, including equipment and task design;
c) human performance limitations (e.g. physiological, psychological, physical and cognitive);
d) procedures and operating practices, including documentation and checklists, and their validation under actual operating conditions;
e) communication factors, including media, terminology and language;
f) organizational factors, such as those related to the recruitment, training and retention of personnel, compatibility of production and safety goals, allocation of resources, operating pressures and corporate safety culture;
g) factors related to the operational environment (e.g.
weather, ambient noise and vibration, temperature and lighting);
h) regulatory oversight factors, including the applicability and enforceability of regulations, and the certification of equipment, personnel and procedures;
i) performance monitoring systems that can detect practical drift, operational deviations or a deterioration of product reliability;
j) human-machine interface factors; and
k) factors related to the SSP/SMS interfaces with other organizations.

Occupational safety, health and environment (OSHE) hazards

2.5.2.9 Safety risks associated with compound hazards that simultaneously impact aviation safety as well as OSHE may be managed through separate (parallel) risk mitigation processes to address the separate aviation and OSHE consequences, respectively. Alternatively, an integrated aviation and OSHE risk mitigation system may be used to address compound hazards. An example of a compound hazard is a lightning strike on an aircraft at an airport transit gate. This hazard may be deemed by an OSHE inspector to be a “workplace hazard” (ground personnel/workplace safety). To an aviation safety inspector, it is also an aviation hazard with risk of damage to the aircraft and a risk to passenger safety. It is important to consider both the OSHE and aviation safety consequences of such compound hazards, since they are not always the same. The purpose and focus of preventive controls for OSHE and aviation safety consequences may differ.

Hazard identification methodologies

2.5.2.10 The two main methodologies for identifying hazards are:

a) Reactive. This methodology involves analysis of past outcomes or events. Hazards are identified through investigation of safety occurrences. Incidents and accidents are an indication of system deficiencies and therefore can be used to determine which hazard(s) contributed to the event.
b) Proactive.
This methodology involves collecting safety data on lower-consequence events or process performance and analysing the safety information or frequency of occurrence to determine if a hazard could lead to an accident or incident. The safety information for proactive hazard identification primarily comes from flight data analysis (FDA) programmes, safety reporting systems and the safety assurance function.

2.5.2.11 Hazards can also be identified through safety data analysis which identifies adverse trends and makes predictions about emerging hazards.

Hazards related to SMS interfaces with external organizations

2.5.2.12 Organizations should also identify hazards related to their safety management interfaces. This should, where possible, be carried out as a joint exercise with the interfacing organizations. The hazard identification should consider the operational environment and the various organizational capabilities (people, processes, technologies) which could contribute to the safe delivery of the service or to the product’s availability, functionality or performance.

2.5.2.13 As an example, an aircraft turnaround involves many organizations and operational personnel all working in and around the aircraft. There are likely to be hazards related to the interfaces between operational personnel, their equipment and the coordination of the turnaround activity.

2.5.3 Safety risk probability

2.5.3.1 Safety risk probability is the likelihood that a safety consequence or outcome will occur. It is important to consider different scenarios so that all potential consequences are considered. The determination of probability can be aided by questions such as:

a) Is there a history of occurrences similar to the one under consideration, or is this an isolated occurrence?
b) What other equipment or components of the same type might have similar concerns?
c) How many personnel are following, or are subject to, the procedures in question?
d) What is the exposure to the hazard under consideration? For example, what is the percentage of time the equipment or activity is in use during an operation?

2.5.3.2 Any factors underlying these questions will help when assessing the probability of the hazard consequences, taking into consideration all foreseeable scenarios.

2.5.3.3 An occurrence is considered foreseeable if a reasonable person should have expected the kind of occurrence under the same circumstances. Identification of every conceivable or theoretically possible hazard is neither possible nor desirable. Judgment is required to determine the appropriate level of detail in hazard identification. Service providers should exercise due diligence when identifying significant and reasonably foreseeable hazards related to their product or service.

Note.— Regarding product design, the term “foreseeable” is intended to be consistent with its use in airworthiness regulations, policy, and guidance.

2.5.3.4 Table 1 below presents a typical safety risk probability classification table, which includes five categories to denote the probability related to an unsafe event or condition, the description of each category and an assignment of a value to each category. This example uses qualitative terms; quantitative terms could be defined to provide a more accurate assessment. This will depend on the availability of appropriate safety data and the sophistication of the organization and operation.

| Likelihood | Meaning | Value |
|---|---|---|
| Frequent | Likely to occur many times (has occurred frequently) | 5 |
| Occasional | Likely to occur sometimes (has occurred infrequently) | 4 |
| Remote | Unlikely to occur, but possible (has occurred rarely) | 3 |
| Improbable | Very unlikely to occur (not known to have occurred) | 2 |
| Extremely improbable | Almost inconceivable that the event will occur | 1 |

Table 1. Safety risk probability table

Note.— This is an example only.
The level of detail and complexity of tables and matrices should be adapted to the particular needs and complexities of each organization. It should also be noted that organizations might include both qualitative and quantitative criteria.

2.5.4 Safety risk severity

2.5.4.1 Once the probability assessment has been completed, the next step is to assess the severity, taking into account the potential consequences related to the hazard. Safety risk severity is defined as the extent of harm that might reasonably occur as a consequence or outcome of the identified hazard. The severity classification should consider:

a) fatalities or serious injury as a result of:
   i) being in the aircraft;
   ii) direct contact with any part of the aircraft, including parts which have become detached from the aircraft; or
   iii) direct exposure to jet blast;
b) damage:
   i) aircraft sustains damage or structural failure which:
      1) adversely affects the structural strength, performance or flight characteristics of the aircraft;
      2) would normally require major repair or replacement of the affected component;
   ii) ATS or aerodrome equipment sustains damage such that:
      1) management of aircraft separation is adversely affected; or
      2) landing capability is adversely affected.

2.5.4.2 The severity assessment should consider all possible consequences related to a hazard, taking into account the worst foreseeable situation. Table 2 below depicts a typical safety risk severity table. It includes five categories to denote the level of severity, the description of each category and the assignment of a value to each category. As with the safety risk probability table, this table is an example only.
| Severity | Meaning | Value |
|---|---|---|
| Catastrophic | • Aircraft / equipment destroyed • Multiple deaths | A |
| Hazardous | • A large reduction in safety margins, physical distress or a workload such that operational personnel cannot be relied upon to perform their tasks accurately or completely • Serious injury • Major equipment damage | B |
| Major | • A significant reduction in safety margins, a reduction in the ability of operational personnel to cope with adverse operating conditions as a result of an increase in workload or as a result of conditions impairing their efficiency • Serious incident • Injury to persons | C |
| Minor | • Nuisance • Operating limitations • Use of emergency procedures • Minor incident | D |
| Negligible | • Few consequences | E |

Table 2. Example safety risk severity table

2.5.5 Safety risk tolerability

2.5.5.1 The safety risk index rating is created by combining the results of the probability and severity scores. In the example above, it is an alphanumeric designator. The respective severity/probability combinations are presented in the safety risk assessment matrix in Table 3. The safety risk assessment matrix is used to determine safety risk tolerability. Consider, for example, a situation where a safety risk probability has been assessed as Occasional (4), and safety risk severity has been assessed as Hazardous (B), resulting in a safety risk index of (4B).

| Probability / Severity | Catastrophic (A) | Hazardous (B) | Major (C) | Minor (D) | Negligible (E) |
|---|---|---|---|---|---|
| Frequent (5) | 5A | 5B | 5C | 5D | 5E |
| Occasional (4) | 4A | 4B | 4C | 4D | 4E |
| Remote (3) | 3A | 3B | 3C | 3D | 3E |
| Improbable (2) | 2A | 2B | 2C | 2D | 2E |
| Extremely improbable (1) | 1A | 1B | 1C | 1D | 1E |

Table 3. Example safety risk matrix

Note.— In determining the safety risk tolerability, the quality and reliability of the data used for the hazard identification and safety risk probability should be taken into consideration.
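The combination of a probability value and a severity letter into a safety risk index can be sketched in a few lines. This is an illustration only, assuming the example categories of Tables 1 and 2. The names `PROBABILITY`, `SEVERITY` and `risk_index` are hypothetical, and an organization's own categories and values may differ.

```python
# Example categories taken from Table 1 (probability) and Table 2 (severity).
PROBABILITY = {
    "Frequent": 5,
    "Occasional": 4,
    "Remote": 3,
    "Improbable": 2,
    "Extremely improbable": 1,
}
SEVERITY = {
    "Catastrophic": "A",
    "Hazardous": "B",
    "Major": "C",
    "Minor": "D",
    "Negligible": "E",
}


def risk_index(probability: str, severity: str) -> str:
    """Combine a probability category and a severity category into an index.

    Raises KeyError if either category is not in the example tables.
    """
    return f"{PROBABILITY[probability]}{SEVERITY[severity]}"


# The example from 2.5.5.1: Occasional (4) and Hazardous (B).
print(risk_index("Occasional", "Hazardous"))  # 4B
```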

2.5.5.2 The index obtained from the safety risk assessment matrix should then be exported to a safety risk tolerability table that describes, in narrative form, the tolerability criteria for the particular organization. Table 4 presents an example of a safety risk tolerability table. Using the example above, the criterion for safety risk assessed as 4B falls in the “intolerable” category. In this case, the safety risk index of the consequence is unacceptable. The organization should therefore take risk control action to reduce:

a) the organization’s exposure to the particular risk, i.e. reduce the probability component of the risk to an acceptable level;
b) the severity of consequences related to the hazard, i.e. reduce the severity component of the risk to an acceptable level; or
c) both the severity and probability so that the risk is managed to an acceptable level.

| Safety risk index range | Safety risk description | Recommended action |
|---|---|---|
| 5A, 5B, 5C, 4A, 4B, 3A | INTOLERABLE | Take immediate action to mitigate the risk or stop the activity. Perform priority safety risk mitigation to ensure additional or enhanced preventative controls are in place to bring down the safety risk index to tolerable. |
| 5D, 5E, 4C, 4D, 4E, 3B, 3C, 3D, 2A, 2B, 2C, 1A | TOLERABLE | Can be tolerated based on the safety risk mitigation. It may require management decision. |
| 3E, 2D, 2E, 1B, 1C, 1D, 1E | ACCEPTABLE | Acceptable as is. No further safety risk mitigation required. |

Table 4. Example of safety risk tolerability

2.5.5.3 Safety risks are conceptually assessed as acceptable, tolerable or intolerable. Safety risks assessed as initially falling in the intolerable region are unacceptable under any circumstances. The probability and/or severity of the consequences of the hazards are of such a magnitude, and the damaging potential of the hazard poses such a threat to safety, that mitigation action is required or activities are stopped.
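The lookup from safety risk index to tolerability category can be sketched as follows. This is an illustration only, built on the example ranges of Table 4. It assumes that the intolerable range includes 4A and 4B, consistent with the statement in 2.5.5.2 that 4B is intolerable. The names `INTOLERABLE`, `TOLERABLE`, `ACCEPTABLE` and `tolerability` are hypothetical, and each organization defines its own criteria.

```python
# Example index ranges from Table 4 (assumed: 4A and 4B are intolerable).
INTOLERABLE = {"5A", "5B", "5C", "4A", "4B", "3A"}
TOLERABLE = {"5D", "5E", "4C", "4D", "4E", "3B", "3C", "3D",
             "2A", "2B", "2C", "1A"}
ACCEPTABLE = {"3E", "2D", "2E", "1B", "1C", "1D", "1E"}


def tolerability(index: str) -> str:
    """Return the example tolerability category for a safety risk index."""
    if index in INTOLERABLE:
        return "INTOLERABLE"
    if index in TOLERABLE:
        return "TOLERABLE"
    if index in ACCEPTABLE:
        return "ACCEPTABLE"
    raise ValueError(f"unknown safety risk index: {index!r}")


# The example from 2.5.5.2: a risk assessed as 4B.
print(tolerability("4B"))  # INTOLERABLE
# Reducing the probability component from Occasional (4) to Improbable (2).
print(tolerability("2B"))  # TOLERABLE
```

In this sketch, reducing only the probability component moves the example risk out of the intolerable region. Whether such a reduction is achievable in practice is a matter for the organization's own assessment.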

2.5.6 Assessing human factors related risks

2.5.6.1 The consideration of human factors has particular importance in SRM as people can be both a source of, and a solution to, safety risks by:

a) contributing to an accident or incident through variable performance due to human limitations;
b) anticipating and taking appropriate actions to avoid a hazardous situation; and
c) solving problems, making decisions and taking actions to mitigate risks.

2.5.6.2 It is therefore important to involve people with appropriate human factors expertise in the identification, assessment and mitigation of risks.

2.5.6.3 SRM requires all aspects of safety risk to be addressed, including those related to humans. Assessing the risks associated with human performance is more complex than assessing risk factors associated with technology and environment since:

a) human performance is highly variable, with a wide range of interacting influences internal and external to the individual. Many of the effects of the interaction between these influences are difficult, or impossible, to predict; and
b) the consequences of variable human performance will differ according to the task being performed and the context.

2.5.6.4 This complicates how the probability and the severity of the risk are determined. Therefore, human factors expertise is valuable in the identification and assessment of safety risks. (The management of fatigue using SMS processes is addressed in the Manual for the Oversight of Fatigue Management Approaches (Doc 9966).)

2.5.7 Safety risk mitigation strategies

2.5.7.1 Safety risk mitigation is often referred to as a safety risk control. Safety risks should be managed to an acceptable level by mitigating the safety risk through the application of appropriate safety risk controls. This should be balanced against the time, cost and difficulty of taking action to reduce or eliminate the safety risk.
The level of safety risk can be lowered by reducing the severity of the potential consequences, reducing the likelihood of occurrence or by reducing exposure to that safety risk. It is easier and more common to reduce the likelihood than it is to reduce the severity.

2.5.7.2 Safety risk mitigations are actions that often result in changes to operating procedures, equipment or infrastructure. Safety risk mitigation strategies fall into three categories:

a) Avoidance: The operation or activity is cancelled or avoided because the safety risk exceeds the benefits of continuing the activity, thereby eliminating the safety risk entirely.
b) Reduction: The frequency of the operation or activity is reduced, or action is taken to reduce the magnitude of the consequences of the safety risk.
c) Segregation: Action is taken to isolate the effects of the consequences of the safety risk or build in redundancy to protect against them.

2.5.7.3 The consideration of human factors is an integral part of identifying effective mitigations because humans are required to apply, or contribute to, the mitigation or corrective actions. For example, mitigations may include the use of processes or procedures. Without input from those who will be using these in “real world” situations and/or individuals with human factors expertise, the processes or procedures developed may not be fit for their purpose and may result in unintended consequences. Further, human performance limitations should be considered as part of any safety risk mitigation, building in error-capturing strategies to address human performance variability. Ultimately, this important human factors perspective results in more comprehensive and effective mitigations.

2.5.7.4 A safety risk mitigation strategy may involve one of the approaches described above or may include multiple approaches. It is important to consider the full range of possible control measures to find an optimal solution.
The effectiveness of each alternative strategy must be evaluated before a decision is made. Each proposed safety risk mitigation alternative should be examined from the following perspectives:

• Effectiveness. The extent to which the alternatives reduce or eliminate the safety risks. Effectiveness can be determined in terms of the technical, training and regulatory defences that can reduce or eliminate safety risks.
• Cost/benefit. The extent to which the perceived benefits of the mitigation outweigh the costs.
• Practicality. The extent to which the mitigation can be implemented and how appropriate it is in terms of available technology, financial and administrative resources, legislation, political will, operational realities, etc.
• Acceptability. The extent to which the alternative is acceptable to those people who will be expected to apply it.
• Enforceability. The extent to which compliance with new rules, regulations or operating procedures can be monitored.
• Durability. The extent to which the mitigation will be sustainable and effective.
• Residual safety risks. The degree of safety risk that remains subsequent to the implementation of the initial mitigation and which may necessitate additional safety risk control measures.