4.2. Identifying sources of resilience: learning from what goes well

One of the aims of Resilience Engineering is to learn from everyday performance and from successful operations, rather than only from lessons learned after failures. In line with this, identifying Sources of Resilience means investigating the mechanisms by which organizations successfully handle expected and unexpected conditions. Such mechanisms (e.g., strategies, processes, tools) allow the organization to adapt, perform and deliver required services in spite of the variability and complexity it experiences in its operations. This adaptive capacity can be recognized by looking at work-as-done, both in daily operations and in unusual or exceptional scenarios, in order to identify sources of resilience and to learn from what goes well.



Organizations need to invest in understanding everyday operations in order to be better prepared for crisis situations. Resources for building up and maintaining this understanding need to be allocated, as an investment in retaining, enhancing or amplifying the organization's (or organizations') resilient capabilities. This means, among other resources, that experts need time to share their views on the functioning of the system, and that facilitators or analysts (possibly experts on resilience management) need to be available to compile this knowledge so that the organization can learn from it in a methodical manner.

To identify sources of resilience:

  • Build the necessary skills to understand and identify sources of resilience at different levels of the organization.
  • Select methods for the identification of possible sources of resilience with the involvement of roles and actors at different levels in the organization, making sure to account for an adequate diversity of perspectives. In order to achieve such diversity, combine individual interviews and workshop-based techniques, taking into account time constraints and availability of resources.
  • Plan the methods around triggering questions to be used as a guide for defining and describing margins and couplings in daily operations (triggering questions before) or for looking back at past events to identify successful skills, strategies, and procedures (triggering questions after).
  • Use the outcome of your analysis to revise your internal guidelines and training, or to create ad-hoc ones.

Before a crisis

The following triggering questions can be used to guide a discussion aimed at understanding work-as-done, both in daily operations and in situations of crisis.
This can be done in a number of activities, such as dedicated workshops, individual or group interviews, observational studies informing analyses, and over-the-shoulder observations. The analyses as such can be part of other safety, security, and change management activities, audits, safety assessments, concept design sessions, etc.
The discussion is intended as a way to improve the capability of the organization to react to a situation of crisis, by revising internal guidelines and procedures in light of the existing practices that have been shown to work well.

Triggering questions

Adaptive capacity:
  • Which strategies (e.g. working methods or contingency procedures) can be used to handle a sudden loss of capacity and/or increase in demands?
  • For which events is there a response ready?
  • How and when can existing roles and tasks be reorganized in response to such events?
  • Are personnel exposed to unusual situations as part of their training?

Operational Margins:

  • Which margins are available in everyday operational situations that can be used to handle suddenly increased demands?
  • Which margins have been defined and anticipated beforehand?
  • How is it possible to increase existing margins?
  • When is it necessary to negotiate this increase with other actors? With which actors?
  • Are there criteria to establish when it is possible to revert to the original margins?


  • How and when can additional resources (human, technical, material) be allocated/called in to integrate existing ones?
  • What back-up (incl. legacy) resources and working methods are available? Are personnel (still) familiar enough with these to use them readily?
  • What kind of coordination with other actors needs to be established for additional resources?
  • Are there criteria to establish when it is possible to revert to the original set of resources?


  • Which roles in the organization can monitor the margins/resources available, both during and after an unexpected increase in demands?
  • How are margins/resources monitored?
  • Which monitoring mechanisms are put in place by the organization to anticipate and assess possible threats that may occur in the future?

Goal trade-offs:

  • During the management of everyday operations or crises, are there different goals that may come in conflict (e.g. ensuring adequate safety margins vs. minimizing economic losses)?
  • How do operators succeed in meeting conflicting goals and finding appropriate balance among them?

Dependencies and interactions:

  • What strategies (could) foster a smooth coordination among actors and minimize constraints and bottlenecks?
  • Where do more efforts need to be spent to understand the potential for small variations in conditions and performance outcomes to combine, propagate, and amplify across organizations (so-called “cascading”, “butterfly” or “snowball” effects)?
  • What do operators (need to) know about the other parts of the system that they are interacting with?
  • How are formal and informal networks nurtured that are useful in handling crises?

Healthcare implementation - Before

Monitoring and mapping the ordinary professional practices (for instance in an Emergency Department) during peacetime is highly recommended to learn how people (e.g. front-line staff) navigate the complexity of the healthcare system and adjust their practices to provide safe and high quality care. Organizational Ethnography is a recommended methodological approach to know and understand everyday professional practices within the context where and when things happen (see Method 2 in the Healthcare Practices, Methods and Tools section).

Learning from the ordinary offers opportunities to realign the work-as-imagined of decision makers and safety managers (e.g. nurse coordinators) with the work-as-done of operational personnel and frontline employees, and also provides useful insights for managing critical events (Fraser & Greenhalgh, 2001; Hollnagel, Braithwaite, & Wears, 2013).

The main question guiding this learning process is:

  • How do people usually navigate and adjust to the complexity of their professional practices to provide safe and high quality care? (Braithwaite, 2015)

Switches between "normal operations" and "serious emergency situations" occur often in the healthcare domain. Therefore, the actor responsible for taking decisions in everyday operations is in the best position to do so during a crisis situation (see Practice 1 in the Healthcare Practices, Methods and Tools section below).

As an example of margins, consider the following comment from a DARWIN DCoP workshop: “the more I can do before [an event/earlier] the better margins I get after, with the moving of people for example”. That is, margins such as resources or time are sometimes provided by other activities: doing an activity earlier "buys time" or provides other (safety) margins later on.

Air Traffic Management implementation - Before

Examples of an analysis of margins in Air Traffic Management (ATM) (Woltjer et al., 2015, p. 124) include fuel margins for aircraft operations, airspace margins for not vectoring too close to sector boundaries, time margins in sequencing and spacing, and aircraft separation margins.

These are some of the margins that are built into the ATM system (for example how the airspace is designed), into the technical systems that controllers work with (for example how timings in interfaces are designed), into procedures (for example minimum take-off time separations) or into the way of working of the air traffic controllers (for example ways of controlling traffic according to "defensive controlling" principles). These margins help controllers and Air Traffic Service units generally to handle the various expected and unexpected conditions and variations in circumstances, independently of their causes. For example, no matter why a level bust happens, margins in separation between aircraft at different flight levels enable the air traffic system to maintain safety.

If these margins are analysed and described explicitly as part of safety assessment and change management activities, Air Navigation Service Providers build up and maintain an understanding of what margins are being used to handle unexpected events, so that these conditions are not lost in changes to the functional system of people, procedures, and equipment.

On back-up systems, many operations in critical infrastructure have legacy systems, working methods, and resources in place in case of emergency. However, it cannot be taken for granted that all personnel are (still) current with these legacy resources, as their use also needs to be trained regularly. Paper and pencil and regular phone lines, as a simple example, are available in many domains in case computerized systems fail, but if exercises and training do not prepare personnel for technical failures and the use of such other resources, using these may prove difficult. As another example, in Air Traffic Management older technical systems are often available as backup, as well as "procedural control" (controlling aircraft based on a mental traffic picture only, with greater margins than with the radar screen), but regular training in these methods is necessary to keep them current for use in emergencies. This may be especially difficult for personnel who have not worked with legacy systems and methods on a daily basis.

During a crisis

Observe and document application of procedures, methods etc. and their outcome, i.e. not only when they fail, but also when they succeed. Take a step back and reflect on whether conflicting goals are balanced appropriately, where more adaptive capacity is needed, and whether complexity is handled appropriately.

Triggering questions

Probe where things are going well by asking:
  • Where do we never experience (this problem/good operation)? Why is that?
  • Is the organization flexible, adaptable? To what extent and in what way can the organization change to adapt to demands?
  • Do we support colleagues in case of overload?
  • Do we have people available with different competences that can take different roles if required?

Healthcare implementation - During

  • The observation and documentation of the application of procedures, methods etc. and their outcome (both when they fail and when they succeed) should cover both the healthcare sector specifically and healthcare in collaboration with other actors, according to a common ground perspective (see Practice 3 in the Healthcare Practices, Methods and Tools section below).

Air Traffic Management implementation - During

The activities concerning this phase are relevant for air traffic management. The issue is “HOW” and “BY WHOM” they can be accomplished since, during a crisis, it is difficult to find someone who is both capable and available to observe and collect the information.

After a crisis

The following triggering questions can be used after the occurrence of an actual crisis which was successfully managed, in order to understand which of the existing practices have been shown to work well. This can be done in a number of activities, such as dedicated workshops, debriefing sessions, after-action reviews, exercise analyses, interviews, group interviews, incident investigations, lessons learned analyses, etc. Examples of what can be done during these activities using the triggering questions are:

  1. Analyzing the differences between the intended use of procedures and their actual use during the crisis (understanding which surprises were experienced and which strategies or working methods turned out to be successful).
  2. Sharing case studies between organizations (explaining what happened, from the point of view of those involved, and asking the participants how they would have reacted to the same situation).
  3. Proposing changes and/or adaptation to existing plans, resource allocations, guidelines, and procedures, based on what was learnt from the crisis.

Triggering questions

Adaptive capacity:
  • Which strategies (e.g. working methods or contingency procedures) were used to handle sudden losses of capacity and/or increases in demands?
  • Were the existing roles reorganized in response to such events?
  • Was the allocation of tasks among different actors modified?
  • Were the situations experienced in the context of training activities useful to handle the situation?

Operational Margins:

  • Which margins were actually available to handle sudden losses of capacity and/or increases in demands?
  • Which of these margins were defined and anticipated beforehand?
  • As the crisis developed, was an adjustment of the margins required?
  • Was it necessary to negotiate margin adjustments with other actors?
  • If the available margins were changed during the crisis, when was it possible to revert to the original margins?


  • Was it necessary to allocate/call in additional resources (human, technical, material) as the crisis developed?
  • Was coordination with other actors needed in order to allocate/call in such additional resources?
  • If additional resources were called in from other organizations or from other departments, when was it possible to release them back?


  • Which roles in the organization monitored the margins/resources available?
  • How were margins/resources actually monitored?
  • Were the threats experienced during the crisis somehow anticipated by the available monitoring mechanisms?
  • In which way did the available monitoring mechanisms help to anticipate the threats?

Goal trade-offs:

  • During the management of the crisis, did we experience situations of conflicting goals that affected our way of managing it?
  • How did the operators succeed in meeting conflicting goals and finding the appropriate balance between them (e.g. ensuring adequate safety margins vs. minimizing economic losses)?

Dependencies and interactions:

  • Which strategies worked better to minimize constraints and bottlenecks when coordinating among different actors?
  • How did the knowledge of other parts of the organization contribute to facilitate the handling of sudden losses of capacity and/or increases in demands?
  • Which strategies worked to minimize the cascading effects of the crisis?
  • How can we improve existing training by taking into account successful synergies with different organizations/departments experienced during the handling of the crisis?

Healthcare implementation - After

Once the differences between the intended use of procedures and methods, i.e. work-as-imagined (WAI), and actual work-as-done (WAD) have been analyzed after a specific case, broader data may be collected to understand how work-as-done is performed in everyday operations (across many cases), for example through observations, interviews, or questionnaires. These broader investigations into work-as-done may be analyzed and included in the reporting after the specific crisis in order to understand how the specific case relates to everyday work on a broader scale. That is, the specific case may be an example of widespread everyday practices rather than unique to the case at hand, which is important to understand and relate to in reporting after the specific event.

Changing goal trade-offs are a source of resilience in health care, which is important to understand when analysing a past event in the After phase. When patient safety is at stake in a particularly pressing life-and-death situation, certain goals such as privacy may need to be sacrificed in order not to lose time for an urgent treatment and to save the patient's life. Thus, goal trade-offs need to be dynamically adjusted and goals may need to be sacrificed depending on the situation, which is a source of resilience.

Air Traffic Management implementation - After

An example of changing goal trade-offs as a source of resilience, which is important to understand when analysing a past event in the After phase, can be found in the Air Traffic Management domain: "Performance goals change depending upon the situation. For example, in the case that an [air traffic service unit] loses the display that shows where the aircraft are situated [due to technical problems], the primary goal [of the air traffic controller] will shift from providing both efficient and safe flow of traffic towards solely providing separation between aircraft using all means to achieve this [safety]." (Woltjer et al., 2015, p. 120).

Cockpit procedures also commonly prescribe the intended use of checklists in various situations, often with the "disclaimer" that pilots may deviate from these instructions if flight safety requires that the situation be dealt with in another way. In these circumstances, safety becomes the primary goal to be prioritized. Thus, goal trade-offs need to be dynamically adjusted and goals may need to be sacrificed depending on the situation, which is a source of resilience.


Understanding the context

Detailed objectives

One of the aims of Resilience Engineering is to provide a deeper understanding of everyday performance, in order to learn not only from failures but also from successful operations. Resilience management should be based not only on analysis of risk and of the "brittleness" illustrated by failures during incidents and crises, but on an understanding of all outcomes of everyday operations, including the positive ones. Learning from what goes well during normal operations in safety-critical work, as well as when incidents and crises occur, can support better preparedness and learning, thus increasing resilience. The study of everyday operations can reveal how the organization manages normal conditions by adapting to events as they occur, and also how and when procedures are adapted.

Targeted actors

Actors that may benefit from this topic include actors involved in safety, security, and change management activities, audits, safety assessments, concept development sessions, debriefing sessions, after-action reviews, exercise analyses, and incident investigations. This may include policy makers, middle and line management, operational management, and a variety of operational roles.

Healthcare actors

Actors should be identified in the following areas:

  • Policy makers and regulatory bodies at different levels: International Organizations (WHO, ECDC), Ministries of Health, Regions/Counties, NGOs.
  • Operational institutions that operate on the territory (hospitals, local health units, etc.).
  • Patients (as class and as individuals).

Air Traffic Management actors

The roles and responsibilities of involved actors change according to the type of crisis and the related environment of operations. The "Identification of sources of resilience" must encompass most of the activities of the organization, at all levels, starting from senior management to front line operators.

The actors involved are those listed below:

  • Air Navigation Service Providers (both civil and military)
  • Aircraft owners and operators
  • Aircraft manufacturers
  • Aviation regulatory authorities (National and International)
  • ATFCM (Air Traffic Flow and Capacity Management)
  • International aviation organizations (e.g. EUROCONTROL, ICAO, CANSO)
  • Investigative agencies
  • Flying public
  • Airport operators (if airports and/or ground operations are concerned by the crisis)
  • Firefighters (if airports and/or ground operations are concerned by the crisis)
  • Police (if airports and/or ground operations are concerned by the crisis)

Expected benefits

Enhanced understanding of everyday situations, focusing on the essential functions that make a critical infrastructure work. The organization can use this understanding to retain, enhance or amplify the organization's (or organizations') resilient capabilities, thereby ensuring that everyday processes go well as often as possible.

Relation to adaptive capacity

This capability card is in essence an elaboration on how to identify and increase adaptive capacity.

Relation to risk management

Support investments in the ability to maintain operation and continuity of operations for different kinds of systems and organizations at different levels.


"High Workload at the Maternity Ward
A remarkably large number of births one evening led to chaos at the maternity ward. The ward was understaffed and no beds were available for more patients arriving. Also, patients from the emergency room with gynaecological needs were being directed to the maternity ward as the emergency room was overloaded. To cope with the situation one of the doctors started to free resources by sending all fathers of the new-born babies home. Although not a popular decision among the patients this re-organization freed up beds, allowing the staff to increase their capacity and successfully manage all the patients and births. After this incident an analysis of the situation was performed that resulted in a new procedure for “extreme load at maternity hospital”. The system demonstrated several important abilities contributing to system resilience as it uses its adaptive capacity to respond to and learn from the event." (Rankin et al., 2013)

Healthcare illustration

Translating tensions into safe practices through dynamic trade-offs: the secret second handover - A specific threat to patient safety arises when ambulances queue at the Emergency Department, losing their ability to respond. In England, ad hoc target times were specified to improve this. To achieve these target times, the process for receiving handover was redesigned. Work-as-imagined was specified in the form of protocols and procedures. During field work, variations in the application of the dedicated handover (work-as-done) were observed (Wears, 2015). This example demonstrates that it is possible to optimize the performance of daily ambulance services by adjusting time slots and avoiding wasted time.

Air Traffic Management illustration

An interesting illustrative case from the air traffic management context is the competence assessment of air traffic controllers. In many professions, a regular check of competence is required. This applies to air traffic controllers, pilots, and airline maintenance personnel, for whom international regulations set out guidelines and requirements for competence. This includes:

  1. Continuous assessment by making observations of air traffic controllers (ATCO) during "normal" operational duties.
  2. Dedicated practical assessment on annual basis.
  3. Oral and/or written examinations.

In other domains this is not the case (Hollnagel, 2017).

This provides an example of observations as a source for understanding everyday work and sources of resilience.

A noteworthy illustrative case is the set of activities established after the eruption of the Icelandic volcano, when EUROCONTROL introduced several tools (e.g. EVITA) and groups (e.g. EACCC):

  • European crisis Visualization Interactive Tool for ATFCM (EVITA)

EVITA is a collaborative online tool which allows users to visualize the impact of a crisis on air traffic and on the available air traffic network capacity in Europe. It supports decision making in times of crisis and is the principal communications channel for airlines. It is one of the Network Operations Portal's (NOP’s) features and should be used for information purposes only. During major crisis situations, it supports the sharing of information between airlines, state regulators and air navigation service providers operating in Europe, in particular thanks to the functionality that allows airlines to identify precisely which of their flights may be impacted by ash. In fact, the tool, originally created to monitor ash concentration levels, could be used for other crises such as nuclear emergency, pandemics or security risks.

  • In May 2010, the European Commission (EC) and EUROCONTROL jointly established the European Aviation Crisis Coordination Cell (EACCC) to coordinate the management of crisis responses in the European ATM network. In addition to the EACCC members, the EACCC Chair may decide to invite State focal points and, depending on the nature of the crisis, experts from relevant fields of expertise.

In the SESAR project 16.1.2, the i4D/CTA concept under development was analysed using a newly developed resilience engineering-based methodology that draws on many of the concepts recommended for use here (see Woltjer et al., 2015, pp. 127-128, from which the examples below are taken): "The i4D/CTA concept aims to optimize the arrival traffic to the airport by using more accurate and reliable trajectory planning, defined, and agreed between airborne and ground sides in four dimensions: latitude, longitude, altitude and time (hence, 4D)" ... through the use of a Controlled Time of Arrival (CTA).

  • Work-as-done will change with the introduction of i4D/CTA: "From a controller perspective the use of i4D/CTA ... entails that the main task is monitoring of traffic, as the responsibility for maintaining separation is still with the controller. However, the ... activity of actively maintaining separation continuously throughout en-route and TMA ... will change." In addition: "Currently the use of the arrival manager (AMAN) is flexible, as it is mostly a recommendation to controllers ... The i4D/CTA concept implies a stronger commitment, an agreement between air traffic controllers and aircraft crew on a Controlled Time of Arrival (CTA), ... suggested by the AMAN software."
  • A "significant trade-off triggered by i4D/CTA is between flexibility for controllers (e.g. to influence sequence and use vacant capacity) and predictability for airlines and airport services. This trade-off affects task complexity and demands on controllers". In turn, this trade-off affects the flexibility in the air traffic system as a whole, which is part of the sources of resilience.
  • Another source of resilience, margins, may also change with this new concept: "Generally, more optimization to use the runway comes with decreased tolerance and margin. E.g., a tight sequence with set CTAs leaves little margin to manage weather changes or aircraft with an emergency and avoid a knock-on effect of changed CTAs."
  • As an example of complexity and the potential for cascading effects: "There will be a change in working strategies in areas with complicated geography, complicated sector boundaries and use of temporarily restricted areas, that may lead to quicker transfers between sectors. ... In situations such as diversions, bad weather and quicker transfer between sectors, the time available and feasibility to predict their impact on traffic and adjust to the circumstances [may] decrease, and there [may] be increased possibility of these effects cascading to other aircraft, sectors, and air traffic control units."

Implementation considerations


Initial familiarisation with resilience concepts, in particular the understanding of everyday work when nothing goes wrong.

Implementation cost

Implementation cost can vary with the number of dedicated workshops. Typically, focus groups engage 4-8 experts and 2 facilitators for about a day, but the number of focus groups or workshops (and experts) depends on the scope of the analysis. For example, for small systems/organizations a single workshop or focus group may be sufficient, but for larger systems/organizations natural boundaries between subparts may be defined, with a number of workshops run for each. Note that the integration and interactions between subparts deserve explicit and dedicated attention.

It is also possible to complement existing practices in the organization, for instance by including the proposed triggering questions while planning or reviewing operations, or during audits.

Pre-workshop and follow-up analysis and fact checking may also be expected according to standard workshop, focus group, or interview methodologies.

Healthcare implementation considerations

Associated Challenges

The background and context in healthcare are among the most complex of any domain. The mismatch between work-as-imagined and work-as-done is the basis of this complexity (AIHI seminar), as explained below (Braithwaite, 2015):

Work-as-imagined (WAI), carried out by workers (blunt end) who:

  • Experience health care indirectly by interpreting and filtering information (indicators, statistics).
  • Receive delayed feedback.
  • Represent ideas about practice (outcomes are the information that is most easily accessible).

Work-as-done (WAD), carried out by workers (sharp end) who:

  • Experience health care delivery first-hand.
  • Receive feedback with little or no delay.
  • Work in constantly changing and unpredictable conditions.

For applications of resilience and the gap between Work-as-Imagined (WAI) and Work-as-Done (WAD), see for example Hollnagel, Braithwaite, & Wears (2015); Hollnagel, Braithwaite, & Wears (2013); or Wears, Hollnagel, & Braithwaite (2015).

Minimum Viable Solution

One of the first actions to carry out is the implementation of Problem Based Learning (PBL - see Method 1 in the Healthcare Practices, Methods and Tools section). This includes at least a two-day face-to-face course with at least two representatives of the stakeholders involved.

Air Traffic Management implementation considerations

This concept relates to one of the most interesting topics to have arisen in Air Traffic Management in the last decade, the so-called “Safety II”, that is, the “move from ensuring that ‘as few things as possible go wrong’ to ensuring that ‘as many things as possible go right’” [1]. This “positive” approach is attracting more and more interest as a complement to the “negative” approach commonly used in safety methodologies (i.e. studying the system in advance and identifying possible points of failure).


Relevant material

Relevant Practices, Methods and Tools


Understanding the difference between how work is assumed or expected to be done (Work-as-Imagined) and how it is actually done (Work-as-Done) (see Herrera et al., 2017):

  • Teach the value of open-ended questions, and how to ask them (Schein, 2013).
  • Implement “Learning Teams” in your query where Work-as-Imagined and Work-as-Done are investigated (Hollnagel, 2017; Conklin, 2012).
  • Patient safety senior executive walk-arounds to understand how the work gets done on the frontlines.
  • Prepare to shift people for the “unexpected” such as environmental disasters or threats such as chemical spills or earthquakes, riots, terrorist attacks, and epidemics.
  • Overcapacity protocols to manage overcrowding in emergency departments. *Development of “rapid assessment zones” to reduce overcrowding in emergency departments.
  • Do simulations involving surprises as part of a certification program.
  • Share case studies between plants that tell the story from the point of view of those involved, up to just before revealing what happened, and then ask: “What would you do? How could this play out? What would you do to avoid/support…?”


Resilience Analysis Grid (RAG), with questions related to the resilience potentials to anticipate, monitor, respond and learn (Hollnagel, 2017, latest version of the RAG).

Critical incident investigation work that uses a framework based on resilience perspectives (Health care Canada).

Healthcare Practices, Methods and Tools


Practice 1. In the Swedish healthcare sector, organizations switch between "normal operations" and "serious emergency situations". Other types of actors (non-healthcare) stay in normal operations as much as they can, according to the standard allocation of decision rights. This means that the actor responsible for taking decisions in everyday operations is in the best position to do so during a crisis situation.

Practice 2. The following real-life example shows how ED [Emergency Department] staff members employed multiple strategies that increased the resilience of their operations. Recently, at the start of the evening shift (15:00), the ED was boarding 43 patients; 28 of these patients filled the unit reserved for boarders; the remaining 15 were split among the acute care areas and the hallway. The use of the hallway as additional treatment space is an example of resilient adaptation at the departmental, as opposed to the individual, level. This procedure was first used several years earlier. By now, it had become part of normal operations, representing an organizational reconfiguration to establish a new equilibrium (Nemeth, Wears, Woods, Hollnagel, & Cook, 2008).

Practice 3. In the Swedish healthcare domain, several organizations have introduced good practices and methods aimed at establishing Common Grounds. In Region Östergötland, Common Grounds for cooperation and management are implemented by means of the crisis response system (MSB, 2014). This implementation includes actor-wide activities in all phases:

  • Before: Proactive development of strategies for how to manage a crisis, e.g. through common workshops and/or training courses.
  • During: Effective working procedures for actor-wide management of social disturbances with common approaches.
  • After: Actor-based follow-up based on indicators for stakeholder cooperation.

Practice 4. A fieldwork study (see Method 2 below) was carried out in an Emergency Department (ED) to investigate its properties of resilience and adaptive capacity in the face of uncertainty and limited resources. In particular, the analysis focused on the shift from a routine day, in which the system (ED) operates under usual conditions (described by practitioners as "run of the mill"), to a situation in which a key person recognizes system degradation (i.e. load and demands increase) and initiates adaptive tactics (i.e. recruiting and reorganizing multiple resources) in order to manage and maintain performance (Anders, Woods, Wears, & Perry, 2006).


Method 1. Problem Based Learning (PBL). The ability to adapt to change and continuously improve performance (capability) is enhanced through feedback on performance, the challenge of unfamiliar contexts, and the use of non-linear methods such as storytelling and small-group work. One such methodology is Problem Based Learning (PBL), which does not focus on problem solving with a defined solution but allows for the development of other desirable skills and attributes, such as knowledge acquisition and increased group collaboration and communication. This methodology was developed for medical education. PBL has been implemented within numerous undergraduate health curricula but less so in workforce training. Public health practice requires many of the skills that PBL aims to develop and would benefit from some exposure to this type of learning; Trevena (2007) also highlights some of the practical issues.

Method 2. Organizational Ethnography is a qualitative research approach that looks at the social interaction of people in a given organizational environment (e.g. a hospital's emergency department). It provides in-depth and up-close understandings of how the everydayness of work is organized and how work organizes people in everyday organizational life. The focus is on practices, communications, shared artifacts/tools, and physical spaces used in working teams. Ethnography includes the participation of the researcher in the organizational context (fieldwork), the observation of everyday activities, field notes, interviews, video recordings, photography, and the analysis of artifacts, such as devices that a person uses throughout the day. The length of the studies can vary depending on the research objectives and the organization's availability to host the researcher (see Practice 4 above).

Air Traffic Management Practices, Methods and Tools


At ENAV, the Italian Air Navigation Service Provider, the practice is to study and analyse critical events. In some particular cases, training and educational meetings have been organized to present these events to Air Traffic Controllers and managers. A special issue of the company's internal magazine has also been dedicated to presenting all the points of view on one particular event.


The guidance material that SESAR 16.1.2 and 16.6.1b have developed provides a method that uses workshops and various analytical techniques to generate qualitative descriptions of Resilience Engineering principles applied to ATM services, as done currently or as envisioned after the introduction of a new technology or way of working. The guidance material has been integrated into the safety assessment methodology of SESAR (Single European Sky ATM Research) and also serves as stand-alone guidance for ATM concept design processes.

ICAO “Doc 9995 AN/497 Manual of Evidence-based Training” highlights some methods concerning competency-based training; in particular, it lists several methods and techniques in use together with their pros and cons [2].


Teleconferences - During crises, the European Aviation Crisis Coordination Cell (EACCC) is normally convened via teleconferences.

"EUROCONTROL's Network Manager provides the best assistance it can to help mitigate the impact of major network disruptions or crisis situations. It also provides tools and services which enable users to anticipate or react to events more effectively, based on the best available knowledge of the ATM situation." [Source: https://www.eurocontrol.int/articles/tools-available-times-disruptions-and-crises]

On Skybrary, there is a section dedicated to “Controller Training Methods and Tools” that provides a general description of training design and structure, simulator training, training techniques, and computer-based training [3].


  • Berkes, F., & Folke, C. (2002). Back to the future: ecosystem dynamics and local knowledge. In L. H. Gunderson & C. S. Holling (Eds.), Panarchy: understanding transformations in human and natural systems (pp. 121-146). Washington, D.C.: Island Press.
  • Braithwaite, J. (2015). Re-conceptualising patient safety through innovation and systems change. Seminar, September 29, 2015, Hong Kong. Available at: https://s3-eu-west-1.amazonaws.com/bmj-internationalforum/pdfs/Asia+Forum/A3+-+Jeffrey+Braithwaite+slides.pptx
  • Cavallo, A. & Ireland, V. (2014). Preparing for complex interdependent risks: A System of Systems approach to building disaster resilience. International Journal of Disaster Risk Reduction, 9, 181–193.
  • Djalante, R., Holley, C., Thomalla, F., & Carnegie, M. (2013). Pathways for adaptive and integrated disaster resilience. Natural Hazards, 69(3), 2105–2135.
  • Furniss, D., Back, J., Blandford, A., Hildebrandt, M., & Broberg, H. (2011). A resilience markers framework for small teams. Reliability Engineering & System Safety, 96(1), 2-10.
  • Gero, A., Fletcher, S., Rumsey, M., Thiessen, J., Kuruppu, N., Buchan, J., Daly, J., & Willetts, J. (2015). Disasters and climate change in the Pacific: adaptive capacity of humanitarian response organizations. Climate and Development, 7(1), 35–46.
  • Hémond, Y. & Robert, B. (2014). Assessment process of the resilience potential of critical infrastructures. International Journal of Critical Infrastructures, 10(3-4), 200-217.
  • Hoffman, R. R., & Woods, D. D. (2011). Beyond Simon’s Slice: Five Fundamental Trade-Offs that Bound the Performance of Macrocognitive Work Systems. IEEE Intelligent Systems, 26(6), 67–71.
  • Hollnagel, E. (2004). Barriers and accident prevention. Aldershot, UK: Ashgate.
  • Hollnagel, E. (2014). Is safety a subject for science? Safety Science, 67, 21-24.
  • Hollnagel, E., Woods, D. D., & Leveson, N. (2006). Resilience engineering: concepts and precepts. UK: Ashgate Publishing, Ltd.
  • MSB (2014). Gemensamma grunder för samverkan och ledning vid samhällsstörningar (MSB777). Myndigheten för samhällsskydd och beredskap (MSB).
  • Rankin, A., Dahlbeck, N., & Lundberg, J. (2013). A case study of factor influencing role improvisation in crisis response teams. Cognition, Technology & Work, 15(1).
  • Shirali, G. A., Motamedzade, M., Mohammadfam, I., Ebrahimipour, V., & Moghimbeigi, A. (2016). Assessment of resilience engineering factors based on system properties in a process industry. Cognition, Technology and Work, 18(1), 19–31.
  • Van Der Beek, D., & Schraagen, J. M. (2015). ADAPTER: Analysing and developing adaptability and performance in teams to enhance resilience. Reliability Engineering and System Safety, 141, 33-44.
  • Woods, D. D. (2003). Creating foresight: how resilience engineering can transform NASA’s approach to risky decision making. Work, 4(2), 137–144.
  • Woods, D. D. (2006). Essential characteristics of resilience. In E. Hollnagel, D. D. Woods, & N. Leveson (Eds.), Resilience engineering: Concepts and precepts (pp. 21–34). Aldershot, UK: Ashgate.

Healthcare references

  • Anders, S., Woods, D. D., Wears, R. L., & Perry, S. J. (2006). Limits on adaptation: Modeling Resilience and Brittleness in Hospital Emergency Departments. In E. Hollnagel & E. Rigaud (Eds.), Proceedings of the Second Resilience Engineering Symposium: 8-10 November 2006 (pp. 1-9). Available at: http://www.resilience-engineering-association.org/download/resources/symposium/symposium-2006(2)/Anders_et_al.pdf
  • Fraser, S. W., & Greenhalgh, T. (2001). Coping with complexity: educating for capability. British Medical Journal, 323(7316), 799–803.
  • Hollnagel, E., Braithwaite, J. & Wears, R. L. (2015). From Safety-I to Safety-II: A White Paper. Resilient Health Care Net: Published simultaneously by the University of Southern Denmark, University of Florida, USA, and Macquarie University, Australia. http://resilienthealthcare.net/onewebmedia/WhitePaperFinal.pdf
  • Hollnagel, E., Braithwaite, J., & Wears, R. L. (Eds.) (2013). Resilient health care. Farnham, UK: Ashgate.
  • Nemeth, C., Wears, R., Woods, D., Hollnagel, E., & Cook, R. (2008). Minding the gaps: Creating Resilience in Health Care. In K. Henriksen, J. B. Battles, M. A. Keyes, et al. (Eds.), Advances in Patient Safety: New Directions and Alternative Approaches (Vol. 3: Performance and Tools). Rockville (MD): Agency for Healthcare Research and Quality. Available at: https://www.ncbi.nlm.nih.gov/books/NBK43670/
  • Ray-Sannerud, B. N., Leyshon, S., & Vallevick, V. B. (2015). Introducing routine measurement of healthcare worker’s well-being as a leading indicator for proactive safety management systems based on Resilience Engineering. Procedia Manufacturing, 3, 319–326.
  • Trevena, L. (2007). Problem-based learning in public health workforce training: A discussion of educational principles and evidence. New South Wales Public Health Bulletin, 18(1-2), 4-8.
  • Wears, R. L. (2015). Change of shift. Worn out by fatigue training. Annals of Emergency Medicine, 66(3), 334-5.
  • Wears, R. L., Hollnagel, E. & Braithwaite, J. (2015). Resilient health care, Volume 2: The resilience of everyday clinical work. Farnham, UK: Ashgate.

Air Traffic Management references

  • Herrera, I., Lay, B., & Cardiff, K. (2017). From air to ground - Resilience strategies and innovation across critical infrastructures. 7th Symposium Resilience Engineering, Liege, Belgium.
  • Hollnagel, E., Leonhardt, J., Licu, T., & Shorrock, S. (2013). From Safety-I to Safety-II: a white paper. Brussels: European Organisation for the Safety of Air Navigation (EUROCONTROL).
  • Lundhal, M. (2016). Runway incursion prevention – a Safety II approach. Hindsight, 24, 46–49.
  • Woltjer, R., Pinska-Chauvin, E., Laursen, T., & Josefsson, B. (2015). Towards understanding work-as-done in air traffic management safety assessment and design. Reliability Engineering & System Safety, 141, 115–130. http://doi.org/10.1016/j.ress.2015.03.010


  • Resilience
    DARWIN adapts the following working definition: "The ability to resist, absorb, accommodate to and recover from the effects of disturbances and changes in a timely and efficient manner, including through adaptation and restoration of basic structures and functions" (Source: DARWIN D1.1, 2015).

    Some widely used related definitions that this working definition is based on:
    "Adaptive capacity of an organization in a complex and changing environment. Note Resilience is the ability of an organization to manage disruptive related risk" (Source: ISO 22300).
    "The ability of a system, community or society exposed to hazards to resist, absorb, accommodate to and recover from the effects of a hazard in a timely and efficient manner, including through the preservation and restoration of its essential basic structures and functions. Comment: Resilience means the ability to "resile from" or "spring back from" a shock. The resilience of a community in respect to potential hazard events is determined by the degree to which the community has the necessary resources and is capable of organizing itself both prior to and during times of need." (Source: UNISDR, 2009).
    "Intrinsic ability of a system or organization to adjust its functioning prior to, during, or following changes, disturbances, and opportunities so that it can sustain required operations under both expected and unexpected conditions" (Source: Hollnagel, 2014)

  • Work-as-done
    Work-as-Done (WAD) refers to that which people actually do [as part of their work], as opposed to Work-as-Imagined (WAI), which refers to the assumptions or expectations of what other people do [as part of their work] (Hollnagel, 2018, p. 17).

  • Work-as-imagined
    Work-as-Imagined (WAI) refers to the assumptions or expectations of what other people do [as part of their work], while that which people actually do [as part of their work] is called Work-as-Done (WAD). The term 'imagined' is not used in an uncomplimentary or negative sense but simply recognises that our descriptions of work will never completely correspond to work as it takes place in practice, as it is actually done (Source: Hollnagel, 2018, p. 17-18). The term also covers how work is being thought of, either before it takes place when it is being planned or after it has taken place when the consequences are being evaluated (Source: Wears and Hollnagel, 2015).

  • Adaptive capacity
    "ability of systems, institutions, humans, and other organisms to adjust to potential damage, to take advantage of opportunities, or to respond to consequences" (Source: ISO 14080:2018(en)).
    "The adaptive capacity of a system is usually assessed by observing how it responds to disruptions or challenges. Adaptive capacity has limits or boundary conditions, and disruptions provide information about where those boundaries lie and how the system behaves when events push it near or over those boundaries" (Source: Woods and Cook, 2006, p. 69)

  • Margin
    "How closely or how precarious the system is currently operating relative to one or another kind of performance boundary" (from Woods, 2006 - Woods, D. D. "Essential Characteristics of Resilience." In Resilience Engineering: Concepts And Precepts, edited by E. Hollnagel, D. D. Woods, and N. Leveson, 19–30. Aldershot, UK: Ashgate, 2006.)

  • Brittleness
    Brittleness describes how rapidly a system's performance declines when it nears and reaches its boundary conditions (Source: Woods, 2015).

  • Coupling
    Coupling (loose/tight) refers to the time-dependency of a process, the flexibility of action sequences, the number of ways to achieve a goal, and the availability of slack in operational resources (from Perrow, 1984 - Perrow, Charles. Normal Accidents: Living with High-Risk Technologies. New York: Basic Books, 1984.)
