Root cause analysis
Root cause analysis (RCA) is a class of problem-solving methods aimed at identifying the root causes of problems or events.
Root cause analysis is any structured approach to identifying the factors that resulted in the nature, the magnitude, the location, and the timing of the harmful outcomes (consequences) of one or more past events. Its purpose is to identify what behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes, and to identify the lessons to be learned to promote the achievement of better consequences.
The practice of RCA is predicated on the belief that problems are best solved by attempting to address, correct or eliminate root causes, as opposed to merely addressing the immediately obvious symptoms. By directing corrective measures at root causes, it is more probable that problem recurrence will be prevented. However, it is recognized that complete prevention of recurrence by one corrective action is not always possible.
Nevertheless, in the U.S. nuclear power industry the NRC requires that "In the case of significant conditions adverse to quality, the measures shall assure that the cause of the condition is determined and corrective action taken to prevent repetition." [10 CFR 50, Appendix B, Criterion XVI, Sentence 2] In practice, more than one "cause" is allowed and more than one corrective action is not forbidden.
Conversely, there may be several effective measures (methods) that address the root causes of a problem. Thus, RCA is often considered to be an iterative process, and is frequently viewed as a tool of continuous improvement.
RCA is typically used as a reactive method of identifying the causes of events, revealing problems and solving them. Analysis is done after an event has occurred. Insights from RCA may also make it useful as a proactive method. In that case, RCA can be used to forecast or predict probable events even before they occur. While one follows the other, RCA is a completely separate process from Incident Management.
Root cause analysis is not a single, sharply defined methodology; there are many different tools, processes, and philosophies for performing RCA. However, several very broadly defined approaches or "schools" can be identified by their basic approach or field of origin: safety-based, production-based, process-based, failure-based, and systems-based.
- Safety-based RCA descends from the fields of accident analysis and occupational safety and health.
- Production-based RCA has its origins in the field of quality control for industrial manufacturing.
- Process-based RCA is basically a follow-on to production-based RCA, but with a scope that has been expanded to include business processes.
- Failure-based RCA is rooted in the practice of failure analysis as employed in engineering and maintenance.
- Systems-based RCA has emerged as an amalgamation of the preceding schools, along with ideas taken from fields such as change management, risk management, and systems analysis.
Despite the different approaches among the various schools of root cause analysis, there are some common principles. It is also possible to define several general processes for performing RCA.
General principles of root cause analysis
- The primary aim of RCA is to identify the factors that resulted in the nature, the magnitude, the location, and the timing of the harmful outcomes (consequences) of one or more past events in order to identify what behaviors, actions, inactions, or conditions need to be changed to prevent recurrence of similar harmful outcomes and to identify the lessons to be learned to promote the achievement of better consequences. ("Success" is defined as the near-certain prevention of recurrence.)
- To be effective, RCA must be performed systematically, usually as part of an investigation, with conclusions and identified root causes backed up by documented evidence. Usually a team effort is required.
- There may be more than one root cause for an event or a problem; the difficult part is demonstrating the persistence and sustaining the effort required to develop them.
- The purpose of identifying all solutions to a problem is to prevent recurrence at lowest cost in the simplest way. If there are alternatives that are equally effective, then the simplest or lowest cost approach is preferred.
- Root causes identified depend on the way in which the problem or event is defined. Effective problem statements and event descriptions (as failures, for example) are helpful, or even required.
- To be effective, the analysis should establish a sequence of events or timeline to understand the relationships between contributory (causal) factors, root cause(s) and the defined problem or event to be prevented in the future.
- Root cause analysis can help to transform a reactive culture (that reacts to problems) into a forward-looking culture that solves problems before they occur or escalate. More importantly, it reduces the frequency of problems occurring over time within the environment where the RCA process is used.
- RCA is a threat to many cultures and environments. Threats to cultures often meet with resistance. Other forms of management support may be required to achieve RCA effectiveness and success. For example, a non-punitive policy towards problem identifiers may be required.
Evaluating root cause analysis
Root cause analysis reports, like other "deliverables", can vary in quality. Each stakeholder can have their own qualitative and quantitative acceptance criteria. There are some general possibilities for evaluating root cause analysis outputs.
First: Is it readable?
If it is readable it will be grammatically correct, the sentences will make sense, it will be free of internal inconsistencies, terms will be defined, it will contain appropriate graphics, and the like.
Second: Does it contain a complete set of all of the causal relationships?
If it did contain a "complete set of all of the causal relationships" one could (at least):
1. Trace the causal relationships from the harmful outcomes to the deepest conditions, behaviors, actions, and inactions.
2. Show that the important attributes of the harmful outcomes were completely explained by the deepest conditions, behaviors, actions, and inactions.
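The tracing described in item 1 can be sketched in code. The following is only an illustration, assuming the report's causal relationships have been captured as a simple mapping from each effect to its direct causes; the event names and the `deepest_causes` helper are hypothetical and not part of any RCA standard.

```python
from collections import deque

# Hypothetical causal map: each effect points to the factors that directly produced it.
causes = {
    "operator burned": ["steam released", "no face shield worn"],
    "steam released": ["valve left open"],
    "valve left open": ["procedure step skipped"],
    "procedure step skipped": ["procedure not updated after modification"],
    "no face shield worn": ["shield not stocked at work station"],
}

def deepest_causes(outcome, causes):
    """Walk from a harmful outcome down to the factors that have no recorded cause."""
    seen = {outcome}
    leaves = []
    queue = deque([outcome])
    while queue:
        node = queue.popleft()
        direct = causes.get(node, [])
        if not direct and node != outcome:
            leaves.append(node)  # nothing deeper is recorded for this factor
        for factor in direct:
            if factor not in seen:
                seen.add(factor)
                queue.append(factor)
    return sorted(leaves)

print(deepest_causes("operator burned", causes))
```

If an outcome returns an empty list, the report records no causal chain for it at all, which is one simple sign that the set of causal relationships is incomplete.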
General process for performing and documenting an RCA-based Corrective Action
Notice that RCA (in steps 3, 4 and 5) forms the most critical part of successful corrective action, because it directs the corrective action at the true root cause of the problem. The root cause is secondary to the goal of prevention, but without knowing the root cause, we cannot determine what an effective corrective action for the defined problem will be.
1. Define the problem or describe the event factually. Include the qualitative and quantitative attributes (properties) of the harmful outcomes. This usually includes specifying the natures, the magnitudes, the locations, and the timings.
2. Gather data and evidence, classifying them along a timeline of events leading to the final failure or crisis. For every behavior, condition, action, and inaction, specify in the timeline what should have been when it differs from the actual.
3. Ask "why" and identify the causes associated with each step in the sequence towards the defined problem or event. "Why" is taken to mean "What were the factors that directly resulted in the effect?"
4. Classify causes into causal factors that relate to an event in the sequence, and root causes that, if removed, can be agreed to have interrupted that step of the sequence chain.
5. If there are multiple root causes, which is often the case, reveal those clearly for later optimum selection. Identify all other harmful factors that have equal or better claim to be called "root causes."
6. Identify corrective action(s) that will with certainty prevent recurrence of each harmful effect, including outcomes and factors. Check that each corrective action would, if implemented before the event, have reduced or prevented specific harmful effects.
7. Identify solutions that are effective, prevent recurrence with reasonable certainty with the consensus agreement of the group, are within your control, meet your goals and objectives, and do not cause or introduce other new, unforeseen problems.
8. Implement the recommended root cause correction(s).
9. Ensure effectiveness by observing the implemented solutions.
10. Other methodologies for problem solving and problem avoidance may be useful.
11. Identify and address the other instances of each harmful outcome and harmful factor.
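The data-gathering step above asks for a timeline that records what should have been wherever it differs from what actually happened. A minimal sketch of such a record follows, assuming each entry carries a sortable time string; the `TimelineEntry` type, the `deviations` helper and the sample events are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TimelineEntry:
    time: str      # zero-padded "HH:MM" strings sort chronologically
    expected: str  # what should have been
    actual: str    # what actually was

def deviations(timeline):
    """Return the entries, in time order, where the actual differs from the expected."""
    ordered = sorted(timeline, key=lambda entry: entry.time)
    return [entry for entry in ordered if entry.actual != entry.expected]

# Hypothetical evidence, entered in the order it was collected rather than the order it occurred.
timeline = [
    TimelineEntry("08:15", "valve closed before maintenance", "valve left open"),
    TimelineEntry("08:00", "work permit issued", "work permit issued"),
    TimelineEntry("08:20", "face shield worn", "no face shield worn"),
]

for entry in deviations(timeline):
    print(f"{entry.time}: expected '{entry.expected}', actual '{entry.actual}'")
```

Each deviation is then a natural starting point for the "why" questions of the following step.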
Root cause analysis techniques
- Re-enactment - for example, having the participants in the event do it over the way it was done (with due care to avoid the same harmful outcomes).
- Re-enactment using a computer or a simulator.
- Comparative re-enactment - doing it over the right way as well as the way it was actually done.
- Re-construction - reassembling all of the available accident debris to see clues as to how the disassembly occurred.
- Barrier analysis - a technique often used in process industries. It is based on tracing energy flows, with a focus on barriers to those flows, to identify how and why the barriers did not prevent the energy flows from causing harm.
- Bayesian inference
- Change analysis - an investigation technique often used for problems or accidents. It is based on comparing a situation that does not exhibit the problem to one that does, in order to identify the changes or differences that might explain why the problem occurred.
- "Delta Work" - comparing the way an episode did happen with the way it was intended to happen.
- Current Reality Tree - a method developed by Eliahu M. Goldratt in his theory of constraints that guides an investigator to identify and relate all root causes using a cause-effect tree whose elements are bound by rules of logic (Categories of Legitimate Reservation). The CRT begins with a brief list of the undesirable things we see around us, and then guides us towards one or more root causes. This method is particularly powerful when the system is complex, there is no obvious link between the observed undesirable things, and a deep understanding of the root cause(s) is desired.
- Failure mode and effects analysis
- Fault tree analysis
- Five whys emphasizes recursive depth, using the heuristic that you're probably not done until you've looked five levels deep.
- Ishikawa diagrams emphasize initial breadth, using a checklist of types of causes that should be considered.
- Why-Because analysis emphasizes recursive breadth, using the concepts of necessary and sufficient causes.
- Pareto analysis ("80/20 rule")
- RPR Problem Diagnosis - an ITIL-aligned method for diagnosing IT problems.
- Kepner-Tregoe Approach
- Project Management Approaches. An adverse event can be viewed as a project whose final product was harm. The event can be understood by re-casting it in the classical Project Management devices such as Work Breakdown Structure, Gantt Chart, and Planning Logic Network.
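As an illustration of one technique from the list above, Pareto analysis can be sketched as a simple tally. The snippet assumes each past incident has already been labeled with a single cause category; the `pareto_vital_few` helper, the 0.8 threshold and the sample counts are hypothetical choices made for this example.

```python
from collections import Counter

def pareto_vital_few(cause_labels, threshold=0.8):
    """Return the most frequent causes that together reach `threshold` of all incidents."""
    counts = Counter(cause_labels).most_common()  # sorted by frequency, highest first
    total = sum(n for _, n in counts)
    selected = []
    running = 0
    for cause, n in counts:
        if running / total >= threshold:
            break  # the causes chosen so far already cover the threshold
        selected.append((cause, n))
        running += n
    return selected

# Hypothetical tally: 20 incidents, each labeled with one cause category.
incidents = (
    ["no or poor procedures"] * 12
    + ["poor maintenance"] * 5
    + ["lack of skill"] * 2
    + ["forces of nature"] * 1
)

print(pareto_vital_few(incidents))
```

In this sample, two of the four categories account for 17 of the 20 incidents, so corrective effort would be directed at those two first.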
Common cause analysis (CCA) and common modes analysis (CMA) are evolving engineering techniques for complex technical systems to determine if common root causes in hardware, software or highly integrated systems interaction may contribute to human error or improper operation of a system. Systems are analyzed for root causes and causal factors to determine the probability of failure modes, fault modes, or common mode software faults due to escaped requirements. Ensuring complete testing and verification is another method used to ensure that complex systems are designed with no common causes that cause severe hazards. Common cause analyses are sometimes required as part of the safety engineering tasks for theme parks, commercial/military aircraft, spacecraft, complex control systems, large electrical utility grids, nuclear power plants, automated industrial controls, medical devices or other safety-critical systems with complex functionality.
A major issue with common cause analysis is that it often depends on previously completed weak, ineffective, and erroneous root cause analyses on individual events.
Basic elements of root cause using Management Oversight Risk Tree (MORT) Approach Classification
- Materials
  - Defective raw material
  - Wrong type for job
  - Lack of raw material
- Man Power
  - Inadequate capability
  - Lack of Knowledge
  - Lack of skill
  - Stress
  - Improper motivation
- Machine / Equipment
  - Incorrect tool selection
  - Poor maintenance or design
  - Poor equipment or tool placement
  - Defective equipment or tool
- Environment
  - Disordered workplace
  - Poor job design and/or layout of work
  - Surfaces poorly maintained
  - Inability to meet physical demands of the task
  - Forces of nature
- Management
  - Lack of management involvement
  - Inattention to task
  - Task hazards not dealt with properly
  - Other (horseplay, inattention....)
  - Stress demands
  - Lack of Process
  - Lack of Communication
- Methods
  - No or poor procedures
  - Practices are not the same as written procedures
  - Poor communication
- Management system
  - Training or education lacking
  - Poor employee involvement
  - Poor recognition of hazard
  - Previously identified hazards were not eliminated
See also
- Failure mode and effects analysis
- Fault tree analysis
- Forensic engineering
- Fix it twice