What is FRACAS - Failure Reporting Made Simple


Want to develop a successful Failure Reporting, Analysis, and Corrective Action System? This article provides a great overview on the topic.

Published in: Education

What’s the FRACAS? Failure Elimination Made Simple
by Ricky Smith, CPMM, CMRP and Bill Keeter, CMRP

“Your system is perfectly designed to give you the results that you get.”
W. Edwards Deming, PhD

How good is your organization at identifying failures? Of course you see failures when they occur, but can you identify when recurring failures are creating serious equipment reliability issues? Most companies begin by applying RCA or RCFA to “high value failures”. While this is not wrong, I prefer either not to see the failure in the first place or, at the least, to reduce the failures to a controllable level.

Failure Reporting, Analysis and Corrective Action System (FRACAS) is an excellent process that can be used to control or eliminate failures. It is a process in which you identify any reports from your CMMS/EAM or specialized reliability software that can help you eliminate, mitigate or control failures. These reports could include cost variance, Mean Time Between Failure, Mean Time Between Repair, dominant failure patterns in your operation, and common threads between failures such as “lack of lubrication” (perhaps due to a lubricator not using known industry standards). A recent poll covered 80 large companies. Shockingly, none of these companies were capturing the data required to understand and control equipment failures.

Before you go any further, answer the following questions honestly to see whether you have problems identifying failures and effectively eliminating or mitigating their effects on total process and asset reliability.

1. Can you identify the top 10 assets which had the most losses due to a partial or total functional failure by running a report on your maintenance software?
2. Can you identify the total losses in your organization and separate them into process and asset losses for the past 365 days?
3. Can you identify components with a common thread due to a specific failure pattern, such as the one shown in Figure 1?

Figure 1 - Failure Pattern from Nowlan and Heap Study (curve labeled “Infant Mortality”)

Many times, the cost of unreliability remains unknown because the causes of unreliability are so many. Whether you want to point the finger at maintenance, production (operations) or engineering, each functional area plays a role in unreliability. Here are a few examples of those losses:

1. Equipment Breakdown (total functional failure)
   A. Causes of Equipment Breakdown
      1. No Repeatable Effective Repair, Preventive Maintenance, Lubrication, or Predictive Maintenance Procedure
      2. No one following effective procedures
2. Equipment not running to rate (partial functional failure)
   A. Causes of Equipment not Running to Rate
      1. Operator not having an effective procedure to follow
      2. Operator not trained to operate or troubleshoot equipment
      3. Management thinking this is the best rate at which the equipment can operate because of age or condition
3. Off-Quality Product that is identified as “first pass quality” (could be a partial or total functional failure)
   A. Causes of Quality Issues
      1. Acceptance by management that “first pass quality” is not a loss because the product can be recycled
4. Premature Equipment Breakdown
   A. Ineffective or no commissioning procedures. We are talking about parts or equipment replaced by maintenance, engineering or a contractor that fail prematurely because no one has identified whether a defect is present after the equipment has been installed, repaired, serviced, etc. See Figures 2 and 3. (If you have ever seen equipment break down or not run to rate immediately after a shutdown, you know what we are talking about.)

The Proactive Workflow Model – Eliminating unreliability is a continuous improvement process much like the Proactive Work Flow Model in Figure 4. The Proactive Workflow Model illustrates the steps required to move from a reactive to a proactive maintenance program.

june/july 2010
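The first self-assessment question above (the top 10 assets by losses) comes down to a simple aggregation once failure losses are recorded against each work order. The following is a minimal sketch only: the record layout and the field names `asset_id`, `failure_type` and `loss` are hypothetical and do not come from any particular CMMS/EAM product, so a real export would need to be mapped onto them.

```python
from collections import defaultdict


def top_assets_by_loss(work_orders, n=10):
    """Return the n assets with the highest total failure-related loss."""
    totals = defaultdict(float)
    for wo in work_orders:
        # Count only work orders flagged as a partial or total functional failure.
        if wo["failure_type"] in ("partial", "total"):
            totals[wo["asset_id"]] += wo["loss"]
    # Largest total loss first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical work order export (illustrative values, not real plant data).
work_orders = [
    {"asset_id": "PUMP-101", "failure_type": "total", "loss": 12000.0},
    {"asset_id": "MOTOR-7", "failure_type": "partial", "loss": 3000.0},
    {"asset_id": "PUMP-101", "failure_type": "partial", "loss": 4500.0},
    {"asset_id": "CONV-3", "failure_type": "none", "loss": 0.0},
]

print(top_assets_by_loss(work_orders, n=2))
```

If a report like this cannot be produced because the loss or failure fields are simply not being filled in, that is the data gap the poll of 80 companies points to.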
Figures 2 and 3 – Defects Identified

Figure 4 - ProActive Work Flow Model (diagram labels: Asset Health Assurance, made up of Asset Catalog, Asset Criticality Analysis, Failure Modes Analysis and Equipment Maintenance Plan (EMP); PdM/PM Inspections; Results of PdM/PM; Requested Work; Planning; Scheduling; Daily Coordination; Work Performed; Job Plan Improvements; Equipment Reliability Improvements; M.O.C. Procedures; Predictive & Preventative Work, 80% of Total Work; Failures/Breakdowns, 20% of Total Work; Job Plan Loop; F.R.A.C.A.S. Loop)

What the Proactive Work Flow Model really means to your organization – Implementing the Proactive Work Flow Model is the key to eliminating failures. The built-in continuous improvement processes of Job Plan Improvement and the Failure Reporting, Analysis, and Corrective Action System (FRACAS) help ensure that maintainability and reliability are always improving. All of the steps and processes have to be implemented in a well managed and controlled fashion to get full value out of the model.

The foundational elements of Asset Health Assurance are key because they ensure that all of the organization’s assets are covered by a complete and correct Equipment Maintenance Plan (EMP). These are requirements (not options) to ensure that you have a sustainable proactive workflow model.

You cannot have continuous improvement until you have a repeatable, disciplined process.

The objective of the Proactive Work Flow Model is to provide discipline and repeatability to your maintenance process. The inclusion of the FRACAS provides continuous improvement for your maintenance strategies. There are fundamental items you must have in place to ensure that you receive the results you expect.

Think of FRACAS this way. As you have failures, you use your CMMS/EAM failure codes to record the part-defect-cause of each failure. Analyzing part-defect-cause on critical assets helps you begin to make serious improvement in your operation’s reliability. Looking at the FRACAS Model in Figure 5, we begin with Work Order History Analysis, and from this analysis we decide whether we need to apply Root Cause Analysis (RCA), Reliability Centered Maintenance, or Failure Modes and Effect Analysis to eliminate or reduce the failures we discover. From the RCA, we determine the maintenance strategy adjustments needed to predict or prevent failures. Even the most thorough analysis doesn’t uncover every failure mode. Performance monitoring after we make the strategy adjustments may find that new failure modes not covered by your strategy occur. You can then create a new failure code for the new failure mode so that additional failures can be tracked and managed when you review work order history. You can see this is a continuous improvement loop which never ends.

Figure 5 – FRACAS Loop (F.R.A.C.A.S. Kaizen Loop: Work Order History Analysis, Root Cause Analysis, Strategy Adjustments, Failure Modes Analysis, Failure Codes Creation)

Steps to Implementing an Effective FRACAS

Let’s back up a little. The foundational elements of an effective FRACAS are an effective validated equipment hierarchy, criticality analysis, failure modes analysis, and equipment maintenance plans.

FRACAS Checklist:

Equipment Hierarchy should be built and validated so that similar failures on like equipment can be identified across an organization.

Criticality Analysis is developed and validated so that equipment criticality is ranked based on Production Throughput, Asset Utilization, Cost, Environment, and Safety.

Failure Modes Analysis is completed on all critical equipment using FMA, FMEA, or RCM.

Equipment Maintenance Plans are developed on all critical equipment to prevent or predict a failure.

Effective Equipment Hierarchy – An Asset Catalog or Equipment Hierarchy must be developed to provide the data required to manage a proactive maintenance program which includes failure reporting or FRACAS (Failure Reporting, Analysis and Corrective Action System). In order to eliminate failures, one needs to ensure this is a successful first step. Figure 6 displays the findings from a plant with 32 total “Part – Bearing” failures from different size electric motors. One type, “Defect – Wear”, occurred in 85% of the failures. In 98% of the cases, “Cause” was found to be “Inadequate Lubrication”. (“Part”, “Defect” and “Cause” are each identified from a CMMS/EAM codes drop-down screen.) Now it is time to perform a Root Cause Failure Analysis on this common thread of failures.

Figure 6 – Reason for Equipment Hierarchy Validated
32 Bearing Failures Across the Plant
Wear Defect for 85% of Bearing Failures
98% of Defects Caused by Lack of Lubrication
Part (Bearing Failure, 32 Failures) + Defect (Wear, 85%) + Cause (Lack of Lubrication, 98%) = Common Thread
What is the Solution to Defect Elimination? Resolve the Lack of Lubrication Problem

Once the hierarchy is established you can find similar failures in one area of an operation or across the total operation. The equipment hierarchy must be validated against the organization’s established equipment hierarchy standard. We are looking for “Part” – “Defect” – “Cause”. Maintenance personnel may not have the training or ability to determine the “Defect” (a predictive maintenance technician could identify it); the “Cause” can typically be identified by a maintenance technician, maintenance engineer, reliability engineer, or predictive maintenance technician.

After a thorough analysis you will find that most failures come from a small amount of equipment. The question is, “Which equipment?”

Asset Criticality Analysis – Everyone says they have identified their critical equipment. But, in many cases, equipment criticality changes based on how upset people are about an equipment problem, or because people are confused about the consequences associated with a failure and the probability that it will occur if we manage equipment reliability effectively.

The purpose of the Asset Criticality Analysis is to identify which equipment has the most serious potential consequences on business performance if it fails. Consequences on the business can include:

• Production Throughput or Equipment / Facility Utilization
• Cost due to lost or reduced output
• Environmental Issues
• Safety Issues
• Other

The resulting Equipment Criticality Number is used to prioritize resources performing maintenance work. The Intercept Ranking Model illustrates this process (Figure 7). On the “Y” axis you see asset criticality listed from none to high. I like using a scale of 0-1000 because all assets are not necessarily equal. Using the Intercept line which is struck down the middle, a planner or scheduler can define which job should be planned or scheduled first, or at least get close to the best answer, because management has already been involved in determining the most critical asset and the equipment has told you (on the “X” axis) which one has the highest defect severity (is in the worst condition).

www.uptimemagazine.com
Figure 7 – Intercept Model (Intercept Ranking; Y axis: Criticality, from Not through Low and Medium to High; X axis: Defect Severity, 5 4 3 2 1; ©2009 GPAllied)

Criticality   Severity   Order
High          1          1st
High          2          2nd
Medium        1          3rd
Medium        2          4th
High          3          5th
Low           1          6th
Medium        3          7th
Low           2          8th
High          4          9th
Low           3          10th
Medium        4          11th
Low           4          12th
Not           1          13th
Not           2          14th
Not           3          15th
High          5          16th
Medium        5          17th
Not           4          18th
Low           5          19th
Not           5          20th

The only other two factors I would add in determining which job to plan or schedule would be work order type (PM, CM, CBM, Rebuild, etc.) and time on backlog. Figure 8 shows the 4-Way Prioritization Model for planning and scheduling.

Figure 8 – “4 Way Prioritization Model” (©2009 GPAllied)

Asset Criticality: 500 = Highest Criticality, down to 1 = Lowest Criticality

Defect Severity:
5 = Priority 1 (Most Severe)
4 = Priority 2
3 = Priority 3
2 = Priority 4
1 = Priority 5 (Least Severe)

Time On Backlog:
4 = Greater than 120 Days
3 = Greater than 90 Days
2 = Greater than 60 Days
1 = Less than 60 Days

Work Order Type:
10 = Emergency
9 = Quality Compliance
8 = Results of PdM Inspection
7 = Preventive Maintenance Inspections
6 = Working Conditions/Safety
5 = Planned Work Outage
4 = Normal Maintenance
3 = Projects & Experiments
2 = Cost Reductions
1 = Spares Equipment

Identify what equipment is most likely to negatively impact business performance because it both matters a lot when it fails and it fails too often. The resulting Relative Risk Number is used to identify assets that are candidates for reliability improvement.

A consistent definition for equipment criticality needs to be adopted and validated in order to ensure the right work is completed at the right time. This is the key to the elimination of failures.

Identification of Failure Modes – The goal of most maintenance strategies is to prevent or predict equipment failures. Equipment failures are typically caused by the catastrophic failure of an individual part. These parts develop defects, and when left alone, those defects lead to the ultimate catastrophic failure of the part. The defects are, in turn, caused by “something”. Eliminating that “something” (the cause) will eliminate the failure.

The primary goal of an effective Preventive Maintenance (PM) program is to eliminate the cause and prevent the failure from occurring. The primary goal of a Predictive Maintenance (PdM) or Condition Based Monitoring (CBM) program is to detect the defects and manage the potential failures before they become catastrophic failures.

In addition, many program tasks are designed to maintain regulatory compliance. Many companies have PM programs. However, many of the tasks in them do not address specific failure modes.

For example: An electric motor with roller bearings has specific failure modes which can be prevented with lubrication. The failure mode is “wear” caused by “Inadequate Lubrication”. The next question may be why you had Inadequate Lubrication. It could be the result of no lubrication standard being established for bearings. In other words, someone gives the bearing “x” shots of grease even though no one knows the exact amount needed to prevent the bearing from failing.

The best way to identify failure modes is to use a facilitated process. Put together a small team consisting of people knowledgeable about the equipment, train them thoroughly on the concept of part-defect-cause, and go through the basic equipment types in your facility such as centrifugal pumps, piston pumps, gearboxes, motors, etc. You will find that a relatively small number of failure codes will cover a lot of failure modes in your facility. The failure modes developed during this exercise can later become the basis for the failure modes, effects, and criticality analysis that takes place during Reliability-Centered Maintenance (RCM) projects. In our book, we focus on failure mode identification as an output of FRACAS (Failure Reporting, Analysis and Corrective Action System), which, again, is a strong continuous improvement process.

If, over a period of one year, the dominant failure mode is “wear” for bearings caused by Inadequate Lubrication, then one can change or develop a standard, provide training and thus eliminate a large number of failures.

The problem is that most companies do not have the data to identify a major problem on multiple assets (no data in equals no effective failure reports out). For example, it isn’t the motor that fails; the motor fails because of a specific part’s failure mode, which then results in catastrophic damage to the motor, unless, of course, the defect is identified early enough in the failure mode.

Maintenance Strategy – The maintenance strategy should result from a Failure Modes and Effect Analysis, Reliability Centered Maintenance, or failure data collected from your CMMS/EAM.

“Elimination Strategy: The best way to eradicate this deadly waste is to get a better understanding of the true nature of the equipment’s failure patterns and adjust the Maintenance Strategy to match.”
- Andy Page, CMRP

So what is a maintenance strategy? Let’s break down the two words: Maintenance is to keep in an existing condition, or to keep, preserve, protect, while Strategy is development of a prescriptive plan toward a specific goal.

So, a Maintenance Strategy is a prescriptive plan to keep, preserve, or protect an asset or assets. Keep in mind that one specific type of maintenance strategy is “run to failure” (RTF). However, RTF is used only if, based on thorough analysis, it is identified as the best solution for specific equipment to optimize reliability at optimal cost. Less invasive maintenance is preferred to more invasive maintenance. This is one of the fundamental concepts of any well-defined maintenance strategy. Specific maintenance strategies are designed to mitigate the consequences of each failure mode. As a result, maintenance is viewed as a reliability function instead of a repair function. This means Predictive Maintenance or Condition Monitoring is the best solution because it is mainly noninvasive.

Knowing that both systemic problems and operating envelope problems produce the same type of defects, a maintenance strategy that merely attempts to discover the defects and correct them will never be able to reach a proactive state. Technicians will be too busy fixing the symptoms of problems instead of addressing the root cause. To reach a truly proactive state, the root cause of the defects will need to be identified and eliminated. Maintenance strategies that accomplish this are able to achieve a step change in performance and achieve incredible cost savings. Maintenance strategies that do not attempt to address the root cause of defects will continue to see lackluster results and struggle with financial performance.

A Maintenance Strategy involves all elements that aim the prescriptive plan toward a common goal. Key parts of a maintenance strategy include Preventive and Predictive Maintenance based on a solid Failure Mode Elimination Strategy, Maintenance Planning consisting of repeatable procedures, work scheduled based on equipment criticality, work executed using precision techniques, proper commissioning of equipment when a new part or equipment is installed, and quality control using Predictive Maintenance Technologies to ensure no defects are present after this event occurs. The very last part of your maintenance strategy is FRACAS, because it drives the continuous improvement portion of this strategy.

Failure Reporting

Failure reporting can come in many forms. The key is to have a disciplined plan to review failure reports over a specific time period, and then to develop actions to eliminate failure. Following are a few Failure Report examples, which should be included as part of your FRACAS Continuous Improvement and Defect Elimination Process.

1. Asset Health or Percent of Assets with No Identifiable Defect – reported by maintenance management to plant and production management on at least a monthly basis (see Figure 9). An asset that has an identifiable defect is said to be in condition RED. An asset that does not have an identifiable defect is said to be in condition GREEN. That is it. It is that simple. There are no other “but ifs”, “what ifs” or “if thens”. If there is an identifiable defect, the asset is in condition RED. If there is no identifiable defect, it is GREEN. The percentage of machines that are in condition GREEN is the Asset Health (as a percentage) for that plant or area.

The definition for defect is: an abnormality in a part which leads to equipment or asset failure if not corrected in time.

Example: the plant has 1,000 pieces of equipment. Of that number, 750 have no identifiable defects. The plant is said to have 75% Asset Health. There is an interesting aspect about Asset Health. Once this change is underway, Asset Health, as a metric, becomes what most maintenance managers and plant managers have wanted for a long time: a leading indicator of maintenance costs and business risk.

Figure 9 – Percent of Assets with No Identifiable Defect (chart: Asset Health Summary, From: Jun 2006 To: Jun 2007, View: Overall; Y axis: Asset Count, 0 to 1,400; X axis: Month; plot legend: Green, Yellow, Red)

2. Mean Time Between Failures and Mean Time Between Repairs – reported by maintenance or reliability engineers on a monthly basis on the top 5-20% of critical equipment. The report to management should include recommendations to improve both metrics and should be measured and posted on a line graph for all to see.

3. Cost Variance by area of the plant – reported by maintenance and production supervisor area of responsibility. Cost variance must be reported to maintenance and production management on a monthly basis. The report should not be acceptable without a known cause of the variance and a plan to bring it into compliance.

4. Most Frequent Part-Defect-Cause Report – reported monthly by maintenance or reliability engineers. If you do not have maintenance or reliability engineers, you may need to appoint a couple of your best maintenance technicians as “Reliability Engineering” Technicians, even if unofficially, and train them to be key players in this failure elimination process. This one report can identify common failure threads within your operation which, when resolved, can make a quick impact on failure elimination.

There are many more reports that can be used effectively, but they will not fit in the space of this article. You will be able to find more reports in the book on “FRACAS” written by Ricky and Bill, which will be published by mid July.

Bill Keeter is currently a Senior Technical Advisor for Allied Reliability. Bill joined Allied in 2006 after serving as President of BK Reliability Engineers, Inc., where he provided training and facilitation services to help facilities improve asset performance using Weibull Analysis, Reliability Centered Maintenance, Availability Simulation, and Life Cycle Cost Analysis. Bill has over 30 years of experience in Maintenance Engineering and Management. He has successfully implemented maintenance improvement programs in a variety of manufacturing and production facilities. Bill’s experience includes maintenance leadership positions in the US Military, the nuclear industry, chemicals, paper converting, and plastic film manufacturing. He has provided training and reliability consulting services to petroleum, process, mining, and defense industries in the United States, Mid-East, and Europe. Bill has developed competency maps for Reliability, Availability, and Maintainability Engineering for the Petroleum Industry’s PetroSkills® program. Bill has published articles in a variety of internationally recognized maintenance publications, and has presented papers on the practical application of Weibull Analysis at several internationally attended Maintenance and Reliability Conferences. Bill is a Certified Maintenance and Reliability Professional with the Society for Maintenance and Reliability Professionals Certifying Organization. You can contact Bill at bkeeter@gpallied.com

Ricky Smith is currently a Senior Technical Advisor with Allied Reliability. Ricky has over 30 years of experience in maintenance as a maintenance manager, maintenance supervisor, maintenance engineer, maintenance training specialist and maintenance consultant, and is a well known published author. Ricky has worked with maintenance organizations in hundreds of facilities, industrial plants, etc., worldwide in developing reliability, maintenance and technical training strategies. Prior to joining Allied Reliability in 2008, Ricky worked as a professional maintenance employee for Exxon Company USA, Alumax (this plant was rated the best in the world for over 18 years), Kendall Company, and Hercules Chemical, providing the foundation for his reliability and maintenance experience.

Ricky is the co-author of “Rules of Thumb for Maintenance and Reliability Engineers”, “Lean Maintenance” and “Industrial Repair, Best Maintenance Repair Practices”. Ricky has also written for several magazines during the past 20 years on technical, reliability and maintenance subjects. Ricky holds certification as a Certified Maintenance and Reliability Professional from the Society for Maintenance and Reliability Professionals as well as a Certified Plant Maintenance Manager from the Association of Facilities Engineering. Ricky lives in Charleston, SC with his wife. Aside from spending time with his 3 children and 3 grandchildren, Ricky enjoys kayaking, fishing, hiking and archaeology. If you would like to be notified before the release of the new book, or would like to contact Ricky with questions, send him an email at rsmith@gpallied.com.
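As a worked footnote to the Asset Health report described under Failure Reporting, the metric is simply the share of assets in condition GREEN. The sketch below reproduces the article's own example (1,000 assets, 750 with no identifiable defect). The function name, the asset identifiers and the dictionary layout are assumptions made for illustration; any condition other than GREEN is treated as "has an identifiable defect", which follows the article's two-state definition.

```python
def asset_health(conditions):
    """Percent of assets in condition GREEN (no identifiable defect)."""
    if not conditions:
        raise ValueError("no assets to evaluate")
    green = sum(1 for c in conditions.values() if c == "GREEN")
    return 100.0 * green / len(conditions)


# The article's example: 1,000 pieces of equipment, 750 with no identifiable defect.
# Asset IDs are hypothetical placeholders.
conditions = {
    f"ASSET-{i:04d}": ("GREEN" if i < 750 else "RED") for i in range(1000)
}

print(asset_health(conditions))  # 75.0
```

Computed monthly per plant or area, this single number gives the trend line that Figure 9 plots.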