Evolution of Maintenance
At the very beginning, Maintenance was an appendix to Operations / Production: it existed only to fix failures when they happened. These were the days of absolute Corrective Maintenance.
As time went by, it was noticed that many failures follow an almost regular pattern, occurring after an average period. Therefore, one could choose regular intervals to fix the equipment BEFORE the failure: Preventive Maintenance, also known as Time Based Maintenance.
However, very often these failures happen at irregular intervals. To avoid an unwanted failure, the Preventive Maintenance intervals are shortened. If the equipment condition were known, the maintenance could be done later. Technology development made it possible to identify failure symptoms: Predictive Maintenance, also known as Condition Based Maintenance.
Many pieces of equipment have sporadic activity (alarms, stand-by equipment, etc.). However, we must be sure that they are ready to run. Their failures are "hidden faults". Detecting and preventing hidden failures is called Detective Maintenance.
The different failure modes mean that there is no single approach among Corrective, Preventive or Predictive Maintenance programs. The correct balance gives better equipment reliability in return, hence the name: Reliability Centered Maintenance.
"Remember, kid, Prevention is better than Cure...."
"Take it easy, my grandma, not always!"
Reliability Centered Maintenance (RCM)
John Moubray (1949-2004)
After graduating as a mechanical engineer in 1971, John Moubray worked for two years as a maintenance planner in a packaging plant and for one year as a commercial field engineer for a major oil company.
In 1974, he joined a large multi-disciplinary management consulting company. He worked for this company for twelve years, specializing in the development and implementation of manual and computerized maintenance management systems for a wide variety of clients in the mining, manufacturing and electric utility sectors.
He began working on RCM in 1981, and from 1986 he was dedicated full time to RCM, founding Aladon LLC, which he led until his premature death in 2004.
Today, John Moubray's name is considered synonymous with RCM.
Its origins
What about a failure rate of 0.00006/event? Quite good, no?
This was the average failure rate in commercial flight takeoffs in the 1950s. Two thirds of these failures were caused by equipment failures.
Today, this would mean 2 accidents per day, involving planes with more than 100 passengers!!!
That's why Reliability Centered Maintenance began in aeronautical engineering. Pretty soon, the nuclear, military, and oil & gas industries also began to use RCM concepts and implement them in their facilities.
Reliability and Availability
Reliability
Reliability is a broad term that focuses on the ability of a product to perform its intended function. Mathematically speaking, reliability can be defined as the probability that an item will continue to perform its intended function without failure for a specified period of time under stated conditions.
Reliability is a performance expectation. It is usually defined at design.
Availability
Availability depends upon operation uptime and operating cycle.
Availability is a performance result. Equipment history will tell us the availability.
Bibliography: Kardec, Alan and Nascif, Julio. Manutenção: Função Estratégica. Editora Qualitymark.
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
A first definition:
$\text{Availability} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$
Bibliography: Kardec, Alan and Nascif, Julio. Manutenção: Função Estratégica. Editora Qualitymark.
Availability definitions
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
MTBM = Mean Time Between Maintenance actions
M = Mean Maintenance Downtime (including preventive and planned corrective downtime)
Inherent Availability: considers only corrective downtime
$\text{Inherent Availability} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MTTR}}$
Achieved Availability: considers corrective and preventive maintenance
$\text{Achieved Availability} = \dfrac{\text{MTBM}}{\text{MTBM} + M}$
Operational Availability: ratio of the system uptime to the total time
$\text{Operational Availability} = \dfrac{\text{Uptime}}{\text{Operation Cycle}}$
Reliability and Availability
Case 1
Running periods: 250, 360, 200 and 120 days. Downtimes between them: 9, 6 and 2 days. Total: 947 days.
MTBF = (250 + 360 + 200 + 120) / 4 = 232.5 days
MTTR = (9 + 6 + 2) / 3 = 5.67 days
Availability = 232.5 / (232.5 + 5.67) = 97.62 %
Case 2
Running periods: 180, 400, 120 and 233 days. Downtimes between them: 7, 4 and 3 days. Total: 947 days.
MTBF = (180 + 400 + 120 + 233) / 4 = 233.25 days
MTTR = (7 + 4 + 3) / 3 = 4.67 days
Availability = 233.25 / (233.25 + 4.67) = 98.04 %
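The two cases above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `availability_from_history` and its argument names are illustrative and do not come from the slides.

```python
def availability_from_history(run_days, down_days):
    """Return (MTBF, MTTR, availability) from lists of run and repair durations."""
    mtbf = sum(run_days) / len(run_days)      # mean time between failures
    mttr = sum(down_days) / len(down_days)    # mean time to repair
    return mtbf, mttr, mtbf / (mtbf + mttr)   # MTBF / (MTBF + MTTR)


for runs, downs in (([250, 360, 200, 120], [9, 6, 2]),
                    ([180, 400, 120, 233], [7, 4, 3])):
    mtbf, mttr, avail = availability_from_history(runs, downs)
    print(f"MTBF = {mtbf:.2f} d, MTTR = {mttr:.2f} d, availability = {avail:.2%}")
```

Both histories cover the same 947 days with almost the same MTBF; the second one shows higher availability only because its repairs are shorter.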
Reliability and Availability
Achieved Availability↑ = MTBM↑ / (MTBM + M↓)
To improve Availability:
Improve MTBM:
• Reduce Preventive Programs to a minimum, or have Preventive intervals as well defined as possible
• Use Predictive techniques whenever possible
• Implement Maintenance Engineering (RCM, TPM...)
Minimize M:
• Implement Maintenance Engineering (Planning, Logistics...)
• Improve personnel technical skills (training)
• Develop Integrated Planning (Maintenance + Ops + HSE + Inspection + ...)
Bibliography: Kardec, Alan and Nascif, Julio. Manutenção: Função Estratégica. Editora Qualitymark.
Improving Productivity
Productivity Improvement Factors:
• Detailed work planning
• Delivering equipment to Maintenance as clean as possible
• Check-list at the end of Maintenance activities
• Complete and comprehensive equipment data available
• Supplies available on the job site
• Skilled personnel
Bibliography: Kardec, Alan and Nascif, Julio. Manutenção: Função Estratégica. Editora Qualitymark.
Translating percents to daily routine...
Availability %             Downtime per year   Downtime per month*   Downtime per week
90%                        36.5 days           72 hours              16.8 hours
95%                        18.25 days          36 hours              8.4 hours
98%                        7.30 days           14.4 hours            3.36 hours
99%                        3.65 days           7.20 hours            1.68 hours
99.5%                      1.83 days           3.60 hours            50.4 min
99.8%                      17.52 hours         86.23 min             20.16 min
99.9% ("three nines")      8.76 hours          43.2 min              10.1 min
99.95%                     4.38 hours          21.56 min             5.04 min
99.99% ("four nines")      52.6 min            4.32 min              1.01 min
99.999% ("five nines")     5.26 min            25.9 s                6.05 s
99.9999% ("six nines")     31.5 s              2.59 s                0.605 s
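The conversion behind this table is a single multiplication, sketched below. The period lengths are assumptions (365-day year, 30-day month, 7-day week) chosen because they reproduce most of the rows, for example 72 hours per month at 90%; the helper name `downtime_hours` is illustrative.

```python
HOURS_PER_DAY = 24

# Assumed period lengths: 365-day year, 30-day month, 7-day week.
PERIOD_HOURS = {
    "year": 365 * HOURS_PER_DAY,
    "month": 30 * HOURS_PER_DAY,
    "week": 7 * HOURS_PER_DAY,
}


def downtime_hours(availability_pct, period):
    """Allowed downtime, in hours, for a given availability over one period."""
    return (1 - availability_pct / 100) * PERIOD_HOURS[period]


for pct in (90, 99, 99.9, 99.99):
    cells = ", ".join(f"{name}: {downtime_hours(pct, name):.3f} h"
                      for name in PERIOD_HOURS)
    print(f"{pct}% -> {cells}")
```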
Maintenance Programs costs
Maintenance Program                  Cost (US$/HP/year)
Corrective (unplanned)               17 to 18
Preventive                           11 to 13
Predictive / Planned Corrective      7 to 9
NMW Chicago
Definitions
Failure rate (λ)
Failure rate (λ) is defined as the reciprocal of MTBF:
$\lambda(t) = \dfrac{1}{\text{MTBF}}$
Reliability: R(t)
Let P(t) be the probability of failure between 0 and t; reliability is defined as:
$R(t) = 1 - P(t)$
Bibliography: Lafraia, João Ricardo. Manual de Confiabilidade, Mantenabilidade e Disponibilidade. Editora Qualitymark.
Some math...
Considering a constant failure rate (λ), it can be proven (check at www.weibull.com) that R(t), the probability of having operated without failure until instant t, is given by:
$R(t) = e^{-\lambda t}$
This reinforces the idea that reliability is a function of time; it isn't a fixed number. So, it's incorrect to state: "This equipment has a 0.97 reliability factor...". We should rather say: "This equipment has 97% reliability for running, let's say, 240 days..."
Tricks and tips...
Historically, a piece of equipment has 4 failures per year. What is the reliability of this equipment for a 100-day run?
$\lambda = 4/365 = 0.011$/day
$R(100) = e^{-0.011 \times 100} = e^{-1.1} = 0.333 = 33.3\%$
The probability of having no failure in 100 days is 33.3%.
Some upgrades have been made, so the failure rate is now 2 per year (meaning that MTBF has doubled). What is the reliability for a 100-day run?
$\lambda = 2/365 = 0.0055$/day
$R(100) = e^{-0.0055 \times 100} = e^{-0.55} = 0.577 = 57.7\%$
The probability of having no failure in 100 days is 57.7%. As seen, doubling MTBF doesn't double reliability.
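The exponential reliability formula used in this example is easy to check numerically. The sketch below assumes a constant failure rate, as the slides do; the function name `reliability` is illustrative. Because it does not round λ, the results differ from the slide's figures in the third decimal place.

```python
import math


def reliability(failure_rate, t):
    """R(t) = exp(-lambda * t) for a constant failure rate (same time unit as t)."""
    return math.exp(-failure_rate * t)


for failures_per_year in (4, 2):
    lam = failures_per_year / 365          # failures per day
    print(f"{failures_per_year} failures/year: R(100 days) = {reliability(lam, 100):.3f}")
```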
Tricks and tips...
Historically, a piece of equipment has MTBF = 200 days. To improve its reliability for a 100-day run by 10%, by what percentage should MTBF be improved?
$\lambda = 1/200 = 0.005$/day
$R(100) = e^{-0.005 \times 100} = e^{-0.5} = 0.607 = 60.7\%$
To improve this reliability by 10%, the new reliability should be:
$R'(100) = 1.1 \times 0.607 = 0.668 = e^{-\lambda' \times 100}$
$\ln 0.668 = -\lambda' \times 100$
$-0.403 = -\lambda' \times 100$
$\lambda' = 0.00403$/day
$\text{MTBF}' = 1/\lambda' = 1/0.00403 \approx 248$ days
$248/200 = 1.24$
MTBF should improve by about 24%.
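Inverting $R(t) = e^{-t/\text{MTBF}}$ gives the MTBF needed for a target reliability directly. The sketch below, with the illustrative name `required_mtbf`, keeps full precision in the intermediate values and arrives at roughly 247 days, an improvement of about 24% over 200 days.

```python
import math


def required_mtbf(target_reliability, run_time):
    """MTBF needed so that exp(-run_time / MTBF) equals the target reliability."""
    return -run_time / math.log(target_reliability)


current_mtbf = 200.0                                  # days
run_time = 100.0                                      # days
current_r = math.exp(-run_time / current_mtbf)        # about 0.607
new_mtbf = required_mtbf(1.1 * current_r, run_time)   # 10% higher reliability
print(f"new MTBF = {new_mtbf:.1f} days, improvement = {new_mtbf / current_mtbf - 1:.1%}")
```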
Tricks and tips...
According to the manufacturer, a piece of equipment has 90% reliability for a one-year run. If you want 95% confidence that it will not fail, how long can it run before it undergoes Preventive maintenance or some predictive technique?
$0.9 = e^{-\lambda \times 365}$
$\ln 0.9 = -\lambda \times 365$
$-0.1054 = -\lambda \times 365$
$\lambda = 2.89 \times 10^{-4}$/day
$0.95 = e^{-\lambda t}$
$\ln 0.95 = -\lambda t$
$-0.0513 = -2.89 \times 10^{-4} \times t$
$t = 177.5$ days
For practical purposes, this equipment could be put in a semiannual preventive / predictive program.
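The same two-step calculation (recover λ from a quoted reliability, then solve for the time at which reliability falls to the target) can be sketched as follows. The function names are illustrative, and a constant failure rate is assumed throughout.

```python
import math


def failure_rate_from_reliability(reliability, run_time):
    """Constant failure rate implied by R(run_time) = reliability."""
    return -math.log(reliability) / run_time


def time_to_reliability(target_reliability, failure_rate):
    """Run time after which reliability has dropped to the target value."""
    return -math.log(target_reliability) / failure_rate


lam = failure_rate_from_reliability(0.90, 365)   # per day
interval = time_to_reliability(0.95, lam)        # days
print(f"lambda = {lam:.3e}/day, maintenance interval = {interval:.1f} days")
```

Without rounding the logarithms the interval comes out a fraction of a day longer than on the slide, which does not change the semiannual conclusion.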
System in series
[Diagram: components 1, 2 and 3 in series]
Let P1 = 5%, P2 = 10% and P3 = 20% be the failure probability of each component of this system in a certain period. What is the reliability of this system, in series?
This system will run provided that ALL its components run. So, their reliabilities are multiplied.
R1 = 1 - P1 = 1 - 0.05 = 0.95
R2 = 1 - P2 = 1 - 0.10 = 0.90
R3 = 1 - P3 = 1 - 0.20 = 0.80
R = R1 x R2 x R3 = 0.95 x 0.90 x 0.80 = 0.6840 = 68.4%
System failure probability: 31.6%
The system failure probability is greater than that of each individual component. The system reliability is lower than that of each component.
Bibliography: Lafraia, João Ricardo. Manual de Confiabilidade, Mantenabilidade e Disponibilidade. Editora Qualitymark.
System in parallel
[Diagram: components 1, 2 and 3 in parallel]
Let P1 = 5%, P2 = 10% and P3 = 20% be the failure probability of each component of this system, in parallel, in a given period. What is the reliability of the system, in parallel?
This system will run until ALL components fail. In this case, the failure probabilities are multiplied.
P = P1 x P2 x P3 = 0.05 x 0.10 x 0.20 = 0.0010
R = 1 - P = 0.999 = 99.9%
System failure probability: 0.1%
The system failure probability is lower than that of each component. The system reliability is greater than that of each component.
Bibliography: Lafraia, João Ricardo. Manual de Confiabilidade, Mantenabilidade e Disponibilidade. Editora Qualitymark.
Mixed systems
[Diagram: components 1, 2 and 3 in series, in parallel with components 4 and 5 in series]
If P1 = 10%, P2 = 5%, P3 = 15%, P4 = 2% and P5 = 20%, what is the system reliability?
Branch 1-2-3:
R1 = 1 - 0.10 = 0.90
R2 = 1 - 0.05 = 0.95
R3 = 1 - 0.15 = 0.85
R123 = 0.90 x 0.95 x 0.85 = 0.7268
P123 = 0.2733
Branch 4-5:
R4 = 1 - 0.02 = 0.98
R5 = 1 - 0.20 = 0.80
R45 = 0.98 x 0.80 = 0.7840
P45 = 0.2160
System:
Psystem = P123 x P45 = 0.2733 x 0.2160 = 0.0590
Rsystem = 1 - 0.0590 = 0.941 = 94.1%
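The series, parallel and mixed calculations reduce to two small building blocks that can be nested. A minimal sketch, assuming independent component failures as the slides do; the function names `series` and `parallel` are illustrative.

```python
from math import prod


def series(reliabilities):
    """All components must run: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """The system fails only if all components fail: multiply the failure probabilities."""
    return 1 - prod(1 - r for r in reliabilities)


print(f"series   : {series([0.95, 0.90, 0.80]):.4f}")
print(f"parallel : {parallel([0.95, 0.90, 0.80]):.4f}")

# Mixed system: branch 1-2-3 in series, in parallel with branch 4-5 in series.
mixed = parallel([series([0.90, 0.95, 0.85]), series([0.98, 0.80])])
print(f"mixed    : {mixed:.4f}")
```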
Redundancy
[Diagram: pumps A, B and C in parallel]
Pumps A, B and C are the feed pumps of a plant. To operate at full capacity, at least two of these three pumps must be running. The failure probability of each one is 10%. What is the reliability of running this plant at full production?
Failure probability is P = 0.1 (10%), and reliability is R = 1 - 0.1 = 0.9 (90%).
Three pumps in parallel, so:
$(R + P)^3 = R^3 + 3R^2P + 3RP^2 + P^3 = 0.9^3 + 3 \times 0.9^2 \times 0.1 + 3 \times 0.9 \times 0.1^2 + 0.1^3$
$(R + P)^3 = 0.729 + 0.243 + 0.027 + 0.001$
Three running: 0.729
Two running and one off: 0.243
Reliability = 0.729 + 0.243 = 0.972 = 97.2%
One running and two off: 0.027
None running: 0.001
No full production = 0.027 + 0.001 = 0.028 = 2.8%
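The binomial expansion above generalizes to any "at least k out of n identical units" arrangement. A minimal sketch, assuming identical and independent units; the function name `k_out_of_n` is illustrative.

```python
from math import comb


def k_out_of_n(k, n, r):
    """Probability that at least k of n identical, independent units (reliability r) run."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


full_production = k_out_of_n(2, 3, 0.9)
print(f"at least 2 of 3 pumps running: {full_production:.3f}")
```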
Redundancy
[Diagram: pumps A, B and C in parallel]
Pumps A, B and C are the feed pumps of a plant. Pump A's flow rate is 2,000 gpm, pump B's is 1,800 gpm and pump C's is 1,700 gpm. To operate, the plant needs a feed rate of at least 3,600 gpm. The reliabilities are RA = 0.95, RB = 0.90 and RC = 0.85. What is the plant reliability?
As the plant needs at least 3,600 gpm, only these cases supply enough flow:
A ∩ B ∩ C:       0.95 x 0.90 x 0.85 = 0.72675
A ∩ B ∩ notC:    0.95 x 0.90 x (1 - 0.85) = 0.12825
A ∩ notB ∩ C:    0.95 x (1 - 0.90) x 0.85 = 0.08075
Plant reliability = 0.72675 + 0.12825 + 0.08075 = 0.93575 = 93.6%
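When the units differ in capacity and reliability, the qualifying cases can be found by enumerating every up/down combination and keeping those that meet the demand. The sketch below does this for the three pumps; the function name `capacity_reliability` and the (capacity, reliability) tuple layout are illustrative choices, and independent failures are assumed.

```python
from itertools import product


def capacity_reliability(units, demand):
    """Probability that the running units together deliver at least `demand`.

    `units` is a list of (capacity, reliability) pairs.
    """
    total = 0.0
    for state in product((True, False), repeat=len(units)):
        flow = sum(cap for (cap, _), up in zip(units, state) if up)
        if flow >= demand:
            probability = 1.0
            for (_, rel), up in zip(units, state):
                probability *= rel if up else 1 - rel
            total += probability
    return total


pumps = [(2000, 0.95), (1800, 0.90), (1700, 0.85)]   # A, B, C in gpm
plant = capacity_reliability(pumps, 3600)
print(f"plant reliability = {plant:.5f}")
```

Pumps B and C together deliver only 3,500 gpm, which is why the state with A down never qualifies.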
System and Component Redundancy
[Diagram: Component Redundancy: A in parallel with A', followed in series by B in parallel with B'. System Redundancy: A and B in series, in parallel with A' and B' in series.]
Which of these systems would have a better overall reliability (let's assume all components have the same reliability R)?
Component redundancy
Reliability of the AA' and BB' subsystems: $1 - (1-R)^2 = 1 - 1 + 2R - R^2 = 2R - R^2$
System reliability: $R_{\text{comp red}} = (2R - R^2)^2$
System redundancy
Reliability of the AB and A'B' subsystems: $R^2$
System reliability: $R_{\text{syst red}} = 1 - (1 - R^2)^2 = 1 - 1 + 2R^2 - R^4 = 2R^2 - R^4$
$R_{\text{comp red}} - R_{\text{syst red}} = (2R - R^2)^2 - (2R^2 - R^4) = 4R^2 - 4R^3 + R^4 - 2R^2 + R^4$
$R_{\text{comp red}} - R_{\text{syst red}} = 2R^4 - 4R^3 + 2R^2 = 2R^2(R^2 - 2R + 1) = 2R^2(R-1)^2 \ge 0$
$R_{\text{comp red}} \ge R_{\text{syst red}}$
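A quick numerical check of this inequality is sketched below for a few values of R. The function names are illustrative; both assume four identical, independent components, as in the derivation.

```python
def component_redundancy(r):
    """(A parallel A') in series with (B parallel B'), all with reliability r."""
    return (2 * r - r**2) ** 2


def system_redundancy(r):
    """(A series B) in parallel with (A' series B'), all with reliability r."""
    return 2 * r**2 - r**4


for r in (0.50, 0.90, 0.99):
    gap = component_redundancy(r) - system_redundancy(r)
    print(f"R = {r:.2f}: component = {component_redundancy(r):.4f}, "
          f"system = {system_redundancy(r):.4f}, difference = {gap:.4f}")
```

The advantage of component-level redundancy shrinks as R approaches 1, in line with the $(R-1)^2$ factor.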
Active and Passive Redundancy
[Diagram: units A and B]
Active Redundancy: Both pieces of equipment are operating at the same time, sharing the load. If one fails, the other one will carry the load alone.
Passive Redundancy: One piece of equipment is operating and the other one is on stand-by, starting to operate after the failure of the first one, depending on a switching system.
Getting closer to real world...
In systems with active redundancy, all redundant components are in operation and share the load with the main component. Upon failure of one component, the surviving components carry the load, and as a result the failure rate of the surviving components may increase. The reliability of an active, shared-load, parallel system of two units can be calculated as follows:
$R(t) = e^{-2\lambda_1 t} + \dfrac{2\lambda_1}{2\lambda_1 - \lambda_2}\left(e^{-\lambda_2 t} - e^{-2\lambda_1 t}\right)$
where λ1 is the failure rate of each unit when both are working and λ2 is the failure rate of the surviving unit when the other one has failed. If 2λ1 = λ2, then:
$R(t) = e^{-2\lambda_1 t}\,(1 + 2\lambda_1 t)$
Getting closer to real world...
In a system with active redundancy, the reliability of each of the two components for 100 days is R = 0.96 when sharing the load. If one component fails, the surviving one will have a 50% increase in its failure rate. What is the system reliability for 100 days?
$R(100) = 0.96 = e^{-\lambda_1 \times 100}$
$\ln 0.96 = -100\,\lambda_1$
$\lambda_1 = 0.00041$
$\lambda_2 = 1.5 \times \lambda_1 = 0.000615$
$R(100) = e^{-2 \times 0.00041 \times 100} + \dfrac{2 \times 0.00041}{2 \times 0.00041 - 0.000615} \times \left(e^{-0.000615 \times 100} - e^{-2 \times 0.00041 \times 100}\right)$
$R(100) = e^{-0.082} + 4 \times \left(e^{-0.0615} - e^{-0.082}\right)$
$R(100) = 0.9213 + 4 \times (0.9404 - 0.9213)$
$R(100) = 0.9977$
If there were no increase in failure rate, system reliability would be 0.9984. The difference looks like nothing, but it means a 30.5% decrease in system MTBF (comparing the equivalent constant failure rates, $-\ln R(100) / 100$)!!!
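The worked example can be checked with a short function for a two-unit load-sharing system. This is a sketch under the same assumptions as the slide (constant failure rates, two identical units); the name `load_sharing_reliability` is illustrative, and the special case 2λ1 = λ2 uses the limiting form of the same expression.

```python
import math


def load_sharing_reliability(lam1, lam2, t):
    """Two-unit active redundancy: rate lam1 per unit while sharing, lam2 for the survivor."""
    both_alive = math.exp(-2 * lam1 * t)
    if math.isclose(2 * lam1, lam2):
        # Limiting case of the general expression when 2*lam1 == lam2.
        return both_alive * (1 + 2 * lam1 * t)
    factor = 2 * lam1 / (2 * lam1 - lam2)
    return both_alive + factor * (math.exp(-lam2 * t) - both_alive)


lam1 = -math.log(0.96) / 100          # per day, from R(100) = 0.96
r_increase = load_sharing_reliability(lam1, 1.5 * lam1, 100)
r_no_increase = load_sharing_reliability(lam1, lam1, 100)
print(f"with 50% rate increase: {r_increase:.4f}")
print(f"without rate increase : {r_no_increase:.4f}")
```

With no increase in the survivor's failure rate the expression collapses to the ordinary parallel result, 1 - (1 - 0.96)^2.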
Getting closer to real world...
The redundant or back-up components in passive or standby systems start operating only when one or more components fail. The back-up components remain dormant until needed.
For two identical components (primary and back-up) the formula, considering a perfect switch, is:
$R(t) = e^{-\lambda t}\,(1 + \lambda t)$
If the reliability of the switch is less than one, the reliability of the system is affected by the switching mechanism and is reduced accordingly:
$R(t) = e^{-\lambda t}\,(1 + R_{sw}\,\lambda t)$, where $R_{sw}$ is the switch reliability
The reliability of a standby system consisting of one primary component with constant failure rate λ1 and a backup component with constant failure rate λ2 is given by:
$R(t) = e^{-\lambda_1 t} + \dfrac{\lambda_1}{\lambda_2 - \lambda_1}\left(e^{-\lambda_1 t} - e^{-\lambda_2 t}\right)$
Getting closer to real world...
Two feed pumps in a nuclear power plant are connected in stand-by mode. One is active and one is on standby. The power plant will have to shut down if both feed pumps fail. If the time between failures of each pump has an exponential distribution with MTBF = 28,000 hours, and the failure rate of the switching mechanism $\lambda_{sw}$ is $10^{-6}$ per hour, what is the probability that the power plant will not have to shut down due to a pump failure in 10,000 hours?
$R(t) = e^{-\lambda t}\,(1 + R_{sw}\,\lambda t)$
Switch reliability: $R_{sw} = e^{-10^{-6} \times 10^{4}} = e^{-10^{-2}} = e^{-0.01} = 0.9900$
$\lambda = 1/\text{MTBF}$
$R(10000) = e^{-\frac{1}{28000} \times 10000} \times \left(1 + 0.9900 \times \dfrac{1}{28000} \times 10000\right)$
$R(10000) = e^{-0.3571} \times (1 + 0.3536)$
$R(10000) = 0.6997 \times 1.3536$
$R(10000) = 0.9471$
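The standby formula with an imperfect switch can be evaluated directly, as sketched below. The function name `standby_reliability` is illustrative, and the switch failure rate is assumed to be expressed per hour so that it matches the 10,000-hour mission time.

```python
import math


def standby_reliability(failure_rate, t, switch_reliability=1.0):
    """Two identical units, one on standby: exp(-lam*t) * (1 + R_sw * lam * t)."""
    return math.exp(-failure_rate * t) * (1 + switch_reliability * failure_rate * t)


mission_hours = 10_000
pump_rate = 1 / 28_000                        # failures per hour
r_switch = math.exp(-1e-6 * mission_hours)    # switch rate assumed per hour
r_plant = standby_reliability(pump_rate, mission_hours, r_switch)
print(f"switch reliability = {r_switch:.4f}, plant reliability = {r_plant:.4f}")
```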
Bathtub Curve
Early Life (burn-in, infant mortality)
• large number of new component failures, which decreases with time
Useful Life
• small number of apparently random failures during working life (λ constant)
Wear-out
• increasing number of failures with time as components wear out
Early Life:
• sub-standard materials
• often caused by poor / variable manufacturing and poor quality control
• prevented by effective quality control, burn-in, run-in and de-bugging techniques
• weak components eventually replaced by good ones
• probabilistic treatment less important
Useful Life:
• random or chance failures
• may be caused by unpredictable sudden stress accumulations, outside and inside the components, beyond the design strength
• over sufficiently long periods, the frequency of occurrence (λ) is approximately constant
• failure rate used extensively in Safety & Reliability analyses
Wear-out period:
• symptom of component ageing
• prediction is important for replacement and maintenance policy
Different bathtub curves
These statistics are from the aeronautical industry. In a process plant, like a refinery, do you think the percentage of each curve would be about the same?
Which of these curves would be applicable to:
• A pump?
• An electronic instrument?
• A tire?
Failure modes
Common sense says that the best way to optimize plant availability is to implement some Preventive maintenance.
Preventive maintenance means fixing or replacing pieces of equipment and/or components at fixed intervals. The useful lifespan of equipment may be calculated with Failure Statistical Analysis, enabling the Maintenance Department to implement Preventive Programs.
This is true for some simple pieces of equipment and components, which may have a prevailing failure mode. Many components in contact with process fluids have a regular lifespan, as does cyclic equipment, due to fatigue and corrosion.
But for many pieces of equipment there is no connection between reliability and time. Furthermore, as seen in the Reliability curves, defining the optimum interval for Preventive maintenance may be a hard task. Besides, fixing or even replacing the equipment may bring you back to the Infant Mortality period...
Preventive maintenance may cause failures earlier....
[Figure: failure rate λ versus time. Annotations: "Failures are likely to happen…"; "Here begins wear-out period."; "Let's define Preventive maintenance here…"; "The failure likelihood is earlier!!!!"]
Turnarounds
Turnarounds are often seen by Operations as a unique opportunity to have all problems solved and all equipment fixed…
Meanwhile, for Maintenance, a Turnaround (TAR) is a huge event, consuming time, resources and money, in which ONLY what CANNOT be done on the run, during normal operation, should be done.
Frequently, Maintenance is asked to perform General Maintenance on ALL rotating equipment of a Unit during its Turnaround. As a matter of fact, if this equipment has spares, the General Maintenance should be done outside the TAR.
Why does Operations want everything to be done during the TAR?
1) Because Ops doesn't have enough confidence that it will be done during routine maintenance.
2) Because they don't feel comfortable running a piece of equipment momentarily without a spare… the same way that, when we have a flat tire, we drive on the spare tire only far enough to reach the tire repair shop…
Turnarounds
1) Ops doesn't have enough confidence that it will be done during routine maintenance.
To improve TAR results, reversing the vicious cycle below, Maintenance management has to improve Routine Maintenance!
Vicious cycle: much equipment left to the TAR → too much to be done during the TAR → the TAR won't be able to perform all that has to be done → much equipment left to Routine Maintenance → (back to the start)
Virtuous cycle: good routine maintenance → unit running well → no excess of equipment to be done during the TAR → the TAR will carry out all services needed → (back to the start)
Turnarounds
2) Because they don't feel comfortable running a piece of equipment momentarily without a spare… the same way that, when we have a flat tire, we drive on the spare tire only far enough to reach the tire repair shop…
Consider two pumps in Passive Redundancy (one will be on stand-by). Assume that during the first 100 h after a General Maintenance such a pump has 70% reliability and that after this, for a one-year period, it runs with 97% reliability (which are reasonable assumptions!!!).
If General Maintenance is performed under a Preventive or Predictive Program during normal operations, then during the repair time the unit will be running on a single pump, with 97% reliability.
If both pumps undergo General Maintenance during the TAR, during the first 100 hours the system reliability (considering a perfect switch) would be about 95% (using the formula $R(t) = e^{-\lambda t}(1 + \lambda t)$ with $e^{-\lambda t} = 0.70$). So, the unit would run for a period of time with two available pumps, but with an overall reliability lower than if it were running with only one pump!
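The comparison above can be checked by writing the perfect-switch standby formula in terms of the single-unit reliability r, since λt = -ln r. The sketch below is only illustrative (the function name is not from the slides) and evaluates to roughly 0.95 for r = 0.70.

```python
import math


def standby_from_unit_reliability(unit_reliability):
    """Perfect-switch standby pair: r * (1 - ln r), because lambda*t = -ln r."""
    return unit_reliability * (1 - math.log(unit_reliability))


both_overhauled = standby_from_unit_reliability(0.70)   # two freshly overhauled pumps
single_seasoned = 0.97                                   # one pump past its first 100 h
print(f"two overhauled pumps (standby pair): {both_overhauled:.3f}")
print(f"one seasoned pump alone            : {single_seasoned:.3f}")
```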
RCM Implementation Flowchart
1. Will the failure directly affect Health, Safety or Environment?
   Yes: go to 4. No: go to 2.
2. Will the failure adversely affect the Mission, Vision and Core Values of the Company?
   Yes: go to 4. No: go to 3.
3. Will the failure cause major economic losses (harm to systems and/or machines)?
   Yes: go to 4. No: Run-to-fail?
4. Is there some cost-effective monitoring technology available?
   Yes: deploy monitoring techniques (Predictive Maintenance). No: go to 5.
5. Are there regular failure patterns (time intervals)?
   Yes: Preventive Maintenance. No: re-design the system, accept the failure risk, or install redundancy.
Another RCM Implementation Flowchart
1. If this thing breaks, will it be noticed?
   No: Can preventing it from breaking reduce the likelihood of multiple failures?
       Yes: prevent it breaking. No: check to see if it is broken.
   Yes: go to 2.
2. If this thing breaks, will it hurt someone or the environment?
   Yes: Can preventing it from breaking reduce the risk to the environment and safety?
       Yes: prevent it breaking. No: re-design it.
   No: go to 3.
3. If this thing breaks, will it slow or stop production?
   Yes: Is it cheaper to prevent it breaking than the loss of production?
       Yes: prevent it breaking. No: let it break.
   No: Is it cheaper to prevent it breaking than to fix it?
       Yes: prevent it breaking. No: let it break.