Paper given at the 2009 ASQ World Quality Congress on key features found in the best (and worst) reliability programs.


TRAITS FOUND IN EFFECTIVE RELIABILITY PROGRAMS

Fred Schenkelberg
Reliability Engineering Consultant
FMS Reliability
Los Gatos, CA 95032
fms@fmsreliability.com
www.fmsreliability.com

SUMMARY

Having had the privilege to interview a cross-section of more than 70 product development teams to understand their reliability programs has led to a few observations. Only a rare few have mature, cost-effective, and efficient reliability programs.

A clear understanding of your organization's reliability program, along with a clear vision of what is possible, is the crucial first step to making systematic program improvements. This paper explores the key traits that separate good from great reliability programs.

Marketing, product volume, complexity, and organizational structure do not tend to matter; however, a proactive approach, statistical thinking, fact-based decision making, and integrated reliability tools do tend to make a difference. This paper outlines how to assess your organization and highlights key traits of good and simply great reliability programs.

INTRODUCTION

On one occasion I conducted assessments of two organizations located in the same building. Both designed and manufactured telecommunication equipment of similar complexity and volume. The interview schedule had me going up and down stairs almost every hour for two days, and by midday of the first day I enjoyed going upstairs and dreaded heading down. Despite all the similarities, the two reliability programs were dramatically different; as different as their reliability results.

Downstairs, the interviews started late and were interrupted by urgent phone calls or in-person requests; firefighting at its best. The team employed a wide range of tools, everything listed on a checklist, for each project. The reliability goals were not known to the design team, and the few who did know them also understood they would not be measured or allowed to impede getting the product to market.
The people I talked to stated reliability was very important, and they were very busy fixing issues identified in the field or in testing just before product launch. Reliability was done by the guy who left last year.

Upstairs, the interviews started on time, without interruption. No one remembered the last time there was an urgent need to resolve a field issue. The team employed reliability tools as needed, choosing those that would benefit the project. The specific testing was tailored to the risks identified during the design phase. The goals were widely known, and current status was also known, during development and after product launch. The people I talked to stated reliability was very important, and they knew what to do to meet their reliability objectives. Reliability thinking and skills were taught by Sharon, who left last year.

This paper touches on the key traits that separated these two groups.

BACKGROUND

In 1983, John Young, CEO of Hewlett-Packard Company, noticed that the rate of growth of warranty expense was higher than the rate of growth of revenue. He asked the corporation to reduce warranty by 10x by the end of the decade. One of the key factors in the success of this program in reducing warranty from 4% to 1.5% of net revenue was the identification and encouragement of key reliability engineering practices. [Ireson, 5.1] Dick Moss conducted the survey and was my mentor at HP.
In 1996, we were unable to tally the corporate warranty expense; the systems and metrics established in the late '80s had been dismantled. When the result was finally determined, the corporation had lost ground and looked as if it would continue to grow warranty expense faster than revenue. Just as in the early '80s, many of the key reliability practices were widely used, yet the results did not indicate any effectiveness. It was time to conduct another survey, since a few product divisions did have better results with respect to warranty expenses. So, I dusted off the old Moss survey.

One item that became clear as the survey progressed was that the culture of the product team, and how they viewed reliability, seemed directly related to the results. This is similar to the quality maturity described in "Quality is Free" by Philip B. Crosby. Using the same approach for product reliability, the product teams with high maturity did have significantly lower warranty expenses. Other attempts have explored this relationship between reliability activities, effectiveness, and results, including a current effort within IEEE to publish a reliability assessment standard. [Gullo]

In my experience, product teams have asked for guidance on how to improve their product reliability (e.g., warranty expenses), which is guidance on how to move to the right on the maturity matrix and become more effective in achieving reliable product performance in the field. A few of these engagements involved reliability programs that already employed an assortment of practices, yet each had one or two missing elements that kept them from achieving systemic improvements. It is specifically these experiences that form the basis for this paper.

THE TRAITS

There are three main interconnected threads that run through very effective programs. First, teams with clearly stated reliability goals that are routinely estimated, measured, and evaluated.
Second, teams that make design decisions fully considering the impact on the program and the business. And third, teams that actively seek failures and endeavor to learn as much as possible from each one. Each of these traits consists of a collection of tightly interwoven reliability tools and practices. The specific tools vary from one team to the next due to volume, market, and other business priorities.

Trait 1: STATE CLEAR GOALS

There are plenty of really bad reliability goal statements: 20,000-hour MTBF, 5-year life, "as good as or better than...", 2-year warranty, zero field failures. What very good programs have is a complete statement that permits the organization to understand and use the goal to influence each design decision.

A simple definition of reliability includes four elements: function, duration, probability, and environment. Poor goal statements often contain only one of the four elements and force assumed values for the others. A complete reliability goal statement includes all four elements, as shown below.

"Product FMS provides music storage and playback [key functions] for two years [duration] with 98% reliability [probability of success over the duration period] in a worldwide portable environment [environment]."

Both the function and the environment require further definition, often provided in other key documents or references. For example, many product development teams have a set of product specifications the design should meet, including size, color, features, and performance parameters. Generally, the function element includes anything that the customer would notice not working and, when it did not perform as expected, would call a failure. Understanding what constitutes a failure from the customer's point of view tailors risk analysis and product evaluations to the elements most important to customers.

The environment includes shipping, storage, installation, startup, and use.
Many organizations develop a set of documents that capture the key features of their market's environment. Many rely on standards and do not tailor, as the best do, the environmental parameters to reflect the experience of their products with their customers. For example, the MP3 player above is likely to sit on a car dashboard in the sun; does the internal set of environmental requirements capture this temperature extreme and its expected duration? The better environmental statements include nominal and expected ranges of values for temperature, humidity, shock, radiated emissions, usage profiles, and possibly numerous other environmental and usage factors that define the most significant parameters affecting the short- and long-term performance of the product with the customer. It is not a set of fixed profile tests.
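The four-element goal statement is small enough to carry around as a data structure. A minimal sketch in modern Python, my illustration rather than anything used by the teams described here (the class and field names are my own):

```python
from dataclasses import dataclass

@dataclass
class ReliabilityGoal:
    """Four-element reliability goal: function, duration, probability, environment."""
    function: str          # what the customer expects to work
    duration_years: float  # period over which the goal applies
    probability: float     # probability of success over the duration
    environment: str       # reference to the environmental definition

    def describe(self) -> str:
        return (f"{self.function} for {self.duration_years:g} years "
                f"with {self.probability:.0%} reliability in a "
                f"{self.environment} environment")

# The paper's example goal for "Product FMS":
goal = ReliabilityGoal(
    function="music storage and playback",
    duration_years=2,
    probability=0.98,
    environment="worldwide portable",
)
print(goal.describe())
```

Keeping the goal as structured data, rather than a phrase in a slide deck, makes it available to every later estimate and report.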
A fully stated goal often includes multiple durations, each with an associated probability statement (out of box, first 90 days, warranty period, and expected life are common durations of interest). Different failure mechanisms may produce failures in a design at different points in time. For example, shock and vibration from transportation to the customer may be the most significant root cause of out-of-box failures, whereas mechanical fatigue may dominate the failures after the warranty period. The full statement permits consideration of materials, assembly options, component selection, and packaging approaches early in the product design process.

A reliability goal is just one of many constraints a design team must consider during product development. They face a seemingly endless list of requirements, regulations, and business expectations. The three most common are performance, schedule, and cost. Performance is the set of functions, i.e., what the product is supposed to do for the customer, and is often key to the value the product provides. It is immediately measurable: the product either meets the performance requirements or it doesn't. The first prototypes provide the first measures and are central to nearly every measure made and reported during development and manufacturing. Schedule refers to the time-to-market requirement. The project has a target date to have the product in its final form, on the shelf, ready for sale. The calendar measures this criterion, and a series of schedule milestones reminds the design team of the deadline. Cost is often the bill-of-material cost and relates to the profitability of the product. A simple spreadsheet listing the component and assembly costs can tally this every day for the design team. All three are readily measurable, and each provides feedback to the team.

Reliability, specifically the probability of successful operation at later durations, is difficult at best to measure accurately.
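Once a candidate life model exists, evaluating the goal at each duration of interest is mechanical. A sketch assuming (my assumption, not the paper's) a two-parameter Weibull model with illustrative parameters:

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """R(t) = exp(-(t/eta)^beta) for a two-parameter Weibull life model."""
    return math.exp(-((t / eta) ** beta))

# Hypothetical parameters, for illustration only: shape beta = 1.5
# (wear-out) and characteristic life eta = 12 years.
beta, eta = 1.5, 12.0

# Durations of interest named in the text: first 90 days, the
# two-year warranty, and a five-year expected life.
for label, years in [("first 90 days", 0.25),
                     ("warranty (2 yr)", 2.0),
                     ("expected life (5 yr)", 5.0)]:
    print(f"{label}: R = {weibull_reliability(years, beta, eta):.3f}")
```

A single model rarely covers every mechanism; in practice each dominant failure mechanism (transportation shock, fatigue, etc.) would get its own model and its own duration of interest.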
The second element of this trait is the repeated and improving measurement of reliability during the design process. Goals without some method of tracking progress leave the team guessing whether they achieved the goal or are on target. The measure provides a means to make adjustments and to gauge readiness for the market.

One of the best examples I've seen involved a weekly report to the design team on reliability. Each Friday, Phil would gather the best available data or estimates for each of the major sub-systems of the product. On Monday he would report the results of the tally against the reliability goals. Early in the program these estimates were based on historical data from previously fielded products. As the design evolved, the estimates received adjustments from parts-count and vendor data sources. For key elements, the team invested in accelerated life testing or encouraged the vendor to perform the testing. And finally, with later prototypes, the team conducted accelerated demonstration tests on the entire system using time compression and elevated temperature. High temperature accelerated the most dominant high-risk failure mechanisms, and the team closely monitored the first six months of field performance.

During each stage of the product lifecycle, the team received the best available measure of reliability. As the design progressed and the product became more functional, additional testing and estimates continued to improve. Just like the other three major constraints (performance, schedule, and cost), reliability measures provided regular feedback.
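A weekly tally like Phil's can be as simple as multiplying the sub-system estimates, assuming (my assumption) the sub-systems form a series system with independent failures. The figures below are illustrative, not from the program described:

```python
from math import prod

# Hypothetical sub-system reliability estimates at the goal duration;
# sources range from historical field data to vendor test results.
subsystem_estimates = {
    "power supply": 0.995,  # vendor accelerated-life data
    "main board":   0.985,  # parts-count prediction
    "display":      0.990,  # historical field data
    "enclosure":    0.999,  # engineering judgment
}

goal = 0.95  # illustrative reliability goal over the same duration

# Series system, independent failures: system reliability is the
# product of the sub-system reliabilities.
system_r = prod(subsystem_estimates.values())
print(f"Estimated system reliability: {system_r:.3f} (goal {goal:.2f})")
print("On track" if system_r >= goal else "Below goal")
```

As better data arrives (vendor tests, accelerated life tests, demonstration tests), the entries are simply replaced and the tally rerun, which is what made the weekly cadence cheap to sustain.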
A goal without a measure, like measures without a goal, provides limited value to the decision-making process. Clearly stating a fully expressed reliability goal and regularly measuring reliability permit the team to know where they are going, whether they are on track, and when they have arrived.

Trait 2: ENABLE TRADEOFFS

A single key piece of information is all that is required to enable designers to balance reliability against performance, time to market, and cost. This information exists within any company that ships product, yet it is nearly always unknown to the design team. Providing the cost of a field return, in dollars, permits the designer to translate reliability differences into dollars. For example, if projected shipments are 1,000 units a month and a return costs the company $450 (call center, repair/replacement, shipping, and failure analysis are examples of elements of this value), then a 1% change in reliability (from 92% to 93%, for example) would reduce return costs by $4,500 per month. Taking this example a bit further, assume it would cost $1/unit more (in bill-of-material cost) to achieve the change in field failure rate. Is this worth the increase? Certainly, as the savings is $4,500 per 1,000 units, or $4.50 per unit shipped, adding $3.50 of profit for each unit shipped.

For high-risk areas or major elements of a design, the team may face multiple options trading off cost, time to market, or functionality, each with associated costs. By understanding the impact on reliability, these tradeoffs can be fully considered. Teams that do this well use it during component selection, during design-solution comparisons, and during design optimization. They seek the areas with the best return on the investment, whether that is component cost, functionality, schedule, or reliability.

Trait 3: SECURE FAILURES

"The concept of failure is central to the design process, and it is by thinking in terms of obviating failure that successful designs are achieved." [Petroski]

Product teams understand the product should just work for the customer. It shouldn't fail. In my experience, design teams tend to imagine possible failure modes and attempt to design the product to avoid or mitigate the failure. It may become a point of litigation if the product fails in a manner that should have been anticipated by the design team. More often it is the business case: a product that doesn't fail will sell better and have lower warranty expenses. Hence, a reliable product is more profitable.

The best teams aggressively seek failures in the design over the entire product lifecycle. In the early concept phases, consider the fundamental limits of the chosen technology. Also consider the types of stresses expected during use and project their effect onto the core technology. A Failure Mode and Effects Analysis (FMEA) may help reveal high-risk areas for further analysis. With the first prototypes, the team can directly evaluate performance and discover failure mechanisms through testing such as Highly Accelerated Life Testing (HALT). And during the product launch, the team can either confirm or discover the ways the product fails in use. In all cases, a technical understanding of the interaction of the design with the applied stress (use, temperature, vibration, etc.) permits the team to uncover the design flaw that revealed itself as a failure.

Reliability growth modeling is based on the premise that every design has an unknown and finite number of design flaws. The product development process is the careful uncovering and resolving of as many of these flaws as possible before shipping the products.
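Deciding which flaws are worth the effort comes back to Trait 2's translation of reliability into dollars. A minimal sketch using the figures from the earlier example (the function name and structure are mine):

```python
def monthly_return_savings(units_per_month: int,
                           cost_per_return: float,
                           reliability_gain: float) -> float:
    """Dollars saved per month by improving field reliability."""
    fewer_returns = units_per_month * reliability_gain
    return fewer_returns * cost_per_return

# Figures from the Trait 2 example: 1,000 units/month, $450 per
# return, and a 1% reliability improvement (92% -> 93%).
savings = monthly_return_savings(1000, 450.0, 0.01)
per_unit = savings / 1000      # savings per unit shipped
bom_increase = 1.00            # assumed $1/unit BOM cost to get there
net = per_unit - bom_increase  # added profit per unit shipped
print(f"${savings:,.0f}/month saved; net ${net:.2f} profit per unit")
```

The same three lines of arithmetic apply to a proposed fix for any flaw found during development: estimate the reliability change, convert it to dollars, and compare against the cost of the change.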
At some point, finding the remaining flaws is not worth the effort (cost and time); the remaining flaws have acceptable field reliability.

Finding the failures is only the key first step in this trait. Many failures, once revealed to a design team, highlight various design changes that will reduce or eliminate the same failures in the improved design. On some occasions, the failure is only a symptom, and treating the assumed cause does not remove the flaw. For example, an intermittent over-voltage power supply may cause sensitive integrated circuits (ICs) to fail. The IC failure may suggest a faulty component, but its replacement does not change the underlying root cause. The failure will happen again, or the faulty power supply may cause another component to fail. With careful failure analysis of the broken IC, the over-voltage root cause would lead to investigating the power supply. Once the power supply design is fixed, the failure symptom of blown ICs goes away.

Another element of this trait is the pursuit of every failure. Imagine that during prototyping 100 units are built and distributed to various parts of the team for evaluation and testing. Some failures may occur on all 100 units, some on about half, and some on only one unit. The first two cases are obvious flaws that need attention and resolution before shipping, as the sample failure rates approximate 100% and 50% field failure rates.

Now let's further assume the product goal is 95% reliability over the first year, that five units each revealed a design flaw, and that the other 95 units function without fault. The team is done, right? No. First, there is the issue of a sample of 100 units with five failures estimating the population's failure rate. The nominal estimate is 5/100, or 5%, which is the same as 95% reliability.
We would also have to assume that all 100 units experienced at least a year of operation (very unlikely), or that the other functional units did not replicate the failure when exposed to the stress that uncovered the fault (more likely). Using a 90% confidence that the sample represents the population, the actual reliability could be as low as 63%.

Also, considering that use conditions, environment, manufacturing, and components all vary, the actual failure rate will almost certainly be worse than that estimated during development. Therefore, even relatively rare failures in the development process require careful analysis and resolution.

In other words, each and every failure is a gift: an opportunity to learn about design flaws within a product. Using tools like FMEA and HALT permits the team to uncover the faults as soon as possible.

CONCLUSION
Product teams that regularly produce reliable products (the upstairs team) have these three traits in common:

• First, a complete reliability goal statement with regular measurement.
• Second, the ability to translate reliability changes into dollars.
• Third, the aggressive discovery and resolution of failures.

Each of these is more than the use of a single reliability engineering tool. They are collections of tools working together to encourage and enable the engineer to develop a product that meets the customer's expectations of reliability. When all the pieces are in place, the opportunity to meet reliability and business goals improves. The results of the upstairs team have been repeated by other teams that carefully assessed their development programs and adjusted them to include all the elements of the three traits.

REFERENCES

Crosby, Philip B., Quality is Free: The Art of Making Quality Certain, Mentor, New York, 1979.
Gullo, Louis J., et al., "Assessment of Organizational Reliability Capability", IEEE Transactions on Components and Packaging Technologies, June 2006, Vol. 29, Issue 2, 425-428.
Ireson, W. Grant, Coombs, Clyde F., and Moss, Richard Y., Handbook of Reliability Engineering and Management, 2nd Ed., McGraw-Hill, New York, 1996.
Petroski, Henry, Design Paradigms: Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.
