Data Acquisition: A Key Challenge for Quality and Reliability Improvement

An ASQ Reliability Division webinar - learn more at http://reliabilitycalendar.org/webinars/

DATA ACQUISITION: A KEY CHALLENGE FOR QUALITY AND RELIABILITY IMPROVEMENT

Gerald J. Hahn & Necip Doganaksoy
©2013 ASQ & Presentation Hahn & Doganaksoy
http://reliabilitycalendar.org/webinars/

ASQ RELIABILITY DIVISION ENGLISH WEBINAR SERIES

One of the monthly webinars on topics of interest to reliability engineers.
• To view recorded webinars (available to ASQ Reliability Division members only), visit asq.org/reliability
• To sign up for the live webinars, which are free and open to anyone, visit reliabilitycalendar.org and select English Webinars to find links to register for upcoming events

http://reliabilitycalendar.org/webinars/

DATA ACQUISITION: A KEY CHALLENGE FOR QUALITY AND RELIABILITY IMPROVEMENT

Gerald J. Hahn, GE Global Research (Retired), gerryhahn@yahoo.com
Necip Doganaksoy, GlobalFoundries, necipdoganaksoy@yahoo.com

ASQ RELIABILITY DIVISION WEBINAR
November 14, 2013

THE OBVIOUS, THE EXPECTATION AND THE REALITY

• The Obvious
  – Statistical quality and reliability analyses are based upon sample data (and assumptions about the sampled populations, etc.)
  – Such analyses are only as good as the data upon which they are based
  – Bad data lead to more complex, less powerful, or invalid analyses
  – David Moore: "The most important information about any statistical study is how the data were produced"
• The Expectation: Much attention is given to the data acquisition process in training and applications
• The Reality: Little or insufficient attention is generally given to the data acquisition process

THE CONSEQUENCES AND THE CHALLENGE

• The Consequences
  – "Why is it that every database that I have encountered is filled with data quality problems?" (Theodore Johnson, 2003 QPRC)
  – "Common wisdom puts the extent of the total project effort spent in cleaning the data before doing any analysis as high as 60-95%" (De Veaux and Hand, Statistical Science, 2005)
• The Challenge: Move data acquisition to the front burner
  – Understand limitations of available data
  – Emphasize data acquisition
  – Use a disciplined process

WEBINAR TOPICS

• Typical data acquisition situations
• Problems (and opportunities) with observational data
• A disciplined, targeted approach for data acquisition
• Washing machine design reliability example
• Some guidelines for effective data acquisition
• Some practical challenges
• Some relevant further commentaries
• Elevator speech

EMPHASIS ON QUALITY AND RELIABILITY

TYPICAL DATA ACQUISITION SITUATIONS

• Control over data acquisition
  – Designed experiments
  – Random sampling studies (from a specified population)
  – Double-blind medical studies
  – Systems development studies, e.g.,
    • Estimate design reliability
    • Evaluate measurement system
    • Assess process capability
    • Signal changes via control charts
    • Anticipate/avoid field failures by automated monitoring
• Observational studies (and data mining) on existing data, often from Big Data

MANY APPLICATIONS INVOLVE COMBINATIONS

PROBLEMS (AND OPPORTUNITIES) WITH OBSERVATIONAL DATA

• Problems with "available" databases
  – Data obtained for purposes other than statistical analysis
  – Data reside in different databases
• Some limitations of observational data
  – Missing values and events
  – Unrepresentative observations
  – Inconsistent or imprecise measurements
  – Limited variability
  – Key impacting variables unrecorded; recorded proxy variables deemed "significant" (e.g., foot size "impacts" reading ability)
• Observational studies
  – May be helpful for prediction, e.g., credit performance, top-selling items before an expected hurricane, finding the best time to buy a plane ticket
  – Misleading or useless for gaining "cause and effect" understanding
  – Observation from the trenches (Kati Illouz, GE): Data owners tend to be overly optimistic about their data
• Data inadequacies (and their reasons) define future information needs

QUALITY—NOT QUANTITY—OF DATA IS WHAT COUNTS

IN SUMMARY

• Even the most sophisticated statistical analysis cannot compensate for or rescue inadequate data
• It's not that there is a lack of data. Instead, it is that the data are inadequate to answer the questions (NY Times article "How Safe Is Cycling?", October 22, 2013)
• "Massive data does not guarantee success… Knowing how the data were collected (the 'data pedigree') is critical" (Snee, Union College Mathematics Conference, October 2013)
• "A good principle to remember is that data are guilty until proven innocent, not the other way around" (Snee and Hoerl, Quality Progress, December 2012)
• "Observational data have an important role in pointing the way forward, but they should not be a primary ingredient for making final decisions" (Anderson-Cook and Borror, Quality Progress, April 2013)

DISCIPLINED, TARGETED PROCESS FOR DATA ACQUISITION (DEUPM) FOR A SYSTEMS DEVELOPMENT STUDY

• Proposed process:
  – Step 1: D: Define the problem
  – Step 2: E: Evaluate the existing data
  – Step 3: U: Understand data acquisition opportunities and limitations
  – Step 4: P: Plan data acquisition and implement
  – Step 5: M: Monitor, clean data, analyze and validate
• Example: Demonstrate desired ten-year reliability for a new washing machine design in 6 months elapsed time

STEP 1: DEFINE THE PROBLEM

• Define specific questions to be answered
  Washing machine design example:
  – Stated objective: Show within 6 months and with 95% confidence that the following can be met:
    • 95% reliability after one year of operation
    • 90% reliability after five years
    • 80% reliability after ten years
    ("reliability" defined as no repair or servicing need)
  – Added question: How can reliability be improved further?
• Identify resulting actions
  Washing machine design example: Go to full-scale production if validated, and make identified improvements
• State the population or process of interest
  Washing machine design example: 6 million machines to be built in the next 5 years

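A quick aside, not on the original slide: the three goals translate directly into reliability-function arithmetic. A minimal Python sketch, assuming a two-parameter Weibull life distribution (the distribution the deck uses in Step 4) and purely hypothetical shape/scale values, shows how a candidate model would be checked against the stated targets:

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t/scale)**shape) for a two-parameter Weibull."""
    return math.exp(-((t / scale) ** shape))

# Hypothetical shape/scale values, chosen only for illustration; real
# values would come from the life-test analysis planned in Step 4.
shape, scale = 0.45, 760.0  # scale in years

targets = {1: 0.95, 5: 0.90, 10: 0.80}  # age in years -> required reliability
for years, goal in targets.items():
    r = weibull_reliability(years, shape, scale)
    status = "meets" if r >= goal else "misses"
    print(f"R({years:>2} yr) = {r:.3f}  (goal {goal:.2f})  {status} goal")
```

In practice, the demonstration would compare the goals against the lower 95% confidence bound from the fitted model, not a point estimate, consistent with the "95% confidence" wording above.
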
STEP 2: EVALUATE THE EXISTING DATA

• Understand the process and its physical basis
  Washing machine design example: Study up and participate in design reviews, FMEAs (Failure Mode and Effects Analyses), etc.
• Determine and analyze existing data
  Washing machine design example:
  – Previous design
    • Existing data:
      – In-house component, sub-assembly and system tests
      – Field failure and servicing data
    • Conclusion: Previous design does not meet current reliability goals
  – New design
    • Proposed new design aims to correct key past problems
    • Possible concern: Introduction of new failure modes
    • Existing data: Component and sub-assembly test results
    • Data identified one new failure mode; rapidly addressed and corrected
    • Conclusion: Proposed new design appears to correct past problems without introducing new ones; reliability goals appear to be met
• Identify data inadequacies
  Washing machine design example: No information about system performance in a realistic use environment

STEP 3: UNDERSTAND DATA ACQUISITION OPPORTUNITIES AND LIMITATIONS

• Gain understanding of what data can be acquired, and how
  Washing machine example: In-house accelerated use-rate systems testing
  – Simulate 3.5 years of operation per month
  – Evaluate weekly for failures
  – Sample unfailed units and measure degradation (destructive test)
• Determine practical considerations and limitations in data acquisition
  Washing machine design example:
  – 6 months of testing
  – 3 prototype lots initially (and one more subsequently)
  – 36 available test stands
• Assess relevance of the resulting data to the study goals and underlying assumptions
  Washing machine design example:
  – Assume prototype lots are representative of 5-year high-volume production
  – Assume failures are cycle (and not elapsed-time) dependent
  – Assume realistic simulation of the field environment

Conclusion: This is an analytic (not enumerative) study; statistical confidence bounds capture only statistical uncertainty

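The use-rate acceleration quoted above drives the whole test plan, so the conversion arithmetic is worth making explicit. A minimal sketch using only the 3.5 years-per-month factor from the slide:

```python
# Use-rate acceleration from the slide: each month on the test stand
# simulates 3.5 years of field operation (valid only if failures are
# cycle-dependent, per the assumption listed above).
YEARS_PER_TEST_MONTH = 3.5

def equivalent_field_years(test_months):
    """Field exposure simulated by a given amount of test time."""
    return test_months * YEARS_PER_TEST_MONTH

print(equivalent_field_years(6.0))   # 21.0 field years in the 6-month test
print(10.0 / YEARS_PER_TEST_MONTH)   # ~2.9 test months to reach 10 field years
```

Six test months thus simulate about 21 field years, covering the ten-year reliability goal with roughly a factor-of-two margin.
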
STEP 4: PLAN DATA ACQUISITION AND IMPLEMENT

• Specify test conditions or operational environment
  Washing machine design example: Run washing machines with a full load of soiled towels, mixed with sand, wrapped in a plastic bag
• Specify sample size and selection process
  Washing machine design example: Select 12 units randomly from each of 3 prototype lots and put them on life test
• Specify protocol and operational details
  Washing machine design example:
  – Record failures and determine failure mode
  – After 3 months and again after 6 months:
    • Remove 4 units from each of the 3 lots and measure degradation
    • Replace the 3-month withdrawals with 12 units from a 4th prototype lot
  – Assure high-precision measurements, a meaningful failure definition, complete and consistent data recording procedures, etc.
• Specify data analysis plan and assess expected statistical precision
  Washing machine design example:
  – Do Weibull distribution analysis on time-to-failure data after 6 months
  – Conduct supplementary analysis using degradation data
  – A simulation study demonstrated that the proposed plan provides the desired statistical precision
• Specify pilot study
  Washing machine design example: Run three washing machines for one week

[Figure: Weibull probability plot, percent failing vs. years]

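To make the "Weibull distribution analysis" step concrete, here is a minimal sketch of a maximum-likelihood Weibull fit to right-censored life-test data using scipy. All data values are hypothetical (the deck reports no actual test results), and a production analysis would normally use dedicated reliability software and add confidence bounds (e.g., likelihood-ratio or bootstrap) to meet the 95% confidence requirement:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical life-test data in equivalent field years: failure times
# for failed units, running times for units still unfailed (censored).
times  = np.array([2.1, 4.7, 8.3, 12.5, 21.0, 21.0, 21.0, 21.0, 21.0, 21.0])
failed = np.array([1,   1,   1,   1,    0,    0,    0,    0,    0,    0  ])

def neg_loglik(params):
    """Negative log-likelihood for right-censored Weibull data."""
    shape, scale = params
    if shape <= 0 or scale <= 0:
        return np.inf
    z = (times / scale) ** shape
    log_pdf = np.log(shape / scale) + (shape - 1) * np.log(times / scale) - z
    log_surv = -z  # censored units contribute the survival function
    return -np.sum(np.where(failed == 1, log_pdf, log_surv))

fit = minimize(neg_loglik, x0=[1.0, 20.0], method="Nelder-Mead")
shape_hat, scale_hat = fit.x

# Point estimate of ten-year reliability from the fitted model
r10 = np.exp(-((10.0 / scale_hat) ** shape_hat))
print(f"shape={shape_hat:.2f}, scale={scale_hat:.1f} yr, R(10 yr)={r10:.3f}")
```
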
STEP 5: MONITOR, CLEAN DATA, ANALYZE AND VALIDATE

• Monitor implementation to ensure that the process is being followed
  Washing machine design example: Continue involvement
• Clean data—as gathered
  Washing machine design example: Develop proactive checks for missing or inconsistent data
• Conduct preliminary analyses; act thereon, as appropriate
  Washing machine design example: Analyze failure data after 1 week, 1 month and 3 months; identify failure modes for correction
• Conduct final data analysis and report findings
  Washing machine design example: Do final analyses after 6 months (failure and degradation data)
• Validate: Propose appropriate validation testing
  Washing machine design example:
  – Continue 6 of the 36 units on test beyond 6 months
  – Test 100 machines with company employees and 60 machines in laundromats
  – Audit-sample 6 production units each week: test five for 1 week, one for 3 months
  – Develop a system for capturing and analyzing field reliability data
  – Provide current data access to engineers and management

SOME GUIDELINES FOR EFFECTIVE DATA ACQUISITION (STEP 4):
RECORD KEY VARIABLES AND EVENTS

Example: Using field data to estimate reliability and to speedily identify and address root causes of failures calls for:
• Field data
  – Estimate of product usage
  – Product performance measurements over time
  – Time to failure
  – Failure mode information
• Manufacturing data
  – Parts and manufacturing lot identification
  – Actual process conditions
  – Ambient conditions during manufacture
  – Unplanned events
  – Other potentially important process variables
  – End-of-line performance

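One way to act on "record key variables and events" is to fix the record layout before collection begins. A minimal sketch with hypothetical field names that mirror the bullets above (none of these names come from the webinar):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class FieldRecord:
    """One field-service record; field names are illustrative only."""
    unit_id: str
    mfg_lot_id: str             # ties field behavior back to manufacturing data
    install_date: date
    est_cycles_per_week: float  # estimate of product usage
    cycles_at_event: Optional[int]        # None while the unit is still running
    failure_mode: Optional[str]           # None for non-failure service visits
    performance_reading: Optional[float]  # periodic degradation measurement

record = FieldRecord(
    unit_id="WM-00417", mfg_lot_id="LOT-3",
    install_date=date(2013, 3, 1), est_cycles_per_week=8.0,
    cycles_at_event=412, failure_mode="pump seal leak",
    performance_reading=None,
)
```

Committing to a schema like this up front makes gaps visible early: any field that cannot be filled reliably points to a data acquisition problem to solve before, not after, the study runs.
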
ENSURE CONSISTENT AND ACCURATE DATA RECORDING

• Strive for precise measurements
• Combat data recording inconsistencies
  – Differences between operators
  – Differences in qualitative scaling assessments
  – Differences in data recording conventions; e.g., a date of 2/8 (February 8 or August 2?)
• Address missing values
  – Understand the reason
  – Handle appropriately
  – Minimize occurrence
• Conduct timely data cleaning: Identify "errors" in recorded data (e.g., 999 for missing values) and correct them

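A minimal pandas sketch of the timely, proactive cleaning the slide calls for; the 999 sentinel and the ambiguous 2/8 date come from the slide's own examples, while the column names and values are hypothetical:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "unit_id": ["A1", "A2", "A3", "A4"],
    "hours_to_failure": [1200.0, 999, 999, 845.0],  # 999 used as a missing-value code
    "record_date": ["2013-02-08", "2/8/2013", "2013-08-02", "2013-02-09"],
})

# Recode the 999 sentinel to a true missing value so it cannot
# silently enter a numerical analysis.
df["hours_to_failure"] = df["hours_to_failure"].replace(999, np.nan)

# Enforce one date convention; anything unparseable becomes NaT and is flagged
# for follow-up instead of being guessed at.
df["record_date"] = pd.to_datetime(df["record_date"], format="%Y-%m-%d",
                                   errors="coerce")
print(df[df["record_date"].isna() | df["hours_to_failure"].isna()])
```
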
AVOID SYSTEMATICALLY UNRECORDED OBSERVATIONS

Some examples:
• Information recorded on failed units only
• Information recorded only during the warranty period
• Exclusion of "outlier" information
• Purging of "old"—but still relevant—data

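The first example above is worth a quick numeric demonstration. A minimal simulation sketch, with all numbers hypothetical: if survivors' running times are discarded and mean life is estimated from failed units alone, the estimate is severely biased low, because long-lived units never generate a failure record:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mean_life = 15.0                  # years, hypothetical
lives = rng.exponential(true_mean_life, size=10_000)
observation_window = 5.0               # years of field observation

failed = lives[lives <= observation_window]

# Naive estimate from failed units only
print(f"true mean life: {true_mean_life:.1f} yr")
print(f"failures-only estimate: {failed.mean():.1f} yr")  # ~2.4 yr: badly biased

# For exponential data, the unbiased route uses the unfailed units too:
# total exposure time divided by the number of failures.
exposure = np.minimum(lives, observation_window).sum()
print(f"censoring-aware estimate: {exposure / len(failed):.1f} yr")  # ~15 yr
```
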
SOME OTHER HINTS

• Strive to obtain continuous data
• Aim for compatibility and integration of databases
• Consider sampling

CHALLENGES

• Some practical challenges
  – Added cost and possible delays
  – Added bureaucracy
  – Diversity of data ownership: Engineering, Manufacturing, etc.
  – Need for added work not evident
  Result: Lack of motivation by data recorders and their management
• Strive to overcome by
  – Recognizing perspectives of others
  – Understanding consequences of our requests
  – Making requests as simple and reasonable as possible
  – Automating the data acquisition process
  – Providing convincing justification (e.g., insurance)

SOME RELEVANT FURTHER COMMENTARIES

• Webinar adapted from
  – Hahn, G.J. and Doganaksoy, N. (2011), A Career in Statistics: Beyond the Numbers, Wiley (Chapter 11).
  – Doganaksoy, N. and Hahn, G.J. (2012), Getting the Right Data Up Front: A Key Challenge, Quality Engineering, Vol. 24, No. 4, 446-459.
• Also note
  – Anderson-Cook, C.M. and Borror, C.M. (2013), Paving the Way: Seven Data Collection Strategies to Enhance Your Quality Analyses, Quality Progress, April, 18-29.
  – Coleman, D.E. and Montgomery, D.C. (1993), A Systematic Approach for Planning a Designed Industrial Experiment, Technometrics, Vol. 35, No. 1, 1-12.
  – De Veaux, R.D. and Hand, D.J. (2005), How to Lie with Bad Data, Statistical Science, Vol. 20, No. 3, 231-238.
  – Hahn, G.J. and Doganaksoy, N. (2003), Data Acquisition: Focusing on the Challenge, presentation at the Joint Statistical Meetings.
  – Hahn, G.J. and Doganaksoy, N. (2008), The Role of Statistics in Business and Industry, Wiley.
  – Kenett, R.S. and Shmueli, G. (2013), On Information Quality (with discussion and rejoinder), Journal of the Royal Statistical Society, Series A (forthcoming).
  – Schield, M. (2006), Beware the Lurking Variable, Stats, 46, 14-18.
  – Snee, R.D. and Hoerl, R.W. (2012), Inquiry on Pedigree: Do You Know the Quality and Origin of Your Data?, Quality Progress, December, 66-68.
  – Steiner, S.H. and MacKay, R.J. (2005), Statistical Engineering: An Algorithm for Reducing Variation in Manufacturing Processes, ASQ Quality Press, Milwaukee, WI.

ELEVATOR SPEECH

• We need to put the horse (focus on data acquisition) before the CART (Classification and Regression Tree) data analysis
• Specific proposals
  – Focus on data acquisition in training programs
  – Scrutinize available data to assess relevance and identify gaps
  – Use a disciplined, targeted process for added data acquisition
  – Remain constantly cognizant of underlying assumptions
• Thanks for listening
  – Gerry Hahn, gerryhahn@yahoo.com
  – Necip Doganaksoy, necipdoganaksoy@yahoo.com