Health & Status Monitoring (2010-v8)



This is a variant of a talk that I gave at Predictive Analytics World in February 2010.



Slide 1: Health & Status Monitoring: Two Case Studies
Robert Grossman, Open Data Group
February 18, 2010
Slide 2: 1. Introduction
Slide 3: Traditional Approach
Two types of variation:
- Common causes of variation (noise) occur as a normal part of the manufacturing process.
- A special cause of variation represents a potential problem.
Slide 4: 3σ Control Limits
A Shewhart control chart is used by NIST for calibrating the standard kilogram.
Source: NIST
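The 3σ rule behind a Shewhart chart can be sketched in a few lines: estimate control limits at the mean ± 3 standard deviations of a baseline sample, then flag any point outside that band as a special cause of variation. This is a minimal illustration with made-up measurements, not NIST's actual calibration data.

```python
# Minimal sketch of a Shewhart 3-sigma control chart.
# The baseline measurements below are invented for illustration.
import statistics

def control_limits(baseline):
    """Return (lower, upper) 3-sigma control limits from a baseline sample."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mu - 3 * sigma, mu + 3 * sigma

def out_of_control(points, lcl, ucl):
    # A point outside the 3-sigma band signals a special cause of variation.
    return [i for i, x in enumerate(points) if not (lcl <= x <= ucl)]

baseline = [9.98, 10.01, 10.00, 9.99, 10.02, 10.00]
lcl, ucl = control_limits(baseline)
print(out_of_control([10.00, 10.01, 10.30, 9.99], lcl, ucl))  # → [2]
```

Points inside the band are treated as common-cause noise; only excursions beyond 3σ trigger investigation.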
Slide 5: Shewhart / Deming Cycle
- Plan: identify an opportunity or problem and make a plan.
- Do: implement the change on a small scale and collect the data.
- Check: perform a statistical analysis and check whether there was an impact.
- Act: if there was an impact, broaden the scale and continuously improve your results.
Slide 6: Case Study 1. Data Center
- Thousands of servers
- Complex workloads
- Large variations are normal
- Problems make the front page
Slide 7: Case Study 2. Payments Network
- Billion+ cards
- 100+ million terminals
- Millions of merchants
- Thousands of transactions per second
- Thousands of member banks
Data is highly heterogeneous:
- Variations among products
- Variations among cardholders
- Variations among merchants
- Variations among banks
- Variations among payment networks
Slide 8: The Challenge Today
- Many sources and data feeds
- Data is complex and highly heterogeneous
- High-volume, streaming data from around the world
- Multiple parties involved, each of which can modify the data in subtle ways
Slide 9: Health & Status Monitoring Systems
Slide 10: 2. The Technology
Slide 11: Change Detection (diagram)
Compare the observed model against the baseline model using:
- CUSUM models
- GLR models
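A one-sided CUSUM detector of the kind named on this slide can be sketched as follows: standardize each observation against the baseline model and accumulate evidence of an upward shift, alerting when the cumulative sum crosses a threshold. The drift `k` and threshold `h` are illustrative choices, not parameters from the talk.

```python
# Minimal sketch of a one-sided CUSUM detector for an upward mean shift.
# Baseline mean/std, k, and h are illustrative assumptions.

def cusum_alerts(xs, mean, std, k=0.5, h=5.0):
    """Return indices where the CUSUM statistic crosses the threshold h."""
    s, alerts = 0.0, []
    for i, x in enumerate(xs):
        z = (x - mean) / std          # standardize against the baseline model
        s = max(0.0, s + z - k)       # accumulate evidence of an upward shift
        if s > h:
            alerts.append(i)
            s = 0.0                   # restart after raising an alert
    return alerts

data = [0.1, -0.2, 0.0, 0.3] + [2.5] * 6   # mean shifts after the 4th point
print(cusum_alerts(data, mean=0.0, std=1.0))  # → [6, 9]
```

Unlike a Shewhart chart, which looks at each point in isolation, CUSUM accumulates small deviations, so it detects modest but persistent shifts sooner; a GLR detector generalizes this when the size of the shift is unknown.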
Slide 12: Build More than 10^4 Models: One for Each Cell in a Cube of Models
15,000+ separate baselines:
- Build a separate model for each bank (1,000+)
- Build a separate model for each geographical region (6 regions)
- Build a separate model for each type of merchant (over 800 types of merchants)
- For each distinct cell of the cube, build a distinct model
Cube dimensions: geospatial region, type of transaction, bank.
Modeling using Cubes of Models (MCM)
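The cube-of-models idea can be sketched as a map from (bank, region, merchant type) cells to per-cell baselines, so that each segment is judged against its own history rather than a global average. The keys and the running-mean "model" below are placeholders; the deck's real baselines are full statistical models.

```python
# Minimal sketch of a cube of models: one baseline per (bank, region,
# merchant type) cell. The running mean stands in for a real baseline model.
from collections import defaultdict

class RunningBaseline:
    """Tracks a running mean and count as a stand-in for a baseline model."""
    def __init__(self):
        self.n, self.total = 0, 0.0
    def update(self, amount):
        self.n += 1
        self.total += amount
    @property
    def mean(self):
        return self.total / self.n if self.n else 0.0

cube = defaultdict(RunningBaseline)

transactions = [                       # (bank, region, merchant type, amount)
    ("bank_a", "na", "grocery", 25.0),
    ("bank_a", "na", "grocery", 35.0),
    ("bank_b", "eu", "fuel", 60.0),
]
for bank, region, merchant_type, amount in transactions:
    cube[(bank, region, merchant_type)].update(amount)

print(len(cube))                               # → 2 populated cells
print(cube[("bank_a", "na", "grocery")].mean)  # → 30.0
```

With 1,000+ banks, 6 regions, and 800+ merchant types, the cube has millions of potential cells, which is why only the populated cells (15,000+ in the talk) carry baselines.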
Slide 13: System Architecture (diagram)
Stages: 1. data collection, 2. off-line modeling, 3. on-line scoring, 4. reporting.
Components: operational systems, data feeds, and warehouses; entity/feature database; data mining mart; data mining system; PMML models; model consumer; rules; dashboard engine.
Flows: events, data updates, learning sets, features, candidate alerts, reports.
Slide 14: Augustus
Augustus is an open source data mining platform:
- Used to estimate baselines for over 15,000 separate segmented models
- Used to score high-volume operational data and issue alerts for follow-up investigations
Augustus is PMML compliant. Augustus scales with:
- Volume of data
- Real-time transaction streams (15,000/sec+)
- Number of segmented models (10,000+)
Slide 15: Greedy Meaningful/Manageable Balancing (GMMB) Algorithm
One model for each cell in the data cube; breakpoints split cells.
- To increase alerts (more alerts, more meaningful): add a breakpoint to split cubes, order candidates by the number of new alerts, and select one or more new breakpoints.
- To decrease alerts (fewer alerts, more manageable): remove a breakpoint, order by the number of decreased alerts, and select one or more breakpoints to remove.
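The greedy add/remove loop described above might look like the following sketch, where each candidate breakpoint carries a change in alert volume and breakpoints are added or removed, best first, until the alert count falls inside a manageable band. The breakpoint names and alert deltas are invented for illustration; the real algorithm orders candidates by alerts measured on the data.

```python
# Hypothetical sketch of the GMMB balancing loop over candidate breakpoints.

def balance_breakpoints(active, candidates, alert_delta, current, lo, hi):
    """Greedily adjust the breakpoint set so the alert count lands in [lo, hi].

    alert_delta[b] = change in alert volume from splitting on breakpoint b.
    """
    active = set(active)
    while current < lo:
        # Too few alerts: add the breakpoint that yields the most new alerts.
        pool = sorted(candidates - active, key=lambda b: alert_delta[b], reverse=True)
        if not pool:
            break
        best = pool[0]
        active.add(best)
        current += alert_delta[best]
    while current > hi:
        # Too many alerts: remove the breakpoint whose removal drops the most.
        pool = sorted(active, key=lambda b: alert_delta[b], reverse=True)
        if not pool:
            break
        best = pool[0]
        active.remove(best)
        current -= alert_delta[best]
    return active, current

deltas = {"region": 40, "bank": 25, "merchant_type": 10}
active, count = balance_breakpoints(set(), set(deltas), deltas,
                                    current=20, lo=50, hi=80)
print(sorted(active), count)  # → ['region'] 60
```

The point of the greedy ordering is operational: each step moves the alert volume as far as possible toward the band the analysts can actually investigate.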
Slide 16: 3. Case Studies
Slide 17: Case Study 1: Open Cloud Testbed Monitor
Slide 18: Results
- Dozens of separate statistical baseline models developed and deployed.
- Effective for discovering nodes that are hindering effective use of OCC's large data cloud.
- Dead nodes are easy to identify and remove.
- Removing just one or two "slow" nodes from a pool of 100 nodes can improve overall performance by 15-20+%.
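One way to flag "slow" nodes of the kind described in these results is to compare each node's task time against the pool's distribution and alert on large positive deviations. The node names, timings, and z-score cutoff below are invented for illustration, not measurements from the Open Cloud Testbed.

```python
# Illustrative sketch: flag straggler nodes whose task times sit far above
# the pool's typical time. All values here are made up.
import statistics

def slow_nodes(node_times, z_cutoff=2.0):
    """Return names of nodes whose task time exceeds the pool mean by z_cutoff sigmas."""
    times = list(node_times.values())
    mu = statistics.mean(times)
    sigma = statistics.stdev(times)
    return [n for n, t in node_times.items() if (t - mu) / sigma > z_cutoff]

pool = {f"node{i:02d}": 10.0 for i in range(20)}
pool["node07"] = 30.0   # one straggler drags down the whole data-parallel job
print(slow_nodes(pool))  # → ['node07']
```

Because data-parallel jobs finish only when their slowest node finishes, removing the flagged stragglers is what yields the 15-20+% improvement the slide reports.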
Slide 19: Dashboard
Slide 20: Case Study 2 (diagram)
Parties: account, issuing bank, payments network, merchant, acquiring bank.
Slide 21: Program Structure
- Strategic objective identified early: "Identify and ameliorate data interoperability issues to improve the approval rate of valid transactions and the disapproval rate of invalid transactions, ..."
- Report quarterly to the CIOs' council, with third-party-endorsed monetary benefits summarized on an executive dashboard
- Introduced a data governance program early in the project
- Developed a payment transaction monitor that produced candidate alerts
- Set up an investigation process to screen alerts and investigate those of interest
- Developed reference models and appropriate standards
Slide 22: Results
ROI:
- 5.1x Year 1 (over 6 months)
- 7.3x Year 2 (12 months)
- 10.0x Year 3 (12 months)
Over 15,500 separate statistical baseline models developed and deployed.
Also developed appropriate rules-based models to make the work of analysts more efficient.
Slide 23: 4. Summary
Slide 24: Business Process (diagram)
Components: strategic objective, dashboard, governance, modeling process, reference model, investigative process.
The monitor produces candidate alerts from events; the investigative process turns candidate alerts into program alerts.
Slide 25: Some Lessons Learned
Business processes:
- Importance of "C"-level executive support, dashboard reports, and a data governance program
Modeling processes:
- Critical to build as many statistical models as the data required; used the open source Augustus software for this
- Architecture separated off-line modeling and on-line scoring
- Post-processing with business rules controlled the workflow to analysts
Investigative processes:
- It is not about the models and alerts; it is about optimizing the analysts' workload and the derived business value
- Small changes in report designs had a large impact on the effectiveness of the alerts
Slide 26: Summary
Slide 27: For More Information
Learn about health and status monitoring:
- Open Data Group
Robert Grossman (blog)
Slide 28: References
- Joseph Bugajski, Chris Curry, Robert L. Grossman, David Locke, and Steve Vejcik, "Detecting Changes in Large Data Sets of Payment Card Data: A Case Study," Proceedings of the Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2007), ACM, 2007.
- Joseph Bugajski and Robert L. Grossman, "An Alert Management Approach to Data Quality: Lessons Learned from the Visa Data Authority Program," Proceedings of the 12th International Conference on Information Quality (ICIQ 2007).
- Walter A. Shewhart, Statistical Method from the Viewpoint of Quality Control, Dover, 1986.
- H. Vincent Poor and Olympia Hadjiliadis, Quickest Detection, Cambridge University Press, 2009.
Augustus is an open source system.