Data Mining and Analytics


Published in: Business, Technology

  1. 1. <ul><li>Robert M. Shapiro </li></ul><ul><li>Senior Vice President </li></ul><ul><li>Global 360, Inc. </li></ul><ul><li>Session Title: </li></ul><ul><li>Analytics and Data Mining </li></ul>Welcome to Transformation and Innovation 2007: The Business Transformation Conference
  2. 2. Agenda <ul><li>Analytics </li></ul><ul><li>Simulation </li></ul><ul><li>Optimization </li></ul><ul><li>Data Mining </li></ul><ul><li>Summary </li></ul>
  3. 3. Why Care About Analytics and Data Mining? <ul><ul><li>When Workflow Management Systems first began to proliferate (1990s), little attention was paid to the data generated by the running processes. Most people thought of this data as an audit trail, not as a source of information for process improvement. </li></ul></ul><ul><ul><li>We now understand that the historical record contains valuable information essential to a well-orchestrated continuous process improvement program. </li></ul></ul><ul><ul><li>Correctly designed analytics are the starting point for providing business process intelligence. The analytics drive both real-time monitoring and predictive optimization of the executing Business Process Management System. </li></ul></ul>
  4. 4. Overview [Diagram labels: Business Operations Control; Event Detection & Correlation; Predictive Simulation; Data Mining; Optimization; Event Bus; event sources ERP, BPM, ECM, Legacy, EAI, Custom; Historical Analytics; Real Time Dashboards; Alerts & Actions]
  5. 5. Process Model Example
  6. 6. Events
     Event Group: Process. Events: WORKFLOWCREATE, WORKFLOWTERMINATE, CHILDCREATE, CHILDTERMINATE
     Event Group: Activity. Events: ARRIVEACTIVITY, BEGINACTIVITY, COMPLETEACTIVITY, SUSPENDACTIVITY, CANCELACTIVITY, CONTINUEACTIVITY
     Event Group: Timed Sequence. Events: BEGINTIMEDSEQUENCE, COMPLETETIMEDSEQUENCE
     Event Group: Logged Event. Events: LOGGEDEVENT
     <?xml version="1.0" encoding="UTF-8"?>
     <XPDLogEvents>
       <XPDLogEvent System="MortgageDemo" Scenario="Mortgage Lending AsIs" Run="8/28/2006 6:29:01 PM"
           InstanceId="6505" ParentInstanceId="" WorkflowInstanceId="6505"
           Timestamp="2006-08-01T07:00:00Z" SequenceId="281010"
           ProcessId="Mortgage Lending AsIs" ProcessVersion="1" EventType="WORKFLOWCREATE"
           ActivitySetId="1" ActivityId="15" QueueId="-1"
           ElapsedTimeDays="0" ElapsedBusinessHours="0" ElapsedBusinessDays="0"
           AccruedWaitDays="0" AccruedWaitBusinessDays="0" AccruedWaitBusinessHours="0"
           AccruedProcessingDays="0" AccruedProcessingBusinessDays="0" AccruedProcessingBusinessHours="0">
         <Participants> <Participant ParticipantId="System" /> </Participants>
         <DataFields> <DataField Name="SIM_Cost" Type="FLOAT">0</DataField> </DataFields>
       </XPDLogEvent>
     </XPDLogEvents>
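A log record like the one above can be read with any XML parser. The following is a minimal sketch in Python using only the standard library. The helper name `parse_log_events`, the trimmed sample, and the choice of fields to extract are illustrative assumptions, not part of the product described in the slides.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the sample record (XML declaration and most attributes omitted).
SAMPLE = """
<XPDLogEvents>
  <XPDLogEvent System="MortgageDemo" InstanceId="6505" WorkflowInstanceId="6505"
               Timestamp="2006-08-01T07:00:00Z" SequenceId="281010"
               ProcessId="Mortgage Lending AsIs" EventType="WORKFLOWCREATE"
               ActivityId="15">
    <Participants><Participant ParticipantId="System" /></Participants>
    <DataFields><DataField Name="SIM_Cost" Type="FLOAT">0</DataField></DataFields>
  </XPDLogEvent>
</XPDLogEvents>
"""


def parse_log_events(xml_text):
    """Return one dict per XPDLogEvent element (hypothetical helper)."""
    events = []
    for node in ET.fromstring(xml_text.strip()).iter("XPDLogEvent"):
        stamp = datetime.strptime(node.get("Timestamp"), "%Y-%m-%dT%H:%M:%SZ")
        events.append({
            "instance_id": node.get("InstanceId"),
            "event_type": node.get("EventType"),
            "activity_id": node.get("ActivityId"),
            "timestamp": stamp.replace(tzinfo=timezone.utc),
            "participants": [p.get("ParticipantId") for p in node.iter("Participant")],
            # Only FLOAT fields are converted; other types are kept as text.
            "data": {
                f.get("Name"): float(f.text) if f.get("Type") == "FLOAT" else f.text
                for f in node.iter("DataField")
            },
        })
    return events


events = parse_log_events(SAMPLE)
print(events[0]["event_type"], events[0]["data"])
```

Flattening each event into a plain dictionary keeps the later steps (staging, fact tables, timing calculations) independent of the XML layout.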
  7. 7. Timing Data from Workflow Log Events
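One way timing data of this kind could be derived from the event stream is sketched below. It assumes that wait time runs from ARRIVEACTIVITY to BEGINACTIVITY and processing time from BEGINACTIVITY to COMPLETEACTIVITY; it ignores suspend/continue events and business calendars. The function name and the timestamps are made up for illustration.

```python
from datetime import datetime


def activity_timing(events):
    """events: iterable of (event_type, activity_id, timestamp) for ONE work item.

    Returns {activity_id: (wait_seconds, processing_seconds)}.
    """
    arrive, begin, timing = {}, {}, {}
    for event_type, activity_id, stamp in sorted(events, key=lambda e: e[2]):
        if event_type == "ARRIVEACTIVITY":
            arrive[activity_id] = stamp
        elif event_type == "BEGINACTIVITY":
            begin[activity_id] = stamp
        elif (event_type == "COMPLETEACTIVITY"
              and activity_id in arrive and activity_id in begin):
            wait = (begin[activity_id] - arrive[activity_id]).total_seconds()
            work = (stamp - begin[activity_id]).total_seconds()
            timing[activity_id] = (wait, work)
    return timing


# Made-up timestamps for illustration only.
log = [
    ("ARRIVEACTIVITY", "15", datetime(2006, 8, 1, 7, 0)),
    ("BEGINACTIVITY", "15", datetime(2006, 8, 1, 7, 30)),
    ("COMPLETEACTIVITY", "15", datetime(2006, 8, 1, 8, 15)),
]
timing = activity_timing(log)
print(timing)
```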
  8. 8. Analytics Architecture [Diagram labels: Business Operations; Process Engine; Events; Context Data (Participants, UDFs, XPDL); Publish; AE Relational Database (Staging and Event Queue Tables, Fact and Dimension Tables); Process Analysis Engine (Triggers Cube Processing, Monitors DBs, Exposes UDFs); OLAP and DataMining Databases; Web Service; Queries; Client Reports; Analysis Engine Administration Controls; Historical Analytics]
  9. 9. Process Analytics <ul><li>Features </li></ul><ul><ul><li>Fast analysis of process, activity & SLA statistics, quality and labor information </li></ul></ul><ul><ul><li>Drill down / slice and dice – Explore data from different perspectives </li></ul></ul><ul><li>Benefits </li></ul><ul><ul><li>Business process intelligence </li></ul></ul><ul><ul><li>Identify process improvement areas </li></ul></ul><ul><ul><li>End to end process visibility </li></ul></ul><ul><li>Problem </li></ul><ul><ul><li>You have to know where to look in the hypercube. </li></ul></ul>
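The "slice and dice" idea amounts to aggregating the same facts along different combinations of dimensions. A minimal sketch in plain Python follows; the fact rows, the column names and the helper `slice_and_dice` are invented for illustration, and a real deployment would query the OLAP cube instead.

```python
from collections import defaultdict


def slice_and_dice(facts, dimensions, measure):
    """Average `measure` over every distinct combination of `dimensions`."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key][0] += row[measure]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Invented fact rows: one per completed activity.
facts = [
    {"region": "East", "activity": "Review", "cycle_hours": 4.0},
    {"region": "East", "activity": "Review", "cycle_hours": 6.0},
    {"region": "West", "activity": "Review", "cycle_hours": 10.0},
    {"region": "East", "activity": "Approve", "cycle_hours": 2.0},
]

by_region = slice_and_dice(facts, ["region"], "cycle_hours")
by_region_activity = slice_and_dice(facts, ["region", "activity"], "cycle_hours")
print(by_region)
print(by_region_activity)
```

Drilling down from `by_region` to `by_region_activity` only helps if the analyst already suspects that region matters, which is the problem the slide points out.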
  10. 10. BAM Dashboards <ul><ul><li>Status indicators </li></ul></ul><ul><ul><li>Queue Counts </li></ul></ul><ul><ul><li>Counters </li></ul></ul><ul><ul><li>Goal/KPI status and trends </li></ul></ul>
  11. 11. Actions & Alerts [Diagram labels: Process Metrics; Process Event Triggers; Schedule; Rules Engine; KPI Evaluation; Goals; Thresholds; Risk Mitigation; Actions: Email and Cellphone notification, Web Service Call or Execute Script]
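The threshold-style rule evaluation named on this slide can be sketched as follows. The `Rule` class, the `notify` action and the metric values are hypothetical; an actual rules engine would also handle schedules, goals and trend-based conditions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    kpi: str                                   # name of the monitored metric
    threshold: float                           # alert when the metric exceeds this
    action: Callable[["Rule", float], str]     # what to do when the rule fires


def notify(rule, value):
    """Stand-in for an email or cellphone notification."""
    return f"ALERT: {rule.kpi} = {value} exceeds threshold {rule.threshold}"


def evaluate(rules, metrics):
    """Run every rule against the current metrics and collect the actions fired."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.kpi)
        if value is not None and value > rule.threshold:
            fired.append(rule.action(rule, value))
    return fired


rules = [Rule("queue_count", 100, notify), Rule("cycle_time_days", 5, notify)]
metrics = {"queue_count": 140, "cycle_time_days": 3.2}  # invented snapshot
alerts = evaluate(rules, metrics)
print(alerts)
```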
  12. 12. Simulation <ul><li>Why would you want to build simulation models? </li></ul><ul><ul><li>A simulation model lets you ask what-if questions: </li></ul></ul><ul><ul><ul><li>What if I changed my staff schedules? </li></ul></ul></ul><ul><ul><ul><li>What if I bought a faster check sorter? </li></ul></ul></ul><ul><ul><ul><li>What if the number of applications increased dramatically because of a marketing campaign? </li></ul></ul></ul><ul><ul><li>The simulation results predict the effect on critical KPIs such as end-to-end cycle time and cost per processed application. </li></ul></ul><ul><ul><li>Hence simulation plays an important role in continuous process improvement. </li></ul></ul>
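A what-if question about staffing can be illustrated with a very small queue simulation. This is a sketch under strong assumptions (a single activity, identical workers, a fixed service time, first-come first-served); the function name and the numbers are invented and are far simpler than a full process simulation.

```python
import heapq


def mean_cycle_time(arrival_times, service_time, staff):
    """Simulate a first-come, first-served queue served by `staff` identical workers."""
    free_at = [0.0] * staff  # time at which each worker next becomes free
    heapq.heapify(free_at)
    total = 0.0
    for arrival in sorted(arrival_times):
        start = max(arrival, heapq.heappop(free_at))  # wait for the earliest free worker
        finish = start + service_time
        heapq.heappush(free_at, finish)
        total += finish - arrival  # cycle time = waiting + processing
    return total / len(arrival_times)


arrivals = [float(t) for t in range(0, 60, 5)]  # one application every 5 minutes
baseline = mean_cycle_time(arrivals, service_time=12.0, staff=2)
what_if = mean_cycle_time(arrivals, service_time=12.0, staff=3)
print(baseline, what_if)
```

With two workers the queue grows and the mean cycle time is 17 minutes; with three workers nobody waits and it drops to the 12-minute service time.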
  13. 13. Key Simulation Factors <ul><li>Options </li></ul><ul><ul><li>Time Frame, Animation Update Frequency, Exposed Fields </li></ul></ul><ul><li>Activities </li></ul><ul><ul><li>Duration, Performers, Decisions </li></ul></ul><ul><ul><li>Pre- and Post-Assignments, Pre- and Post-Scripts </li></ul></ul><ul><ul><li>Use of Historical Data for decisions and durations </li></ul></ul><ul><li>Arrivals </li></ul><ul><ul><li>Process, Start Activity </li></ul></ul><ul><ul><li>Batch, Field Values, Pattern and Repeat </li></ul></ul><ul><ul><li>Use of Historical Data for workload distributions over time </li></ul></ul><ul><li>Data Fields </li></ul><ul><li>Participants </li></ul><ul><ul><li>Schedules, Roles Played, Details </li></ul></ul><ul><li>Roles </li></ul><ul><li>Schedule Definitions </li></ul>
  14. 14. Review of Analytics, BAM and Simulation <ul><ul><li>A stream of events produced by a variety of business process engines (ERP, Supply Chain Management, BPMS enactment) is fed to an Analytics engine, which transforms the event data into usable information. </li></ul></ul><ul><ul><li>A Business Activity Monitoring module updates a set of KPIs in real time. A Rules Engine applied to these indicators generates Alerts and Actions, which inform managers of critical situations and alter the behavior of the running processes. </li></ul></ul><ul><ul><li>A simulation tool, using the historical data, provides What-If analysis in support of continuous process improvement. Integrated with a Work Force Management system, it enables optimization of staff schedules. </li></ul></ul><ul><ul><ul><li>But designing the what-if scenarios can be a challenging and labor-intensive task for a specialist. </li></ul></ul></ul>
  15. 15. Automatic Optimization <ul><ul><li>Automatic Optimization uses Analytics and Simulation to generate and evaluate proposals for achieving a set of goals. </li></ul></ul><ul><ul><li>Analysis of Process structure in conjunction with historical data about processing delays and resource availability permits the intelligent exploration of improvement strategies. </li></ul></ul><ul><ul><li>Coupled with WorkForce Management technology, this approach helps optimize staff schedules. </li></ul></ul>
  16. 16. Optimization [Diagram labels: Strategies: Bottleneck Analysis (determine the most understaffed task; cross-train the most idle, feasible person, or alternatively hire a new one; predict by simulating the altered scenario) and Wait Time Reduction by Load Balancing. Cycle: analyze current situation; propose measure for improvement; alter scenario; predict (simulate)]
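The bottleneck step on this slide (find the most understaffed task, then pick the most idle person who could feasibly be cross-trained) might look like the sketch below. The data layout, the workload-to-capacity ratio used as the "understaffed" measure, and both function names are assumptions made for illustration.

```python
def most_understaffed(tasks):
    """tasks: {name: (workload_hours, staffed_hours)}.

    The task with the highest workload-to-capacity ratio is treated as the bottleneck.
    """
    return max(tasks, key=lambda name: tasks[name][0] / tasks[name][1])


def cross_training_candidate(people, role):
    """people: {name: (idle_fraction, current_roles, trainable_roles)}.

    Returns the most idle person who lacks `role` but could learn it, or None
    (in which case the alternative is to hire a new person).
    """
    feasible = [
        name for name, (_idle, roles, trainable) in people.items()
        if role not in roles and role in trainable
    ]
    if not feasible:
        return None
    return max(feasible, key=lambda name: people[name][0])


# Invented figures for illustration.
tasks = {"Review": (120.0, 80.0), "Approve": (40.0, 80.0), "Close": (70.0, 80.0)}
people = {
    "Ana": (0.40, {"Approve"}, {"Review", "Close"}),
    "Ben": (0.55, {"Approve"}, {"Close"}),
    "Cho": (0.25, {"Close"}, {"Review"}),
}

bottleneck = most_understaffed(tasks)
candidate = cross_training_candidate(people, bottleneck)
print(bottleneck, candidate)
```

Ben is the most idle person overall but cannot be trained for the bottleneck task, so the feasibility check selects Ana instead. The proposal would then be checked by simulating the altered scenario.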
  17. 17. Process Model Example
  18. 18. Throughput: Unresolved Work Objects Before Load Balancing: <ul><li>Work objects pile up as long as work items arrive (cycle times go up continually) </li></ul><ul><li>One region in particular is understaffed </li></ul>After Load Balancing: <ul><li>The number of unresolved work objects is bounded (upper bound for cycle times) </li></ul>
  19. 19. Resource Utilization (Idle Times) Before Load Balancing: <ul><li>Quite unbalanced </li></ul>After Load Balancing: <ul><li>More balanced </li></ul>
  20. 20. Judging the Effect: Productivity Index [Charts: initially; after load balancing]
  21. 21. Judging the Effect: Throughput Analysis [Charts: initially; after role addition (e.g. cross training); after resource addition]
  22. 22. Review of Automated Optimization Technology <ul><ul><li>Optimization, using goals formulated as KPIs, can analyze historical information and propose changes that are likely to help attain these goals. It can systematically evaluate the proposed changes, using the simulation tool as a component. </li></ul></ul><ul><ul><li>This can be performed in a totally automated manner, terminating when the goal is satisfied or when no proposed change results in further improvement. </li></ul></ul><ul><ul><li>Staff optimization, focusing on end-to-end cycle time and processing cost as the KPIs, is one example of the application of this technology. </li></ul></ul>
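The automated loop described above (propose changes, evaluate each by simulation, stop when the goal is met or nothing improves) can be sketched as a greedy search. Everything here is an assumption for illustration: the `optimize` signature, the toy `simulate` function that stands in for a real simulator, and the workload numbers.

```python
def optimize(scenario, propose, simulate, goal, max_rounds=20):
    """Greedy search: apply the best proposed change until the KPI goal is met
    or no proposal improves the KPI (lower is better)."""
    best_kpi = simulate(scenario)
    for _ in range(max_rounds):
        if best_kpi <= goal:
            break  # goal satisfied
        candidates = [(simulate(c), c) for c in propose(scenario)]
        if not candidates:
            break
        kpi, candidate = min(candidates, key=lambda pair: pair[0])
        if kpi >= best_kpi:
            break  # no proposed change results in further improvement
        scenario, best_kpi = candidate, kpi
    return scenario, best_kpi


# Toy stand-ins: KPI = total hours when each task's workload is split across its staff.
WORKLOAD = {"review": 12.0, "approve": 6.0}


def simulate(staff):
    return sum(WORKLOAD[task] / staff[task] for task in WORKLOAD)


def propose(staff):
    """One proposal per task: add a single person to that task."""
    return [{**staff, task: staff[task] + 1} for task in staff]


final_staff, final_kpi = optimize({"review": 1, "approve": 1}, propose, simulate, goal=8.0)
print(final_staff, final_kpi)
```

A greedy search like this can stop at a local optimum; it is shown only because it matches the two termination conditions named on the slide.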
  23. 23. Data Mining <ul><ul><li>There are three stages: explore the data, build a mining model that captures its patterns, and deploy the model by applying it to new data to predict. </li></ul></ul>
  24. 24. Process versus Content Data Mining <ul><li>We focus on the data generated by typical computer-based business processes, using Process Intelligence as the lens through which to view the data. </li></ul><ul><li>This process view is critical in developing a mining structure and mining models that expose correlations between Key Performance Indicators and other factors such as work item attributes, resource schedules, arrival patterns and other external business factors. </li></ul>
  25. 25. Building a Mining Structure <ul><li>A Mining Structure is built by selecting tables and views from a relational database or OLAP cube and specifying the input and predictable columns (variables/attributes) in the selected tables/views. </li></ul><ul><li>Choosing the input columns requires an analysis of the data. Here is an example based on the Microsoft Data Mining software. </li></ul>
  26. 26. Choosing Columns from a Subset of the Data Source View
  27. 27. Process Model Example (previous example)
  28. 28. Mining Structure
  29. 29. A Tornado has knocked out a Processing Center
  30. 30. Can Data Mining Technology Help Us? <ul><li>A tornado has knocked out the application processing center in Tulsa. </li></ul><ul><li>The event stream provides the staff change info to the BAM component. </li></ul><ul><li>How can we use a trained data mining model to rapidly determine how the temporary loss of staff will impact the end-to-end processing time for applications? </li></ul><ul><li>What will the cost be for processing a loan application under these circumstances? </li></ul>
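As a rough illustration of answering such a question from a trained model, the sketch below fits a straight line to invented historical pairs of staff count and cycle time, then evaluates it at the reduced staffing level. Ordinary least squares is swapped in here purely for brevity; it is not the mining algorithm used in the slides, and all numbers are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented history: available staff vs. end-to-end cycle time in days.
staff_history = [10, 12, 14, 16]
cycle_days_history = [9.0, 8.0, 7.0, 6.0]

slope, intercept = fit_line(staff_history, cycle_days_history)  # "training"


def predict_cycle_days(staff):
    return slope * staff + intercept  # "prediction" is a single evaluation


predicted = predict_cycle_days(8)  # staffing level after losing a center
print(predicted)
```

A staffing level of 8 lies outside the invented history, so the prediction is an extrapolation and relies on the historical pattern continuing to hold.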
  31. 31. Predicting Cycle Time We can tell customers the expected wait time for a loan.
  32. 32. Predicting Cost
  33. 33. Can Data Mining Technology Help Us? <ul><li>A marketing campaign is expected to increase the number of low-end loan applications next month. </li></ul><ul><li>Simulation-based forecasting could be used to optimize work force management, but the simulation model must have accurate information about how long each step in the process takes. Average duration values based on history will not do. </li></ul><ul><li>How can data mining provide better estimates for durations based on line-of-business attributes of the applications? </li></ul>
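The gap between a single historical average and attribute-based estimates can be shown with a few invented records. The sketch groups durations by one attribute, which is the simplest possible "duration rule"; the attribute name `loan_type`, the helper `duration_rules` and the figures are assumptions, and a real mining model would consider many attributes at once.

```python
from collections import defaultdict
from statistics import mean


def duration_rules(history, attribute):
    """Mean activity duration for each value of one line-of-business attribute."""
    groups = defaultdict(list)
    for item in history:
        groups[item[attribute]].append(item["duration_min"])
    return {value: mean(durations) for value, durations in groups.items()}


# Invented history for one activity.
history = [
    {"loan_type": "low_end", "duration_min": 10.0},
    {"loan_type": "low_end", "duration_min": 14.0},
    {"loan_type": "jumbo", "duration_min": 40.0},
    {"loan_type": "jumbo", "duration_min": 50.0},
]

overall = mean(item["duration_min"] for item in history)
rules = duration_rules(history, "loan_type")
print(overall, rules)
```

If next month's mix shifts toward low-end applications, the overall average of 28.5 minutes overstates the work per application, while the per-type estimate of 12 minutes does not.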
  34. 34. Discovering Duration Rule
  35. 35. Branching
  36. 36. Cluster Analysis
  37. 37. Making Predictions using Simulation and Data Mining <ul><li>Simulation and Data Mining can both be used to make predictions. Are they competing or complementary technologies? </li></ul><ul><li>We have already discussed the role of Data Mining in the preparation of information required for accurate simulations. </li></ul><ul><li>Apart from this, there are major differences. </li></ul><ul><ul><li>The simulation model must be a sufficiently accurate representation of the collection of processes being executed. It can make predictions for situations not previously encountered, so long as the underlying processes have not changed. </li></ul></ul><ul><ul><li>The Data Mining predictions are based on a statistical analysis of what has already happened. A trained mining model assumes the historical patterns are still valid. </li></ul></ul><ul><li>There are major differences in performance. </li></ul><ul><ul><li>Simulation is computationally intensive. It takes significant time to obtain predictions. </li></ul></ul><ul><ul><li>In Data Mining, the training is computationally intensive. Once a model is trained, predictions are extremely fast. Periodic retraining may be required to keep the model accurate. </li></ul></ul>
  38. 38. Summary <ul><li>BPMS generate event streams that provide the Analytics Data needed for Business Activity Monitoring in real time and Continuous Process Improvement. </li></ul><ul><li>A customizable Optimizer, employing Data Mining and Simulation tool kits, derives from the Analytics Data a stream of recommendations for improving the business operations, including: </li></ul><ul><ul><li>Redeployment of resources </li></ul></ul><ul><ul><li>Process changes </li></ul></ul><ul><ul><li>Optimization of business rules </li></ul></ul><ul><li>The Data Mining component supports an alternative approach to prediction under changing business circumstances and generates critical information for use by the Simulator. It also provides Process Discovery capabilities useful in Process Re-Design. </li></ul>
  39. 39. Thank You <ul><li>Robert M. Shapiro </li></ul><ul><li>Senior Vice President </li></ul><ul><li>Global 360, Inc. </li></ul><ul><li>Contact Information: </li></ul><ul><li>617-823-1055 </li></ul><ul><li>[email_address] </li></ul>
  40. 40. References <ul><li>StatSoft, Inc. (2006). Electronic Statistics Textbook. Tulsa, OK: StatSoft. WEB: </li></ul><ul><li>Microsoft Corporation (2006). SQL Server 2005 Books Online. </li></ul><ul><li>Tang & MacLennan (2005). Data Mining with SQL Server 2005. Wiley. </li></ul><ul><li>Haselden (2006). Microsoft SQL Server 2005 Integration Services. Sams Publishing. </li></ul><ul><li>Berry (2004). Data Mining Techniques: For Marketing, Sales and CRM. Wiley. </li></ul><ul><li>Kudyba (2001). Data Mining and Business Intelligence: A Guide to Productivity. Idea Group Publishing. </li></ul>