Disruptive Data Science - How Data Science and Big Data are Transforming Business, IT and People (EMC World 2012)

An examination of the trends of Big Data and Advanced Analytics as well as the technology, services and education needed to thrive in this new field. This session explores examples of true industry-disruptive analytics-driven transformations and the catalysts for transformation. Examining the role of people is paramount to success in order to develop a high-performing data scientist team - starting today.

Published in: Technology, Business

  1. 1. Disruptive Data Science. Annika Jimenez, David Dietrich. © Copyright 2012 EMC Corporation. All rights reserved.
  2. 2. Agenda
        • CareCore National’s Evolution
        • Data-Driven Transformations
        • EMC’s Assist
  3. 3. CareCore’s Role in Health Care
        Stakeholders: Insurance Companies (Carriers); Clinics, Hospitals, HC Centers, and Providers; Customers; Patients (Members)
        Diagram labels, in slide order:
        • Optimized Compensation • Legal Actions • Accurate Payments • Provider Performance • Optimized Price • Fraud Detection • Legal Action • Reduced Liability • Customized Plans
        • Claims • Sources of Variability • Better Services • Diagnosis • Industry Statistics • Treatments • Procedures • Employee Analysis • Legal Action
        • Cost • Industry Statistics • Optimal Site Redirection Plan • Patient Profiles • Potential of Savings • Improved Benefits and Value • Cost-Effective Treatment • Better Services
        • Knowledge Dissemination • Reduced Liability • Industry Statistics • Clinic Performance • Legal Action • Sources of Variability • Provider Performance • Improved Employee Satisfaction
        • Industry Statistics • Provider Performance • Better Prices and Options • Better Selection of Providers • Better Selection of Clinics • Improved Expenses Planning
  4. 4. Pathway Traversal Prediction
        Diagram labels: Patient Profile; Provider Plan; Patient CPT History; Claims; Prior Pathway Traversals; Provider Profile; Plan Details; Pathway Traversals (Predicted, Actual)
  5. 5. Predictive Modeling in the Call Center
        Chart labels: Actual traversal; Predicted traversal; 95% CI
        Traversal anomaly: triage the call
        Model inputs: Patient profile; Plan details; Provider profile
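The triage rule on slide 5 can be sketched in Python. This is a minimal illustration under stated assumptions, not CareCore's actual model: it assumes each prediction comes with a standard error and that prediction errors are roughly normal, so the 95% interval is the prediction plus or minus 1.96 standard errors. The names `TraversalPrediction`, `std_error`, `confidence_interval`, and `should_triage` are hypothetical and do not appear in the slides.

```python
from dataclasses import dataclass

# Two-sided 95% quantile of the standard normal distribution.
# Assumption: prediction errors are approximately normal.
Z_95 = 1.96


@dataclass
class TraversalPrediction:
    """Hypothetical model output for one patient pathway."""
    predicted: float   # predicted traversal measure (e.g. number of pathway steps)
    std_error: float   # standard error of that prediction


def confidence_interval(pred: TraversalPrediction) -> tuple[float, float]:
    """Return the (low, high) bounds of the 95% interval around the prediction."""
    half_width = Z_95 * pred.std_error
    return pred.predicted - half_width, pred.predicted + half_width


def should_triage(actual: float, pred: TraversalPrediction) -> bool:
    """Flag a traversal anomaly when the actual value falls outside the 95% interval."""
    low, high = confidence_interval(pred)
    return not (low <= actual <= high)


if __name__ == "__main__":
    prediction = TraversalPrediction(predicted=10.0, std_error=1.0)
    for observed in (11.0, 12.5):
        action = "triage the call" if should_triage(observed, prediction) else "no action"
        print(f"actual={observed}: {action}")
```

In this sketch, a call whose actual traversal stays inside the interval needs no action, and one that falls outside it is routed for triage, which mirrors the "Traversal anomaly: triage the call" label on the slide.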
  6. 6. CareCore National Analytics Phases
        Chart axes: Time and Data Value. Column headings: Customer = CCN; Customer = External
        Phases: 1.0 CCN Analytics; 1.5 CCN Analytics (BI + Predictive); 2.0 Analytics as a Service; 2.5 Data Exchange; 3.0 AaaS @ Scale
        Phase notes, in slide order:
        • Data Exchange • Workflow Management
        • New Customer Orientation • New Solutions Development
        • Data Source = Carrier+ • 2.0 Batch
        • Greenplum Platform • Data Source = CCN+
        • BI-Centric • Operationally Oriented • Customer is CCN
  7. 7. Big Data
        Key Characteristics: • Large Volumes • New Sources • Low Latencies
        Implications for the Enterprise: • New Platforms • New Roles • New Techniques
  8. 8. Big Data Requires New Approaches to Analytics
        Chart axes: Business Value (Low to High) and Time (Past to Future)
        Predictive Analytics & Data Mining (Data Science)
        • Typical techniques: optimization, predictive modeling, forecasting, statistical analysis
        • Data types: structured/unstructured data, many types of sources, very large data sets
        • Common questions: What if...? What’s the optimal scenario for our business? What will happen next? What if these trends continue? Why is this happening?
        Business Intelligence
        • Typical techniques: standard and ad hoc reporting, dashboards, alerts, queries, details on demand
        • Data types: structured data, traditional sources, manageable data sets
        • Common questions: What happened last quarter? How many did we sell? Where is the problem? In which situations?
  9. 9. Industries Are Broadly Embracing Data Science
        Retail: • CRM (Customer Scoring) • Store Siting and Layout • Fraud Detection / Prevention • Supply Chain Optimization
        Advertising & Public Relations: • Demand Signaling • Ad Targeting • Sentiment Analysis • Customer Acquisition
        Financial Services: • Algorithmic Trading • Risk Analysis • Fraud Detection • Portfolio Analysis
        Media & Telecommunications: • Network Optimization • Customer Scoring • Churn Prevention • Fraud Prevention
        Manufacturing: • Product Research • Engineering Analytics • Process & Quality Analysis
        Energy: • Smart Grid • Exploration • Distribution Optimization
        Government: • Market Governance • Counter-Terrorism • Econometrics • Health Informatics
        Healthcare & Life Sciences: • Pharmaco-Genomics • Bio-Informatics • Pharmaceutical Research • Clinical Outcomes Research
  10. 10. Data-Driven Transformations
  11. 11. Transformation Catalysts (plotted over time): Process & Change Mgt; Org Alignment; DS Skill Availability; Analytics Tools & Platform; Distributed Computing; Data Availability
  12. 12. Transformation Catalysts (plotted over time): Process & Change Mgt; Org Alignment; DS Skill Availability; Analytics Tools & Platform; Distributed Computing; Data Availability
  13. 13. Big Data Analytics Is Now Central to Enterprise Strategy
         Executive roles shown around Big Data Analytics: CSO, CFO, CPO, CIO, CMO, COO, CEO
         Use cases, grouped in slide order:
         • Security: Unauth. User Access Detection; Web Server Attack Detection; Malware Protection; Advanced Persistent Threat Detection
         • Finance: KPI Definition; Risk Modeling; Compliance
         • Product: Online Behavioral Analyses; other product-sourced data & behavioral analytics (mobile devices, DVRs, smart meters, other electronics); Vertically Specific Product Development Analytics (genetic mining, imaging, oil/gas exploration)
         • IT: IT Log Analytics; Error/Event Logs Analytics; Network Analytics
         • Marketing: Unified User Profiling; Segmentation; Churn Prediction; Lifecycle Management; Purchase Funnel Analysis; Brand Analytics; Campaign Analytics (Media Mix Modeling, SEM Optimization, Behavioral/Social Targeting, Attribution, ROI Optimization, Social Effects); Pricing; Demand Forecasting
         • Operations: Fraud Detection; Error Log Analysis; Complaint Data Analysis; Call Center Data Analysis; Demand Planning/Forecasting; Quality/Reliability Analysis; Fault/Service Failure (Detection/Prediction)
  14. 14. Transformation Catalysts (plotted over time): Process & Change Mgt; Org Alignment; DS Skill Availability; Analytics Tools & Platform; Distributed Computing; Data Availability
  15. 15. Big Data Analytics Is Different from Traditional BI
         • “Traditional BI”: Repetitive; Structured; Operational; GBs to 10s of TBs
         • “Big Data Analytics”: Experimental, Ad Hoc; Mostly Semi-Structured; External + Operational; 10s of TBs to PBs
  16. 16. Data Science Team as Your New Source of Innovation
  17. 17. Traditional Analytics Process
         Diagram labels: Time-to-Insights; Data Prep; DB Extract; sample.csv; spec.docx; scores.csv; DB Import
         Not a Scalable Process!
  18. 18. Analytics with Greenplum: Marketing optimization with MADlib
         Diagram labels: Time-to-Insights; Data Prep; MADlib
         > SELECT householdID, variables FROM households ORDER BY RANDOM() LIMIT 100000;
         > SELECT run_univariate_analysis(households_training, variables) WHERE pvalue < .01 AND r2 > .01;
         > SELECT run_regression(univariate_results, households_training);
         > SELECT householdID, madlib.array_dot(coef::REAL[], xmatrix::REAL[]) FROM coefficients, households;
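The last query on slide 18 scores each household as the dot product of the regression coefficients and that household's feature vector. The Python sketch below illustrates only that scoring step, outside the database. It is not the Greenplum or MADlib implementation: the function `score_households`, the dictionary layout, and the sample numbers are all hypothetical, and the local `array_dot` helper merely mimics what a dot product of two equal-length arrays computes.

```python
from typing import Dict, List, Sequence


def array_dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def score_households(coef: Sequence[float],
                     households: Dict[str, List[float]]) -> Dict[str, float]:
    """Score every household with one shared coefficient vector.

    `households` maps a household ID to its feature vector (hypothetical layout;
    the first feature is 1.0 so that coef[0] acts as the intercept).
    """
    return {hid: array_dot(coef, features) for hid, features in households.items()}


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the presentation.
    coefficients = [0.5, 2.0, -1.0]
    sample = {
        "h1": [1.0, 3.0, 2.0],
        "h2": [1.0, 0.0, 4.0],
    }
    for household_id, score in score_households(coefficients, sample).items():
        print(household_id, score)
```

Running the same arithmetic inside the database, as the slide does, avoids the extract and import steps of the traditional process on slide 17, since the data never has to leave the platform to be scored.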
  19. 19. People & Skills: Three Key Roles of the New Data Ecosystem
         • Deep Analytical Talent (Data Scientists). Projected U.S. talent gap: 140,000 to 190,000
         • Data Savvy Professionals. Projected U.S. talent gap: 1.5 million
         • Technology & Data Enablers
         Note: Figures above reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article "Big Data: The next frontier for innovation, competition, and productivity"
  20. 20. Profile of a Data Scientist: Quantitative; Technical; Curious & Creative; Skeptical; Communicative & Collaborative
  21. 21. Specific Data Science Skills & Traits
         Diagram: items numbered 1 to 5 and an EDW label. Caption: Apply data science methods in their current roles
  22. 22. The Greenplum Data Science Team
         • Senior Director, Data & Insights Services, Yahoo! (MIA, UCSD)
         • Principal Scientist at RSA, Fraud Detection, Speech and Language Processing (M.S. in Signal Processing)
         • Marketing optimization (Ph.D. in Operations Research)
         • Stochastic machine learning (Ph.D., Australian National University)
         • Director of Analytics at M-Factor, DemandTec (M.S., Berkeley)
         • Data Mining in Healthcare (Ph.D. and Postdoctoral Fellow, Australian National University)
         • Biomedical Informatics (Ph.D., Stanford)
         • Research Engineer at Fox Interactive Media, eHarmony (M.S. in Applied Mathematics)
         • Quantitative modeling and risk management in trading and finance (Ph.D. in Economics, Princeton; M.S. in Mathematics, Courant Institute)
         • Mechanical Engineering (Ph.D., Stanford)
         • Statistician, Bayesian Analysis (M.S., Statistics, Stanford)
  23. 23. Biggest Obstacles to Data Science Adoption
         Survey chart for Q25: "The biggest obstacle to Data Science adoption in our organization is:" (Coded for Total)
  24. 24. The Question of Org
         A. Central Model
         • Data Science sits within IT
         • Close proximity to data platform
         • Cross-initiative scalability
         • Establishment of best practices
         • Jump-start "data driven" culture
         • Centrally defined prioritization
         • Lack of domain expertise
         B. Decentralized Model
         • Data Science sits with LOB
         • Distanced from data platform
         • Limited scalability across org
         • Lack of best practices
         • Slower "data driven" culture
         • LOB defined prioritization
         • Strong domain expertise
  25. 25. The Question of Org
         Hybrid Model (between A. Central Model and B. Decentralized Model)
         • Data Science in IT with physical LOB placement
         • Close proximity to data platform
         • Cross-initiative scalability
         • Establishment of best practices
         • Jump-start "data driven" culture
         • LOB driven prioritization
         • Strong domain expertise
  26. 26. Process & Change Management
         Translate Vision into Concrete Projects
         Diagram labels, in slide order: Explore Project the Workspaces with Prioritize Deliver Wins! Data Stakeholders Collaboration Publish Socialize Roadmap and Data and Progress Resource Analysis Iterate
         Solidify Integration Points with Dependencies
  27. 27. New Roles Required Beyond Core DS Team
         + Analytics Executive: "Chief Analytics Officer"?
         + Engagement Managers
  28. 28. EMC’s Assist
  29. 29. Greenplum Unified Analytic Platform
         • Agile Big Data Analytics
         • Rich Capabilities for Complex Analytics
         • Extends Leading Tools
         • Fully Customizable for Analytics
         • Increases Analytical Productivity & Results
         • Extends Insight By Combining All Data
         • Augmented by the Greenplum Data Science Team
         • Developed, Packaged, Supported by EMC
  30. 30. Greenplum Analytics Labs. Analytics Solutions: Goals
         • Overcome the analytics gap
         • Generate continuous insights by developing re-usable models on massive data sets
         • Produce actionable, ready-to-deploy models
         • Build collaborative relationships among data stakeholders
         • Educate users on the development of tools and best practices
         • Establish a strategic vision for on-going analytics development
  31. 31. Greenplum Analytics Labs. Analytics Lab: Packages
         • LAB PRIMER (1-Day Workshop): Analytics Roadmap; Prioritized Opportunities; Architectural Recommendations
         • LAB 100 (Analytics Bundle): On-site MPP Analytics Training; Analytics tool-kit; Quick insight (2 weeks)
         • LAB 600 (6-Week Lab): Analytics Roadmap; Prof. services on GPDB*; Ready-to-deploy model(s)
         • LAB 1200 (12-Week Lab): Analytics Roadmap; Prof. services on GPDB*; Ready-to-deploy model(s)
         *GPDB priced separately
  32. 32. Skills Matrix, Based on Recent Students
         Chart axes: Quantitative Skills and Technical Ability
         Groups plotted: Quantitative Analysts, Statisticians, Data Scientists; Business and data analysts; Recent STEM Grads; Business Intelligence Professionals, IT
  33. 33. Data Science and Big Data Analytics: Course and EMCDSA Certification
         Course Overview and Details
         • "Open" curriculum
         • Practitioner’s approach
         • Enables immediate participation on analytics projects
         • Prepares for EMC Proven Professional Data Science Associate (EMCDSA) Certification
  34. 34. Summary: Transformation Success Criteria
         • Establish a clear vision for the role of Big Data Analytics
         • Understand end-to-end platform dependencies
         • Embrace the UAP paradigm
         • Educate & build your Data Science Dream Team
         • Organize to your contextual reality
         • Initiate smart process
         • Deliver one concrete "win"
         • Socialize, socialize, socialize
  35. 35. Q&A
  36. 36. Other Relevant Greenplum Sessions (Session; Presenter; Times)
         • Unified Analytics Platform Introduction; Brian Wilson; Tues 10:00-11:00, Thurs 1:00-2:00
         • Greenplum Database Overview; Michael Crutcher; Mon 8:30-9:30, Wed 10:00-11:00
         • Greenplum Hadoop Overview; Susheel Kaushik; Mon 10:00-11:00, Wed 4:15-5:15
         • Greenplum DCA Overview; Hanxi Chen; Mon 4:00-5:00, Thurs 10:00-11:00
         • Greenplum Analytics Workbench; Apurva Desai; Wed 8:30-9:30, Thurs 10:00-11:00
         • Analytics on Hadoop; Don Miner; Tues 11:30-12:30, Thurs 8:30-9:30
         • Optimizing Greenplum Database on VMware Virtualized Infrastructure; Kevin O’Leary; Mon 4:00-5:00, Tues 4:15-5:15
         • Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers); Mike Maxey; Wed 4:15-5:15, Thurs 11:30-12:30
         • Analytics for Business Value: Collaboration; Josh Klahr; Mon 10:00-11:00, Wed 2:45-3:45
         • Disruptive Data Science - How Data Science and Big Data are Transforming Business, IT and People; Annika Jimenez, David Dietrich; Tues 4:15-5:15, Thurs 11:30-12:30
  37. 37. Thank You
