Confidential – Do Not Distribute
Spectrum of Data Analytics
- From Service to Product
Yubin Park, PhD
Managing Director, AI Solutions
About Myself
PhD in Machine Learning
2009 - 2014
Various research projects with:
Co-founder, CEO/CTO
2014 - 2017
A healthcare machine learning start-up for multiple Medicare Advantage plans and their vendors
Acquired by Evolent Health, Inc. in 2017
Managing Director, AI Solutions
2017 - Present
Exploring and implementing AI-driven healthcare applications for value-based care
What is Data Analytics?
New insights from data?
Automating tasks at scale?
Automating the extraction of new insights from data at scale?
Let’s find out with some examples…
Case I: Building a related-search engine
A. Reda, Y. Park, M. Tiwari, C. Posse, S. Shah. Metaphor: A System for Related Search Recommendations, CIKM 2012
Pipeline: Analyze Web Logs → Build Models → Evaluate Models (backtest) → Deploy Models → Test Models (A/B testing)
• New Insights from Data
• Automating Tasks at Scale
• Automating the extraction of new insights from data at scale, and deploying them
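To make the "build models" and "evaluate models (backtest)" steps concrete, here is a minimal sketch. It is not the method of the Metaphor paper: it assumes a much simpler session co-occurrence model, and the function names (`build_model`, `backtest`) and the toy sessions are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_model(sessions, top_k=3):
    """Count how often two queries share a session; keep the top_k partners."""
    counts = defaultdict(Counter)
    for queries in sessions:
        # Every ordered pair of distinct queries in a session counts once.
        for a, b in permutations(set(queries), 2):
            counts[a][b] += 1
    return {q: [r for r, _ in c.most_common(top_k)] for q, c in counts.items()}


def backtest(model, held_out_sessions):
    """Fraction of consecutive query pairs where the next query was recommended."""
    hits = total = 0
    for queries in held_out_sessions:
        for current, nxt in zip(queries, queries[1:]):
            total += 1
            hits += nxt in model.get(current, [])
    return hits / total if total else 0.0


# Toy web-log sessions (hypothetical data).
train = [["python", "pandas", "numpy"], ["python", "pandas"], ["java", "spring"]]
test = [["python", "pandas"], ["java", "kotlin"]]

model = build_model(train)
hit_rate = backtest(model, test)
print(model["python"], hit_rate)
```

In this framing, the backtest replays historical logs offline, while the A/B test in the last pipeline step would compare models on live traffic after deployment.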
Case II: Finding missing conditions
RASQ Team @ Evolent Health, Inc.
Pipeline: Analyze claims and EHRs → Build models → Evaluate models → Deploy model outputs in operations* → Get feedback from doctors*
• New Insights from Data
• Automating Tasks at Scale
• Semi-automating the extraction of new insights from data at scale, and deploying them
*These items are not fully automated
Case III: Forecasting Star Rating Cut-points
RASQ Team @ Evolent Health, Inc.
Pipeline: Analyze Star Ratings → Build Models for Predicting Star Rating Cut-points → Visualize the results to discuss business implications
• New Insights from Data
• Automation is not feasible for two reasons:
o Some rules change every year
o The analysis runs only once per year, which is not often enough
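The two reasons above can be read as a break-even calculation: automation pays off only when the manual time saved across runs exceeds the cost of building and maintaining it. The sketch below is illustrative only. The function name `automation_payoff_hours` and all hour figures are hypothetical assumptions, not numbers from the Star Ratings project.

```python
def automation_payoff_hours(build_hours, manual_hours_per_run, runs_per_year,
                            years, rework_hours_per_year=0.0):
    """Net hours saved by automating; a negative value means automation costs more."""
    saved = manual_hours_per_run * runs_per_year * years
    # Yearly rule changes force rework of the automated pipeline.
    spent = build_hours + rework_hours_per_year * years
    return saved - spent


# Hypothetical numbers, chosen only to illustrate the two reasons on the slide.
yearly = automation_payoff_hours(build_hours=200, manual_hours_per_run=40,
                                 runs_per_year=1, years=3, rework_hours_per_year=30)
daily = automation_payoff_hours(build_hours=200, manual_hours_per_run=1,
                                runs_per_year=250, years=3, rework_hours_per_year=30)
print(yearly, daily)
```

Under these assumed figures, the once-a-year task never recovers its build cost, while a task run every working day does, even though each manual run is far cheaper.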
Spectrum of Data Analytics
Axes: Frequency of Run; Level of Manual Supervision
Regions: Analytics Service, Hybrid of Service and Product, Analytics Product
Direction for Scalability
- From Service to Product
- Role of data analytics
- AUTOMATE EVERYTHING!
https://xkcd.com/1319/
Data Analytics Ideal Mix for Best ROI
• Data Analytics teams are inherently cross-departmental
• A vertically integrated team from service to product
Team mix: Analytics Leader, Business & Operation, Data Scientists, Software Engineers
Automation: Triangle of Velocity Tradeoffs
Triangle corners: Velocity of Improvement, Velocity of Implementation, Velocity of Run-time
Yubin Park. Triangle of Velocity Tradeoffs 2017
• When it comes to “automation”, you are essentially managing a software development project
• In my opinion, of the many qualities involved, “speed” matters the most
• However, you cannot maximize all three velocity measures at once: velocity of improvement, run-time, and implementation
o e.g. If you are developing “fast”, your code may not be easy to modify for future improvements
• Depending on project scale and other business needs, you need to balance these three velocity axes
Reusability: API-tizing Your Data Analytics
• Data Analytics = Data + Analytics
• Decompose your data analytics process into:
o Data
o Statistical Models
o Business Rules
• Statistical Models and Business Rules can be re-used for different datasets
• API-tize Statistical Models and Business Rules to share with other teams (or clients)
• This decomposition may take some time in the beginning, but will eventually speed up your future projects dramatically
Diagram: Data Analytics decomposed into Data, Statistical Models, and Business Rules
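One way to picture this decomposition is a small sketch in which the statistical model and the business rule are plain callables kept separate from the data, so the same pair can be re-used on any dataset. Everything here is an assumption made for illustration: the class `AnalyticsPipeline`, the functions `risk_score` and `outreach_rule`, the field names, and the weights are hypothetical and do not describe an actual Evolent Health system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, float]


@dataclass
class AnalyticsPipeline:
    model: Callable[[Record], float]      # statistical model: record -> score
    rule: Callable[[Record, float], str]  # business rule: (record, score) -> action

    def run(self, data: List[Record]) -> List[str]:
        """Apply the model, then the rule, to every record in a dataset."""
        return [self.rule(row, self.model(row)) for row in data]


def risk_score(row: Record) -> float:
    # Hypothetical linear score capped at 1.0; the weights are made up.
    return min(1.0, 0.1 * row["visits"] + 0.05 * row["conditions"])


def outreach_rule(row: Record, score: float) -> str:
    # Hypothetical business rule: contact members at or above a score threshold.
    return "call" if score >= 0.5 else "no_action"


pipeline = AnalyticsPipeline(model=risk_score, rule=outreach_rule)

# The same model and rule are re-used on two different datasets.
dataset_a = [{"visits": 4, "conditions": 3}, {"visits": 1, "conditions": 0}]
dataset_b = [{"visits": 0, "conditions": 12}]
print(pipeline.run(dataset_a), pipeline.run(dataset_b))
```

Exposing `run` behind a web endpoint would be one route to the "API-tize" step. The separation shown here is what makes that possible, since neither the model nor the rule depends on where the data comes from.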
Conclusions
• Data Analytics is the process of finding new insights from data and automating that process itself
• The spectrum of data analytics ranges from simple service models to software products
• The best ROI can be achieved by streamlining this Data Analytics process by
o Having a good mix of business folks, data scientists, and software engineers
o Planning the automation process appropriately, taking into account the triangle of velocity tradeoffs
o Making analytical processes re-usable by API-tizing Statistical Models and Business Rules
ypark@evolenthealth.com

Big data & analytics forum (yubin evh)
