© 2016 Health Catalyst
Proprietary and Confidential
Machine Learning Misconceptions
May 3rd, 2017
Data Science Team
• Levi Thatcher, PhD, Director of Data Science
• Mike Mastanduno, PhD, Data Scientist
• Taylor Miller, PharmD, Data Scientist
• Taylor Larsen, MS, Data Science Engineer
Purpose of Today’s Chat
• Compare and contrast machine learning and artificial intelligence.
• Discuss techniques that offer feedback into the system and when it’s
necessary to retrain a model.
• Give advice on how to avoid common pitfalls in machine learning
implementation.
• Talk about potential applications of the different classes of machine
learning techniques.
• Q&A
Machine Learning Definition
Machine learning is the subfield of computer science that gives computers the ability to learn
without being explicitly programmed. Such algorithms overcome following strictly static program
instructions by making data-driven predictions or decisions through building a model from sample
inputs.
- Wikipedia
Machine Learning Typical Use
• Movie recommendations on Netflix
• People you may know on Facebook
• Advertising
• Patient likelihood of contracting sepsis, being readmitted…
• Using any tabular data source to predict a Y/N or continuous outcome
Artificial Intelligence Definition
Artificial intelligence (AI) is intelligence exhibited by machines. In computer science, the field of AI
research defines itself as the study of "intelligent agents": any device that perceives its environment
and takes actions that maximize its chance of success at some goal.
- Wikipedia
These models are limited in their ability to “reason”, i.e. to carry out long chains of inferences, or
optimization procedure to arrive at an answer. The number of steps in a computation is limited by
the number of layers in feed-forward nets, and by the length of time a recurrent net will remember
things.
- Yann LeCun, Director of Facebook AI Research
Artificial Intelligence Typical Use
• Speech translation
• Complex game playing
• Self-driving cars
• Content delivery
• Radiology?
Difference Between ML and AI
• It’s fuzzy
• Is the difference learning from data? No, not really.
• Is it continuous learning from data? No, not really.
• AI feels more complicated.
• AI should be able to learn a skill and generalize it to an entirely different task.
• Many AI ideas get rebranded as ML as time goes on and we understand them.
Poll #1: Have you ever used machine learning or AI?
148 respondents
• Yes, in my daily work – 21%
• Yes, as a hobby – 17%
• No, but I plan to – 52%
• No, not applicable – 9%
How is machine learning used?
Poll #2: Where is your organization in terms of using machine learning in regular operations?
138 respondents
• Using machine learning tools daily across many departments and use cases – 13%
• Daily across a couple of use cases – 17%
• Confined to a research study or two – 49%
• What is machine learning? – 21%
When does a model learn?
• Different algorithms learn at different times
• Only during training
  • Logistic regression
  • Random forest
  • Clustering
• Periodically after new data comes in
  • Any of the above (but more complex implementation)
  • Naïve Bayes
  • Neural networks
  • Deep learning
• Continuously as new data comes in
  • Any of the above (but still more complex implementation)
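The difference between learning only during training and updating as new data arrives can be sketched in a few lines. This is a minimal illustration, assuming a one-feature logistic regression updated by stochastic gradient descent; the class name `OnlineLogistic`, the learning rate, and the toy data are hypothetical and do not come from the presentation.

```python
import math


class OnlineLogistic:
    """Tiny logistic regression that can be updated one example at a time."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """Take one stochastic gradient step on a single (x, y) example."""
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Hypothetical toy data: one feature, label is 1 when the feature is large.
initial_batch = [([0.1], 0), ([0.2], 0), ([0.8], 1), ([0.9], 1)]
new_batch = [([0.15], 0), ([0.85], 1)]

model = OnlineLogistic(n_features=1)

# "Only during training": fit on the initial batch, then freeze the model.
for _ in range(200):
    for x, y in initial_batch:
        model.update(x, y)
frozen_prediction = model.predict_proba([0.85])

# "Periodically" or "continuously": keep calling update() as data arrives.
for x, y in new_batch:
    model.update(x, y)
updated_prediction = model.predict_proba([0.85])
```

The same `update` call serves both the periodic and the continuous case; what changes is how often it runs and how much engineering surrounds it (data validation, monitoring, rollback).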
When should a model be retrained?
• After significant data turnover
• If performance in production drops over time
• Seasonality
• Changing treatment methods
• If new features or techniques are identified
• If the use case changes
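One way to act on a performance drop in production is a simple threshold check against the score measured at deployment. The sketch below is an assumption-laden illustration, not the presenters' method: the function name `needs_retraining`, the tolerance, the window size, and the monthly AUC values are all hypothetical.

```python
def needs_retraining(baseline_score, recent_scores, tolerance=0.05, window=3):
    """Return True when the mean of the last `window` scores falls more than
    `tolerance` below the baseline measured at deployment time."""
    if len(recent_scores) < window:
        return False  # not enough production history yet
    recent_mean = sum(recent_scores[-window:]) / window
    return (baseline_score - recent_mean) > tolerance


# Hypothetical monthly AUC values observed in production.
baseline_auc = 0.82
stable_months = [0.81, 0.83, 0.80]
drifting_months = [0.81, 0.78, 0.75, 0.73]

print(needs_retraining(baseline_auc, stable_months))    # False
print(needs_retraining(baseline_auc, drifting_months))  # True
```

Averaging over a window rather than reacting to a single bad month keeps ordinary noise from triggering a retrain; the right window and tolerance depend on the use case.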
Pitfall 1: Poorly Defined Use Case
• Leads to:
  • Incorrect usage of data fields
  • Unavailable data
  • No adoption
• Use case is always the first priority
  • What is the question?
  • Who are the users?
  • When are they using it?
  • How are they using it?
Pitfall 2: Production Environment is Different
• Data might not be available
• Timing of data might lead to target leakage
• Predictions are made multiple times per patient
• Learn how your data is populated over time
• Only train with what’s available at the time of prediction
• Know your use case!
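The advice to train only with what is available at the time of prediction can be enforced with a timestamp filter. This is a minimal sketch under assumed names: the event records, the `recorded_at` field, and the function `features_available_at` are hypothetical and stand in for however a given data source tracks when each value was populated.

```python
from datetime import datetime


def features_available_at(events, prediction_time):
    """Keep only events recorded at or before the moment of prediction."""
    return [e for e in events if e["recorded_at"] <= prediction_time]


# Hypothetical encounter: the prediction is made 24 hours after admission.
prediction_time = datetime(2017, 5, 2, 8, 0)
events = [
    {"name": "admission_vitals", "recorded_at": datetime(2017, 5, 1, 8, 30)},
    {"name": "first_lab_panel", "recorded_at": datetime(2017, 5, 1, 14, 0)},
    # Recorded after the prediction time: using it in training would leak
    # information the model cannot have in production.
    {"name": "discharge_diagnosis", "recorded_at": datetime(2017, 5, 4, 16, 0)},
]

usable = features_available_at(events, prediction_time)
usable_names = [e["name"] for e in usable]
print(usable_names)  # ['admission_vitals', 'first_lab_panel']
```

Applying the same filter when building the training set and when scoring in production keeps the two environments aligned.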
Pitfall 3: Bad Performance Metrics
• 99% accurate, but didn’t find any sick people
• Imbalanced classes
• Performance changing over time
• AUC or Precision-Recall
• Sampling methods during model training
• Monitor correct performance metric over time
• Know your use case!
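The "99% accurate, but didn't find any sick people" case is easy to reproduce with arithmetic. The sketch below assumes a hypothetical cohort of 1,000 patients of whom 10 are sick, and a model that predicts "not sick" for everyone; the helper functions are written out by hand for illustration.

```python
def accuracy(y_true, y_pred):
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of truly sick patients that the model flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


def precision(y_true, y_pred):
    """Fraction of flagged patients who were truly sick (0.0 if none flagged)."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(y_pred)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical imbalanced cohort: 10 sick (1) and 990 healthy (0) patients.
y_true = [1] * 10 + [0] * 990
# A useless model that always predicts "healthy".
y_pred = [0] * 1000

print(accuracy(y_true, y_pred))   # 0.99
print(recall(y_true, y_pred))     # 0.0
print(precision(y_true, y_pred))  # 0.0
```

Accuracy rewards the majority class, while recall and precision expose that no sick patient was found, which is why the slide points to AUC or precision-recall for imbalanced problems.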
Pitfall 4: Poor Adoption
• Do people know about it?
• Is it answering a relevant question?
• Is visualization done well?
• Do people trust the model?
• Tell people about it
• Know the use case
• Simple is better, shouldn’t affect workflow
• Improve trust with prediction explanations or transparent models
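For a transparent model such as logistic regression, one simple form of prediction explanation is to list each feature's contribution to the log-odds (coefficient times value). The sketch below is illustrative only: the coefficients, intercept, feature names, and patient values are hypothetical, and other explanation techniques exist for less transparent models.

```python
def explain_linear_prediction(coefficients, intercept, patient):
    """Return the log-odds and per-feature contributions, largest first."""
    contributions = {name: coefficients[name] * patient[name]
                     for name in coefficients}
    log_odds = intercept + sum(contributions.values())
    ranked = sorted(contributions.items(),
                    key=lambda item: abs(item[1]), reverse=True)
    return log_odds, ranked


# Hypothetical fitted model and patient record.
coefficients = {"prior_admissions": 0.5, "age_decades": 0.25,
                "on_anticoagulant": -0.5}
intercept = -1.5
patient = {"prior_admissions": 3, "age_decades": 4, "on_anticoagulant": 1}

log_odds, ranked = explain_linear_prediction(coefficients, intercept, patient)
for name, value in ranked:
    print(f"{name}: {value:+.2f}")
```

Showing the top contributors next to a risk score gives users a reason for the number, which supports the trust the slide is asking for.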
Poll #3: What’s impeding you from moving forward with machine learning in your organization?
116 respondents
• Available tools are overwhelming OR don’t know what exists – 16%
• Use cases are overwhelming OR don’t know what’s possible – 28%
• Don’t have or can’t afford the technical staff to implement – 23%
• Adoption—clinical team isn’t interested – 9%
• Other – 25%
Potential Applications: ML and EMR
• Clinical
  • Risk scores – readmissions, mortality
  • Risk adjusted comparisons
  • Replacing clinical rulesets
  • Correct coding
• Operational
  • Staff need forecasting
  • Length of stay prediction
• Financial
  • Propensity to pay
  • Predicted procedure cost
Potential Applications: NLP or Smarter Analytics
• Parsing clinical notes
  • Fill in discrete text fields automatically
  • Find new features that only come up in conversation
• Smart retrospective analysis
  • Trend analysis
  • Exploration across the whole EMR
  • Serve up insights automatically
Potential Applications: Image Processing
• Diagnostics of pre-segmented suspicious regions
• Automatic segmentation of tissue types
• Diagnosis or staging of screening images
• Diagnosis or staging of pathology slides
Poll #4: What’s the most valuable use for ML/AI/Big Data to your organization?
95 respondents
• Parsing free-form clinical notes – 14%
• Image interpretation – 5%
• Clinical risk scores – 47%
• Operational efficiency – 29%
• These are buzz words and not worth the time. – 4%
Poll #5: If there was an algorithm that was FDA approved and read mammographic images on par with a radiologist, would you use it?
90 respondents
• Yes, I’d trust it completely – 16%
• Yes, but only as an aid to the radiologist – 81%
• No, I wouldn’t trust it – 3%
Before we end…
Questions?


Editor's Notes

  • #3 Additional outcome goals to select from: Reduce financial risk through improved ability to: Identify and analyze variation from targets Understand populations (conditions, risks, aims) Use trends to predict future needs Additional process aim statements to select from: Increase number of sites/providers/initiatives who receive reports of their performance Increase the number of high-service utilization patients and diagnostic groups identified (can serve as targets for improvement work) Increase the number of major drivers of utilization (hospital, ED, rehab) identified