1
Perfect Partnership – Machine Learning & CDISC
Kevin Lee
Director of Data Science
2
The views and opinions expressed in the following PowerPoint
slides are those of the individual presenter and should not be
attributed to Drug Information Association, Inc. (“DIA”), its
directors, officers, employees, volunteers, members, chapters,
councils, Communities or affiliates, or any organization with
which the presenter is employed or affiliated.
These PowerPoint slides are the intellectual property of the
individual presenter and are protected under the copyright laws
of the United States of America and other countries. Used by
permission. All rights reserved. Drug Information Association,
Drug Information Association Inc., DIA and DIA logo are
registered trademarks. All other trademarks are the property of
their respective owners.
Disclaimer – Content Slide
3
Q1 : World’s Largest Transportation Company?
As of January 2018, there are about
7,000,000 “drivers”.
Uber operates in 600
cities in 78 countries.
There have been 5 billion rides.
Uber holds about 85% of the US
ride-hailing market.
4
Q2 : World’s Largest Accommodation Provider?
There are more than 4.5 million Airbnb listings in 81,000 cities.
Airbnb hosts have earned $41 billion in 10 years.
As of June 2017, Airbnb’s total valuation was about $31 billion.
5
Common Characteristics of Exponential
Organizations
Data
Algorithm (Machine Learning/AI)
Exponential &
Scalable Growth
6
How Data and Algorithms Drive Exponential
Growth
More Data
Better
Algorithms
(ML/AI)
Better
Products
More
Users
7
What is Machine Learning?
An application of artificial
intelligence (AI) that
provides systems the
ability to automatically
learn and improve from
experience without being
explicitly programmed.
8
Explicit Programming vs Machine Learning
Explicit Programming Machine Learning
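The contrast on this slide can be sketched in a few lines of Python. This is a minimal illustration, not taken from the original slides: the "hot day" task, the data values, and the function names `is_hot_explicit` and `learn_threshold` are all hypothetical. In the explicit version a person writes the rule; in the learned version the rule (a single threshold) is chosen from labeled examples.

```python
# Explicit programming: a human writes the rule by hand.
def is_hot_explicit(temp_c):
    return temp_c >= 30.0  # threshold chosen by the programmer


# Machine learning: the rule (here, one threshold) is learned from labeled examples.
def learn_threshold(examples):
    """Return the candidate threshold that misclassifies the fewest examples."""
    candidates = sorted(t for t, _ in examples)

    def errors(th):
        return sum((t >= th) != label for t, label in examples)

    return min(candidates, key=errors)


# Hypothetical labeled data: (temperature in Celsius, was the day labeled "hot"?)
examples = [
    (18.0, False), (22.0, False), (25.0, False),
    (27.0, True), (31.0, True), (35.0, True),
]
threshold = learn_threshold(examples)


def is_hot_learned(temp_c):
    return temp_c >= threshold


print(threshold)             # threshold picked from the data
print(is_hot_explicit(28.0)) # hand-written rule
print(is_hot_learned(28.0))  # learned rule
```

With these made-up examples the two versions disagree on 28 °C, because the learned threshold follows the labels rather than the programmer's guess.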
9
How Do Humans Learn? - Experience
10
How Does a Machine Learn?
Algorithm
Training Data (cats)
Real Data
cat
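The flow on this slide (an algorithm learns from labeled examples, then labels new data) might look like the sketch below. It uses a 1-nearest-neighbour classifier purely as an illustration; the slide does not name an algorithm, and the features (body weight in kg, ear length in cm) and all values are invented.

```python
import math


def train(labeled_examples):
    # For 1-nearest-neighbour, "training" just stores the labeled examples.
    return list(labeled_examples)


def predict(model, features):
    # Label new data with the label of the closest stored example.
    nearest = min(model, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]


# Hypothetical labeled examples: ((weight_kg, ear_length_cm), label)
labeled_examples = [
    ((4.0, 6.5), "cat"),
    ((3.5, 6.0), "cat"),
    ((5.0, 7.0), "cat"),
    ((20.0, 12.0), "dog"),
    ((30.0, 14.0), "dog"),
    ((25.0, 11.0), "dog"),
]

model = train(labeled_examples)

# "Real data": an animal the model has not seen before.
print(predict(model, (4.2, 6.8)))
```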
11
More data, better model
Different data, different model
[Two plots, each with axes X and Y]
12
Data Quality in Machine Learning
Garbage in Garbage out
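A toy illustration of the point, under stated assumptions: the same learning routine is run on clean labels and on labels that were entered the wrong way round. The task, the data, and the function name `learn_threshold` are hypothetical and chosen only to keep the example small.

```python
def learn_threshold(examples):
    """Return the candidate threshold that misclassifies the fewest examples."""
    candidates = sorted(t for t, _ in examples)

    def errors(th):
        return sum((t >= th) != label for t, label in examples)

    return min(candidates, key=errors)


# Hypothetical clean data: (temperature in Celsius, was the day labeled "hot"?)
clean = [
    (18.0, False), (22.0, False), (25.0, False),
    (27.0, True), (31.0, True), (35.0, True),
]

# "Garbage" version: every label was recorded the wrong way round.
garbage = [(t, not label) for t, label in clean]

clean_threshold = learn_threshold(clean)
garbage_threshold = learn_threshold(garbage)

# The algorithm is identical; only the data quality differs.
print(clean_threshold, garbage_threshold)
print(20.0 >= clean_threshold, 20.0 >= garbage_threshold)
```

The model trained on the corrupted labels ends up calling every day in the sample "hot", even though the code that learned it is unchanged.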
13
The Economist: The world’s most valuable
resource is no longer oil, but DATA.
14
Can Pharma Be an Exponential Organization?
Pharma =
exponential organization?
15
Gold Mine - Clinical Trial Data in the
Pharmaceutical Industry
Clean – Pharma companies spend many
hours cleaning the data.
Unbiased - Prospective study,
randomized
Blinded – double-blinded
Structured
Metadata
Standards – CDISC
Already owned – pharmaceutical companies
already own the data.
Data in Pharmaceutical industry
16
Main purpose - for the submission
No or limited analysis after submission
Companies do not know exactly where their clinical trial data is
Limited/No access
No CDR (Central Data Repository)
What is the reality of clinical trial data?
17
Clinical Trial Data with ML/AI for Patients/Doctors
Machine Learning/ AI
CDISC
Clinical
Trial
Data
Patients/
Doctors/
Healthcare
Providers/
Drug
Companies
18
CDISC with Machine Learning
Algorithm
CDISC
(Training
Data)
Real
World
Data
Prediction
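One way to read this slide: because CDISC standardizes variable names, the same feature-extraction code can run unchanged on data pooled from several studies. The sketch below assumes rows shaped like the SDTM DM domain, where USUBJID, AGE, and SEX are standard variables. Everything else is hypothetical: the values are invented, `RESPONSE` is a made-up outcome label rather than a CDISC variable, and 1-nearest-neighbour is used only as a stand-in for whatever algorithm would really be chosen.

```python
import math

# Hypothetical rows from two studies, both following the same DM-style layout.
study_a = [
    {"USUBJID": "A-001", "AGE": 34, "SEX": "F", "RESPONSE": "Y"},
    {"USUBJID": "A-002", "AGE": 41, "SEX": "M", "RESPONSE": "Y"},
    {"USUBJID": "A-003", "AGE": 67, "SEX": "F", "RESPONSE": "N"},
]
study_b = [
    {"USUBJID": "B-001", "AGE": 29, "SEX": "M", "RESPONSE": "Y"},
    {"USUBJID": "B-002", "AGE": 72, "SEX": "M", "RESPONSE": "N"},
    {"USUBJID": "B-003", "AGE": 63, "SEX": "F", "RESPONSE": "N"},
]


def to_features(row):
    # The same code works for every study because the variable names are standard.
    return (float(row["AGE"]), 1.0 if row["SEX"] == "F" else 0.0)


def train(rows):
    # 1-nearest-neighbour: store (features, label) pairs.
    return [(to_features(r), r["RESPONSE"]) for r in rows]


def predict(model, row):
    feats = to_features(row)
    return min(model, key=lambda ex: math.dist(ex[0], feats))[1]


# Pool the standardized trial data and use it to train the model.
model = train(study_a + study_b)

# Hypothetical real-world patient record mapped to the same variable names.
new_patient = {"USUBJID": "RW-001", "AGE": 36, "SEX": "F"}
print(predict(model, new_patient))
```

The invented data carries no medical meaning; the point is only that pooling and feature extraction need no per-study mapping code when the inputs share one standard.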
19
CDISC + Machine
Learning
More than drugs
Drugs &
Experience
Conclusion
What does Apple give us?
20
Kevin Lee
Director of Data Science
kevin.kyosun.lee@gmail.com
Twitter @HelloKevinLee
LinkedIn https://www.linkedin.com/in/HelloKevinLee/
Join the conversation #DIA2018
Thank You
