Alexander Gedranovich
Chief Technology Officer
poder.IO
linkedin.com/in/alexander-gedranovich-73847435
Predicting medical test results using Driverless AI
Outline
1. poder.IO Introduction
2. H2O at poder.IO
3. Cases for Driverless AI
4. Predicting medical test results
poder.IO Introduction
Our main product is the cloud platform EPICA.
EPICA uses AI to predict, with a high degree of accuracy, what your
audience is going to do and when.
You can use these predictions to gain a granular understanding of
customer journeys and to personalize each user's experience at the
individual level across web, email, social, and digital advertising.
H2O at poder.IO
We update and deploy 100+ models daily as APIs (POJO / MOJO)
• Regression / Classification (GBM, GLM, Random Forest)
• Text Classification (Word2Vec +)
• Time Series Patterns (iSAX)
• Deep Networks (Deep Water + TensorFlow)
• Etc.
Cases for Driverless AI
At the moment:
1. Driverless AI as a benchmark for all models before production
2. Research department: handling clients' cases
Planned for production use in Q3 2018:
1. Advertising Campaigns Optimization
2. Content Classification
3. Profiles Matching
4. Lookalike models
Predicting medical test results
Disclaimer
The research was supported by Bayer AG.
The project was completed by a joint team of data scientists from
RocketScience.ai and Analytics from Bayer.
The RocketScience.ai team is now part of poder.IO.
Predicting medical test results: Problem
The research goal is to develop a data-driven approach for predicting
individual medical test results from longitudinal medical and pharma
claims data, without direct lab measurements.
Such predictions may lead to improved treatment strategies.
TODO: // Substitute to graphics
Predicting medical test results: Problem
• A medical laboratory test is required to decide on a patient's
treatment strategy
• The test results are not available in most healthcare databases
• The test results need to be predicted for any patient at any point
in time
Predicting medical test results: Design
Predicting medical test results: Design
Predicting medical test results: Data
• 10-year time interval
• 11 M records
• 4 M unique patients
• Training data: 80%
• Test data: 20%
• Number of raw features: ~260
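The slides give the 80% / 20% proportions but not how the split was made. Because there are 11 M records for 4 M unique patients, some patients have several records, and one common way to avoid leaking a patient's history into the test set is to split by patient rather than by record. The sketch below illustrates that idea only; the hashing scheme, the function name assign_split, and the sample records are assumptions, not the team's actual procedure.

```python
import hashlib


def assign_split(patient_id: str, train_fraction: float = 0.8) -> str:
    """Deterministically map a patient ID to 'train' or 'test'.

    Hypothetical helper: every record of the same patient lands in the
    same partition because the decision depends only on the patient ID.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it to [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "train" if bucket < train_fraction else "test"


# Hypothetical claims records: (patient_id, claim_date).
records = [
    ("p001", "2010-01-05"),
    ("p001", "2012-03-17"),
    ("p002", "2011-07-30"),
    ("p003", "2009-11-02"),
    ("p003", "2015-06-21"),
]

# Collect the partitions each patient ends up in (always exactly one).
partitions_by_patient = {}
for patient_id, _claim_date in records:
    partitions_by_patient.setdefault(patient_id, set()).add(
        assign_split(patient_id)
    )
```

A hash-based assignment needs no shared state, so it stays stable when new records for an existing patient arrive later.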
Predicting medical test results: Prerequisites
Models / methods:
• ETL (C++, R, ggplot2)
• H2O.ai based GLM, GBM, Random Forest
• H2O.ai Driverless AI
Hardware:
• ETL, H2O models: 128 GB / 1 TB / 32 cores
• Driverless AI: AWS g3.8xlarge
Predicting medical test results: Outcome
Model                 Training time   RMSE     R2       MAE       Top features
GLM (ElasticNet)      00:13:20        16.477   0.5540   13.3785   100% original
GBM                                                               100% original
Random Forest                                                     100% original
Ensemble (3 models)                                               -
Ensemble (9 models)                                               -
Driverless AI         00:55:15        15.913   0.5857   12.8999   46% original
TODO: // Fill details
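For readers less familiar with the metrics in the Outcome table, the sketch below computes RMSE, MAE, and R2 using their standard definitions. The toy values in y_true and y_pred are made up for illustration and are not project data. The last lines compare only the two RMSE values that the table actually reports, GLM and Driverless AI.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up toy values, not taken from the study.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
toy_rmse = rmse(y_true, y_pred)
toy_mae = mae(y_true, y_pred)
toy_r2 = r2(y_true, y_pred)

# Relative RMSE reduction of Driverless AI versus GLM, using the two
# RMSE values reported in the Outcome table.
glm_rmse = 16.477
dai_rmse = 15.913
relative_rmse_gain = (glm_rmse - dai_rmse) / glm_rmse
```

With the reported numbers, the relative RMSE reduction comes out at roughly 3.4%.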
Predicting medical test results: Top 10 features
TODO: // Insert table with top 10 feature intersection from different
models
Predicting medical test results: Surrogate model
Predicting medical test results: Partial dependence
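The original slide showed a plot that did not survive extraction. As a reminder of what a partial dependence curve measures, here is a minimal sketch of the standard definition: force one feature to each value on a grid, keep all other features as observed, and average the model's predictions. The function partial_dependence, the stand-in toy_model, and the sample rows are hypothetical and do not reflect how Driverless AI computes its plots internally.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the input rows stay untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model."""
    return 2.0 * row[0] + row[1] ** 2


# Made-up feature rows: [feature_0, feature_1].
rows = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
grid = [0.0, 1.0, 2.0]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=grid)
```

For this toy model the curve rises by 2.0 per unit of feature 0, which matches the feature's coefficient because the model is additive in that feature.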
Thank you!
