Copyright 2018 © Qubole
Modernizing AI & ML Operations to Rapidly Advance Healthcare
INTRODUCTION
OVERVIEW OF THE STATE OF DATA SCIENCE IN HEALTHCARE
● COMMON CLOUD DATA OPERATIONS
● PITFALLS TO AVOID
CUSTOMER DATA SCIENCE EXAMPLES ACROSS VERTICALS
● CASE STUDIES:
○ SERVICE PROVIDERS
○ ROBOTICS
○ DRUG DISCOVERY
DEMO OF DEEP LEARNING FOR DRUG DISCOVERY
● REVIEW ML METHODOLOGY USED
● MOLECULE PREDICTIONS FROM TENSORFLOW
Birth of Data Science in Life Sciences
Cholera Outbreak, London, 1854
- Prevailing theory: miasma theory (cholera was caused by "bad air")
- Dr. John Snow challenged the miasma theory by marking the locations of all known cholera deaths on a map of London. The deaths clustered around a single public water pump on Broad Street, pointing to contaminated water rather than air as the cause. This work marked the birth of epidemiology.
- Reference: The Ghost Map by Steven Johnson
We Are Experiencing a Shift in Growth
➔ Growth rate at an all-time low
➔ Life expectancy at an all-time high
➔ Human data growing 10x faster than business data (IDE Study)
The transformational promise of data projects remains elusive
➔ 85% of big data projects fail to meet expectations
➔ Over 70% of the potential value of analytics is unrealised
Big Data Use Cases in Healthcare
BioTech
➔ Robotics (e.g. surgical devices)
➔ Human microbiome
Pharma
➔ Drug discovery
➔ Supply chain distribution
Life Sciences
➔ RNA sequencing
➔ Genomics (PheWAS/GWAS)
Providers
➔ Patient treatment
➔ Doctor & insurance matching
Data Ingest & Preparation
Governance & Security
Model Build
Deploy & Monitor
Tasks: wrangling, exploration, validation
Tasks: split data, model specification, feature selection
Tasks: train, visualize, compare / choose models, model report
Tasks: build, compile/JAR, reporting dashboard, monitor
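The lifecycle tasks above (split data, train, compare / choose models, model report) can be illustrated with a minimal, standard-library-only Python sketch. The dataset, the two toy candidate models (a majority-class baseline and a nearest-centroid classifier), and all function names are hypothetical; a real pipeline would use a library such as TensorFlow, as the demo later in this deck does.

```python
import random
from collections import Counter
from statistics import mean

def split_data(rows, test_fraction=0.25, seed=42):
    """Shuffle (features, label) rows reproducibly and split them into train/test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_majority(train):
    """Baseline model: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def train_nearest_centroid(train):
    """Predict the label whose mean feature vector is closest (squared Euclidean distance)."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append(x)
    centroids = {y: [mean(col) for col in zip(*xs)] for y, xs in by_label.items()}

    def predict(x):
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict

def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)

def model_report(rows, trainers, seed=42):
    """Split once, train every candidate, score on held-out data, and choose the best."""
    train, test = split_data(rows, seed=seed)
    scores = {name: accuracy(fit(train), test) for name, fit in trainers.items()}
    best = max(scores, key=scores.get)
    return scores, best

# Hypothetical two-feature dataset: label 0 clusters near (0, 0), label 1 near (5, 5).
rows = ([((i % 3 * 0.1, i % 2 * 0.1), 0) for i in range(12)]
        + [((5 + i % 3 * 0.1, 5 + i % 2 * 0.1), 1) for i in range(12)])
scores, best = model_report(rows, {"majority": train_majority,
                                   "nearest_centroid": train_nearest_centroid})
print(scores, best)
```

Adding another candidate only requires another entry in the `trainers` dictionary; every model is scored on the same held-out split, which keeps the comparison fair.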
Convergence of Data Science Operations in Healthcare
➔ Information and compute power are growing rapidly, day by day
➔ Machine learning has become more accessible than ever
➔ Enable self-service data analytics for bottom-up use cases
Survival Means Embracing Change
Modern Healthcare Big Data Operations
[Diagram: Apache Airflow orchestrating ETL and ad hoc workloads over device sensor, historical behavior, genomic, and client information data, including streaming data]
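Apache Airflow schedules pipelines as directed acyclic graphs (DAGs) of tasks. The sketch below does not use Airflow's API; it uses Python's standard-library `graphlib` module (Python 3.9+) to show the dependency ordering such an orchestrator enforces. The task names, mapped loosely onto the data sources in the diagram, are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest_device_sensors": set(),
    "ingest_historical_behavior": set(),
    "ingest_genomic_data": set(),
    "ingest_client_information": set(),
    "etl_join_patient_view": {"ingest_device_sensors", "ingest_historical_behavior",
                              "ingest_client_information"},
    "etl_genomic_features": {"ingest_genomic_data"},
    "ad_hoc_analysis": {"etl_join_patient_view", "etl_genomic_features"},
}

def run_order(dag):
    """Return batches of tasks. Tasks in the same batch have no dependencies on each
    other, so an orchestrator's workers could run them in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the graph is not acyclic
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

for step, batch in enumerate(run_order(pipeline), start=1):
    print(step, batch)
```

The four ingest tasks form the first batch, the two ETL tasks the second, and the ad hoc analysis runs last, once both of its upstream tasks have finished.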
AUTOSCALING BIG DATA ENGINES IN CLOUD
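The slide names autoscaling without describing a policy. As a hedged illustration only (not Qubole's actual algorithm), a simple workload-based rule sizes a cluster from its pending and running work and clamps the result to configured bounds. `tasks_per_node`, `min_nodes`, and `max_nodes` are hypothetical parameters.

```python
import math

def desired_nodes(pending_tasks, running_tasks, tasks_per_node, min_nodes, max_nodes):
    """Nodes needed to hold all pending and running tasks, clamped to [min_nodes, max_nodes]."""
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    needed = math.ceil((pending_tasks + running_tasks) / tasks_per_node)
    return max(min_nodes, min(max_nodes, needed))

def scaling_action(current_nodes, target):
    """Describe the change needed to move from the current size to the target size."""
    if target > current_nodes:
        return f"upscale by {target - current_nodes}"
    if target < current_nodes:
        return f"downscale by {current_nodes - target}"
    return "no change"

# Hypothetical burst: 350 pending + 50 running tasks, 8 task slots per node.
target = desired_nodes(350, 50, tasks_per_node=8, min_nodes=2, max_nodes=40)
print(target, scaling_action(current_nodes=10, target=target))
```

Here the raw requirement (50 nodes) exceeds the ceiling, so the cluster grows only to 40. Production autoscalers usually also wait through a cool-down period before downscaling, so that short lulls in work do not churn nodes.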
ON-PREMISES DATA SCIENCE APPROACH VS. CLOUD
On-premises (data locality)
• Impossible to scale storage without scaling compute, leading to expensive deployments
• Difficult to share HDFS data across operating units
• Higher upfront cost
• No autoscaling
• Data must fit into fixed infrastructure
• Fewer analytics tools
Cloud (no data locality; cloud object store)
• Compute and storage are separate
• Data is easily shared across operating units and accessed from different locations
• Lower cost
• More iterative
• Scalable with automation
• Fast access to data and ML tools
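The first on-premises point, that storage cannot grow without compute, is easiest to see with arithmetic. In the hypothetical sizing below (every capacity figure is an assumption, not a number from this deck), a coupled HDFS-style node supplies both disk and cores, so the cluster must be sized for whichever requirement is larger. With separate object storage, compute is sized for the workload alone.

```python
import math

def coupled_nodes(storage_tb, compute_cores, tb_per_node, cores_per_node):
    """Storage and compute live on the same nodes: satisfy the larger requirement."""
    return max(math.ceil(storage_tb / tb_per_node),
               math.ceil(compute_cores / cores_per_node))

def decoupled_nodes(compute_cores, cores_per_node):
    """Data lives in an object store; nodes are sized only for compute."""
    return math.ceil(compute_cores / cores_per_node)

# Hypothetical sizing: 1,200 TB of genomic data, 256 cores of analysis work,
# nodes with 48 TB of disk and 32 cores each.
print(coupled_nodes(1200, 256, tb_per_node=48, cores_per_node=32))  # 25 (storage-bound)
print(decoupled_nodes(256, cores_per_node=32))                       # 8
```

This simple model ignores HDFS replication; storing three copies of each block, for example, would push the coupled node count higher still.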
PITFALLS TO AVOID WHEN STARTING DATA SCIENCE
• Over-building vs. delivering: Use services that save time and money upfront, and avoid the risk of internal development delays.
• Keeping users siloed from data: Give users access to data and analytics to enable collaboration.
• Data governance & security: Successful data lakes focus on proper data policies and anonymization to ensure security.
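One common building block for the anonymization mentioned above is pseudonymization: replacing a direct identifier with a keyed hash so that records can still be joined without exposing the identifier itself. The sketch below uses Python's standard `hmac` and `hashlib` modules. The field names and the hard-coded key are hypothetical (a real key would come from a key-management service), and pseudonymization is only one part of a de-identification policy.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; load real keys from a key-management service.
PSEUDONYM_KEY = b"example-key-do-not-use"
DIRECT_IDENTIFIERS = {"name", "ssn", "email"}  # hypothetical field names

def pseudonymize_id(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Deterministic keyed hash: the same patient ID always maps to the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def anonymize_record(record: dict) -> dict:
    """Drop direct identifiers and replace patient_id with its pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize_id(str(record["patient_id"]))
    return cleaned

# Hypothetical record.
record = {"patient_id": "P-1001", "name": "Jane Doe", "ssn": "000-00-0000",
          "diagnosis_code": "E11.9"}
print(anonymize_record(record))
```

A keyed HMAC is used instead of a plain hash because unkeyed hashes of low-entropy identifiers, such as sequential patient IDs, can be reversed by brute force.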
COMPLIANCE IS POSSIBLE WITH DATA SERVICES
[Diagram: data lake on cloud object storage and cloud compute]
• Data sources (e.g., genomics, drug behavior, patient data) land in the Raw (unstructured) zone: AVRO, text, etc.
• Source of Truth (semi-structured) zone: AVRO, Parquet
• Derived (data governance) zone: Parquet, ORC
• Data services: insert/update/delete; export to CSV and JSON
• Downstream via ETL/ELT: relational data warehouses and data marts (databases, NoSQL, etc.)
• Use cases: analytics (e.g., historical analytics, BI, user insights); data science (e.g., time-series analysis, research); data discovery; deep learning with TensorFlow and Keras in notebooks
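The insert/update/delete data service in the diagram matters for compliance because records sometimes have to be corrected or removed after they land in the lake. Below is a minimal sketch of applying ordered change records to a source-of-truth table keyed by record ID. The change-record format and the in-memory dictionary standing in for a Parquet/ORC table are hypothetical simplifications.

```python
from typing import Dict, List, Tuple

Table = Dict[str, dict]
Change = Tuple[str, str, dict]  # (operation, record_id, payload)

def apply_changes(table: Table, changes: List[Change]) -> Table:
    """Return a new snapshot with changes applied in order (last write wins).
    The input table is left untouched."""
    snapshot = {rid: dict(row) for rid, row in table.items()}
    for op, rid, payload in changes:
        if op == "insert":
            if rid in snapshot:
                raise KeyError(f"insert of existing record {rid}")
            snapshot[rid] = dict(payload)
        elif op == "update":
            if rid not in snapshot:
                raise KeyError(f"update of missing record {rid}")
            snapshot[rid].update(payload)
        elif op == "delete":
            snapshot.pop(rid, None)  # deleting an absent record is a no-op
        else:
            raise ValueError(f"unknown operation {op!r}")
    return snapshot

# Hypothetical source-of-truth rows and change log.
source_of_truth = {"r1": {"patient": "a1", "drug": "X", "dose_mg": 10},
                   "r2": {"patient": "b2", "drug": "Y", "dose_mg": 5}}
changes = [("update", "r1", {"dose_mg": 20}),
           ("insert", "r3", {"patient": "c3", "drug": "X", "dose_mg": 15}),
           ("delete", "r2", {})]
print(apply_changes(source_of_truth, changes))
```

Returning a new snapshot instead of mutating in place mirrors how table formats on object storage typically write new files rather than editing existing ones, which also leaves the previous version available for audit.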
Data Science Workflow - Team Data Science Process (TDSP)
How did they do it?
POLL QUESTION
ENABLING ADVANCED ANALYTICS IN HEALTHCARE
Personas: Data Engineering, Data Science, Data Analysts
Consumers: Pharma & Life Sciences Companies, Insurance Providers
Use Cases: Clinical History, Clinical Trials, Drug Behavior, Insurance Payer, EMR/EHR Data
Engines: Spark, Hive
Cloud: AWS
ABOUT: (company name private)
● Fortune 500 healthcare services company
● Leading global provider of information, innovative technology solutions, and contract research services, helping healthcare clients find better solutions for patients
SOLUTION: MOVING TO QUBOLE & CLOUD
● Needed a technology to blend multiple datasets from pharma & life sciences companies in order to provide in-depth reports
● Was able to build a platform that could be HIPAA compliant yet serve dozens of top healthcare companies
INNOVATION IN HEALTHCARE WITH ML
Personas: Data Engineering, Data Science, Data Analysts
Consumers: Product & Engineering Teams, Internal Systems
Use Cases: Robotics, Sensor analysis, New product R&D
Engines: Spark, Hive, TensorFlow, Airflow
Cloud: AWS
OUTCOMES
● Quickly delivered a data lake operation in parallel with developing a new robotics service
● Saw immediate cost savings on existing cloud investments, which allowed the company to focus on new product lines
● Able to build maintenance schedules for the devices
● Data science focused on R&D use cases
How did they do it?
Drug Discovery for Life Sciences
Deep Learning Example Distributed with TensorFlow
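Distributing deep learning training, as in this example, commonly means data parallelism: each worker computes gradients on its shard of a batch, the gradients are averaged, and every replica applies the same update. TensorFlow offers this through `tf.distribute` strategies such as `MirroredStrategy`. The pure-Python sketch below does not use TensorFlow; it only illustrates the averaging step on a one-parameter linear model with hypothetical data.

```python
def gradient(w, shard):
    """d/dw of the mean squared error for the model y_hat = w * x over one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, num_workers, lr):
    """Split the batch across workers, average their gradients, and apply one update."""
    shards = [batch[i::num_workers] for i in range(num_workers)]
    grads = [gradient(w, shard) for shard in shards]  # computed in parallel in a real system
    avg_grad = sum(grads) / num_workers
    return w - lr * avg_grad

# Hypothetical data generated from y = 3x; 8 examples split across 4 workers.
batch = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]]
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, num_workers=4, lr=0.05)
print(round(w, 4))
```

When shards are equal in size, the average of the shard gradients equals the full-batch gradient, so the distributed update matches single-machine training while spreading the computation across workers.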
How did they do it?
Machine Learning Introduction
Supervised Learning
• Input data has known labels (a.k.a. training data)
• Example problems: classification, regression
• Example algorithms: logistic regression, decision trees
Unsupervised Learning
• Input data has no labels
• Example problems: clustering, dimensionality reduction
• Example algorithms: K-Means, Principal Component Analysis (PCA)
Semi-Supervised Learning
• Input data is a mixture of labelled and unlabelled data
• Example problems: classification, tracking & navigation
• Example algorithms: Hidden Markov Model, Kalman Filter
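To make the supervised/unsupervised distinction concrete, the sketch below trains logistic regression (supervised: it uses the labels) and runs k-means (unsupervised: it sees only the features) on the same tiny, hypothetical one-dimensional dataset. It is pure Python for readability; production work would use a library.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(xs, ys, lr=0.5, epochs=2000):
    """Supervised: fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        preds = [sigmoid(w * x + b) for x in xs]
        grad_w = sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / n
        grad_b = sum(p - y for p, y in zip(preds, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def kmeans_1d(points, k, iters=20):
    """Unsupervised: Lloyd's algorithm on 1-D points (no labels are used)."""
    ordered = sorted(points)
    centers = ordered[:: max(1, len(ordered) // k)][:k]  # spread the initial centers
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it in place if the cluster is empty.
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)

# Hypothetical standardized biomarker scores with a binary outcome label.
xs = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(xs, ys)
print("p(y=1 | x=2.0):", round(sigmoid(w * 2.0 + b), 3))

# Same features with the labels withheld: k-means finds the two groups on its own.
print("cluster centers:", kmeans_1d(xs, k=2))
```

Logistic regression needs `ys` to learn where the boundary lies; k-means recovers the same two groups (centered at -2.25 and 2.25) from `xs` alone, but it cannot say which group corresponds to the positive outcome.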
Deep Learning Introduction
[Figure: neural network that classifies an input as Pedestrian, Car, Motorcycle, or Truck, with bias units (+1) feeding each layer]
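The "+1" nodes in the diagram are bias units: a constant input of 1 whose weight lets each neuron shift its activation. Below is a minimal sketch of one forward pass through a small network that ends in a softmax over the four classes shown. The layer sizes, weights, and input features are hypothetical and untrained, so the resulting probabilities only illustrate the computation.

```python
import math

CLASSES = ["Pedestrian", "Car", "Motorcycle", "Truck"]

def dense(inputs, weights):
    """Fully connected layer. The bias unit is modeled by appending a constant 1 to the
    inputs, so each weight row has len(inputs) + 1 entries (the last one is the bias)."""
    augmented = inputs + [1.0]
    return [sum(w * a for w, a in zip(row, augmented)) for row in weights]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 3 input features -> 2 hidden units -> 4 classes.
hidden_weights = [[0.5, -0.2, 0.1, 0.0],
                  [0.3, 0.8, -0.5, 0.1]]
output_weights = [[1.0, -1.0, 0.0],
                  [0.2, 0.9, 0.1],
                  [-0.5, 0.4, 0.2],
                  [0.7, 0.7, -0.3]]

features = [1.0, 2.0, 0.5]
hidden = relu(dense(features, hidden_weights))
probs = softmax(dense(hidden, output_weights))
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

With an all-zero input, each neuron's output equals its bias weight, which is exactly the shift the bias unit provides.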
Data Science for Drug Discovery
● How accurately can we predict the efficacy and safety of new drugs?
● Predict molecular activity based on a molecule’s chemical structure.
[Diagram: a machine learning model trained on a molecule training set predicts the molecular properties of a new molecule]
https://www.kaggle.com/c/MerckActivity
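A common, simple baseline for predicting a new molecule's activity from its structure is similarity-based: represent each molecule as a fingerprint (the set of structural features it contains), find the most similar training molecules by Tanimoto similarity, and average their measured activities. This is a substitute technique for illustration, not the deep learning model used in the demo; the fingerprints and activity values below are hypothetical.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two binary fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def predict_activity(query, training_set, k=3):
    """Similarity-weighted mean activity of the k most similar training molecules."""
    neighbors = sorted(training_set, key=lambda m: tanimoto(query, m[0]), reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbors]
    if sum(weights) == 0:
        # No structural overlap at all: fall back to the plain mean of the neighbors.
        return sum(act for _, act in neighbors) / len(neighbors)
    return sum(w * act for w, (_, act) in zip(weights, neighbors)) / sum(weights)

# Hypothetical training set: (fingerprint on-bits, measured activity, e.g. pIC50).
training_set = [
    (frozenset({1, 4, 7, 9}), 6.2),
    (frozenset({1, 4, 7, 12}), 6.0),
    (frozenset({2, 3, 5}), 4.1),
    (frozenset({2, 3, 8, 11}), 4.4),
]
new_molecule = frozenset({1, 4, 7, 10})
print(round(predict_activity(new_molecule, training_set, k=2), 3))
```

The new molecule shares three of five features with each of the first two training molecules (similarity 0.6), so its predicted activity is their average; a learned model such as a neural network replaces this fixed similarity rule with weights fitted to the training set.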
Technologies
TensorFlow: Python-friendly open-source library for numerical computation that makes machine learning faster and easier. TensorFlow can train and run deep neural networks and supports production prediction at scale.
Keras: a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
Apache Spark: data processing engine.
Qubole: cloud-native data platform for self-service AI, machine learning, and analytics.
How did they do it?
POLL QUESTION
DEMO
Test-drive more examples on Qubole: sign up at www.qubole.com
STATE OF BIG DATA ADOPTION
[Figure: big data adoption in five stages (1st through 5th stage)]
DATA SCIENCE REQUIRES SCALABLE BIG DATA
DATA CLOUD
➔ 50% savings in cloud spend
➔ 1:65 DataOps-to-users ratio
➔ 10X increase in IoT data
Modern ML Workflow is a Scientific Approach
[Figure: modern ML workflow in five stages (1st through 5th stage)]
Healthcare Services Company
Personas: Data Engineering, Data Science, Machine Learning
Access: Marketing, Revenue Management, Finance, Commercial teams
Use Cases: Campaign reports, Email analytics, Fraud detection
Engines: Spark, Hive, TensorFlow, Airflow
Cloud: AWS
OUTCOMES
● Data science teams are able to scale their products individually (rather than sharing one multi-tenant environment)
● Saw immediate cost savings on existing cloud investments, which allowed the company to focus on R&D
● Able to go to market with new data science products in 1-3 months
● Mitigated SLA delays on analytics reports

Modern ML & AI Operations to Advance Healthcare

  • 1. Modernizing AI & ML Operations to Rapidly Advance Healthcare (Copyright 2018 © Qubole)
  • 2. Introduction. Overview of the state of data science in healthcare: ● common cloud data operations ● pitfalls to avoid. Customer data science examples across verticals: ● case studies: ○ service providers ○ robotics ○ drug discovery. Demo of deep learning for drug discovery: ● review of the ML methodology used ● molecule predictions from TensorFlow.
  • 3. Birth of Data Science in Life Sciences. Cholera outbreak, London, 1854. Prevailing theory: miasma theory (cholera was caused by bad air). Dr John Snow challenged the miasma theory by marking on a map of London the locations of all known deaths from cholera; the clustering of cases around the Broad Street water pump pointed to contaminated water instead. This work is widely regarded as the birth of epidemiology. Reference: The Ghost Map by Steven Johnson.
  • 4. We Are Experiencing a Shift in Growth ➔ Growth rate at an all-time low ➔ Life expectancy at an all-time high ➔ Human data growing 10x faster than business data (IDE study)
  • 5. The transformational promise of data projects remains elusive: 85% of big data projects fail to meet expectations, and >70% of analytics' potential value is unrealised.
  • 6. Big Data Use Cases in Healthcare. BioTech ➔ Robotics (e.g. surgical devices) ➔ Human microbiome. Pharma ➔ Drug discovery ➔ Supply chain distribution. Life Sciences ➔ RNA sequencing ➔ Genomics (PheWAS/GWAS). Providers ➔ Patient treatment ➔ Doctor & insurance matching.
  • 7. Stages: Data Ingest & Preparation; Governance & Security; Model Build; Deploy & Monitor. Task groups: (1) wrangling, exploration, validation; (2) split data, model specification, feature selection; (3) train, visualize, compare/choose models, model report; (4) build, compile/JAR, reporting dashboard, monitor.
  • 8. Convergence of Data Science Operations in Healthcare: information and compute power are growing rapidly, and machine learning has become more accessible than ever. Enable self-service data analytics for bottom-up use cases.
  • 9. Survival Means Embracing Change
  • 10. Modern Healthcare Big Data Operations. Data sources: device sensors, historical behavior, genomic data, client information stream. Workloads: ETL, ad hoc. Orchestration: Apache Airflow.
  • 11. Autoscaling Big Data Engines in the Cloud
  • 12. On-Premises Data Science Approach vs. Cloud. On-premises (data locality): ● storage cannot be scaled without also scaling compute, leading to expensive deployments ● HDFS data is difficult to share across operating units ● higher upfront cost ● no autoscaling ● data has to fit in fixed infrastructure ● fewer analytics tools. Cloud (no data locality; cloud object store): ● compute and storage are separate ● data is easily shared across operating units and accessed from different locations ● lower cost ● more iterative ● scalable with automation ● fast access to data and ML tools.
  • 13. Pitfalls to Avoid When Starting Data Science. ● Over-building vs. delivering: use services that save time and money up front, and avoid the risk of internal development delays. ● Keeping users siloed from data: give users access to data and analytics to enable collaboration. ● Data governance & security: successful data lakes focus on proper data policies and anonymization to ensure security.
  • 14. Compliance Is Possible with Data Services. Data sources (e.g. genomics, drug behavior, patient data) are brought in via ETL/ELT into cloud object storage, organized into Raw (unstructured), Source of Truth (semi-structured) and Derived (data governance) zones, with file formats including AVRO, text, Parquet and ORC. Data services on cloud compute provide insert/update/delete and export (CSV, JSON). Consumers: relational data warehouses; data marts (databases, NoSQL, etc.); TensorFlow and Keras deep learning use cases; analytics (e.g. historical analytics, BI, user insights); data science (e.g. time-series analysis, research); data discovery; notebooks.
  • 15. Data Science Workflow: Team Data Science Process (TDSP)
  • 16. Poll question
  • 17. Enabling Advanced Analytics for Healthcare. Personas: data engineering, data science, data analysts. Consumers: pharma & life sciences companies, insurance providers. Use cases: clinical history, clinical trials, drug behavior, insurance payer, EMR/EHR data. Engines: Spark, Hive. Cloud: AWS. About (company name private): ● Fortune 500 healthcare services company ● leading global provider of information, innovative technology solutions and contract research services, helping healthcare clients find better solutions for patients. Solution, moving to Qubole & cloud: ● needed a technology to blend multiple datasets from pharma & life sciences companies in order to provide in-depth reports ● was able to build a platform that could be HIPAA compliant yet serve dozens of top healthcare companies.
  • 18. Innovation in Healthcare with ML. Personas: data engineering, data science, data analysts. Consumers: product & engineering teams, internal systems. Use cases: robotics, sensor analysis, new product R&D. Engines: Spark, Hive, TensorFlow, Airflow. Cloud: AWS. Outcomes: ● quickly delivered a data lake operation in parallel with developing a new robotics service ● saw immediate cost savings on existing cloud investments, which allowed the company to focus on new product lines ● able to build maintenance schedules for the devices ● data science focused on R&D use cases.
  • 19. Drug Discovery for Life Sciences: Deep Learning Example Distributed with TensorFlow
  • 20. Machine Learning Introduction. Supervised learning: ● input data has known labels (a.k.a. training data) ● example problems: classification, regression ● example algorithms: logistic regression, decision trees. Unsupervised learning: ● input data has no labels ● example problems: clustering, dimensionality reduction ● example algorithms: k-means, principal component analysis (PCA). Semi-supervised learning: ● input data is a mixture of labelled and unlabelled examples ● example problems: classification, tracking & navigation ● example algorithms: hidden Markov model, Kalman filter.
  • 21. Deep Learning Introduction [Figure: neural network with bias units (+1), classifying inputs as pedestrian, car, motorcycle or truck]
  • 22. Data Science for Drug Discovery. ● How accurately can we predict the efficacy and safety of new drugs? ● Predict molecular activity based on a molecule’s chemical structure. [Diagram: a molecule training set trains a machine learning model, which takes a new molecule and predicts its molecular properties] Source: https://www.kaggle.com/c/MerckActivity
  • 23. Technologies. TensorFlow: a Python-friendly open-source library for numerical computation that makes machine learning faster and easier; it can train and run deep neural networks and supports production prediction at scale. Keras: a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano. A data processing engine (logo not captured). Qubole: a cloud-native data platform for self-service AI, machine learning, and analytics.
  • 24. Poll question
  • 25. Demo
  • 26. Sign up at www.qubole.com to test-drive more examples on Qubole.
  • 27. State of Big Data Adoption [Diagram: five adoption stages, 1st through 5th]
  • 28. Data Science Requires Scalable Big Data. Data cloud: 50% savings in cloud spend; 1:65 DataOps-to-users ratio; 10x increase in IoT data.
  • 29. Modern ML Workflow Is a Scientific Approach [Diagram: five workflow stages, 1st through 5th]
  • 30. Healthcare Services Company. Personas: data engineering, data science, machine learning. Access: marketing, revenue management, finance, commercial teams. Use cases: campaign reports, email analytics, fraud detection. Engines: Spark, Hive, TensorFlow, Airflow. Cloud: AWS. Outcomes: ● data science teams are able to scale their products individually (rather than sharing one multi-tenant environment) ● saw immediate cost savings on existing cloud investments, which allowed the company to focus on R&D ● able to go to market with new data science products in 1-3 months ● mitigated SLA delays on analytics reports.
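Slide 7 lists "split data" among the workflow tasks. Below is a minimal sketch of a seeded random hold-out split in plain Python. The function name `train_test_split`, the 20% default and the record IDs are illustrative assumptions, not part of the original workflow (scikit-learn ships a function of the same name with a richer interface).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    records: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and hold out a fraction of them for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)  # copy so the caller's sequence is untouched
    # A seeded generator makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


if __name__ == "__main__":
    record_ids = [f"record-{i}" for i in range(10)]  # hypothetical record IDs
    train, test = train_test_split(record_ids, test_fraction=0.3)
    print(len(train), len(test))  # 7 3
```

With patient data, splitting at the patient level (so that no patient appears in both sets) is usually safer than splitting individual records, because it avoids leaking one patient's information into the evaluation set.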
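Slide 10 names Apache Airflow for orchestrating ETL and ad hoc workloads over several data sources. Airflow expresses a pipeline as a directed acyclic graph (DAG) of tasks. The sketch below uses Python's standard-library `graphlib` (Python 3.9+), not Airflow's API, to show how dependency ordering works; the task names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline: Dict[str, Set[str]] = {
    "ingest_device_sensors": set(),
    "ingest_genomic_data": set(),
    "ingest_client_stream": set(),
    "etl_join_sources": {
        "ingest_device_sensors",
        "ingest_genomic_data",
        "ingest_client_stream",
    },
    "build_features": {"etl_join_sources"},
    "adhoc_report": {"etl_join_sources"},
}


def run_order(dag: Dict[str, Set[str]]) -> List[str]:
    """One valid serial execution order: every task appears after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag: Dict[str, Set[str]]) -> List[List[str]]:
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before asking for the next
    return batches


if __name__ == "__main__":
    for wave, tasks in enumerate(parallel_batches(pipeline), start=1):
        print(wave, tasks)
```

`parallel_batches` groups tasks that could run concurrently, which is roughly the opportunity a scheduler exploits when several workers are available.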
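Slide 13 stresses data policies and anonymization in data lakes. One small building block is pseudonymization: replacing direct identifiers with keyed, deterministic tokens so records can still be joined without exposing the raw ID. The sketch below uses HMAC-SHA256 from the Python standard library; the field names and key are hypothetical. On its own this is not a complete anonymization scheme, since quasi-identifiers such as dates or ZIP codes can still re-identify people.

```python
import hashlib
import hmac
from typing import Tuple


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed, deterministic token (HMAC-SHA256)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, id_fields: Tuple[str, ...], secret_key: bytes) -> dict:
    """Return a copy of the record with the named identifier fields tokenized."""
    cleaned = dict(record)  # never mutate the caller's record
    for field in id_fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(str(cleaned[field]), secret_key)
    return cleaned


if __name__ == "__main__":
    key = b"replace-with-a-key-from-a-secrets-manager"  # hypothetical key
    row = {"patient_id": "MRN-001234", "visit_count": 3}  # hypothetical record
    print(pseudonymize_record(row, ("patient_id",), key))
```

Using a keyed HMAC rather than a plain hash means someone without the key cannot rebuild the mapping by hashing a list of candidate IDs.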
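Slide 18 reports that sensor analysis let the company build maintenance schedules for its devices, but does not say how. One simple approach, shown here only as an assumption and not as the company's method, flags readings whose trailing z-score exceeds a threshold and schedules devices with any flagged readings. The window size, threshold, device IDs and readings are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def flag_anomalies(
    readings: Sequence[float], window: int = 5, threshold: float = 3.0
) -> List[int]:
    """Indices of readings more than `threshold` standard deviations away from
    the mean of the preceding `window` readings (a trailing z-score)."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all is treated as anomalous.
            if readings[i] != mu:
                flagged.append(i)
        elif abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


def devices_needing_maintenance(
    sensor_data: Dict[str, Sequence[float]], max_anomalies: int = 0
) -> List[str]:
    """Device IDs whose anomaly count exceeds `max_anomalies`, sorted for a stable schedule."""
    return sorted(
        device for device, readings in sensor_data.items()
        if len(flag_anomalies(readings)) > max_anomalies
    )


if __name__ == "__main__":
    data = {  # hypothetical temperature readings per device
        "pump-a": [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 20.0, 27.5, 20.2, 20.1],
        "pump-b": [20.0, 20.1, 19.9, 20.0, 20.1, 20.0, 19.9],
    }
    print(devices_needing_maintenance(data))  # ['pump-a']
```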
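Slide 20 separates supervised from unsupervised learning. Two of the algorithms it lists make the contrast concrete: logistic regression is fitted to labelled examples, while k-means groups unlabelled points. This is a minimal pure-Python sketch for illustration; the learning rate, epoch count and toy data are assumptions, and in practice libraries such as scikit-learn provide tuned implementations of both.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], x: Point) -> float:
    """P(label = 1 | x) for weights [bias, w1, ..., wd]."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Supervised: logistic regression fitted to labelled data by batch gradient descent.
def train_logistic_regression(
    xs: Sequence[Point], ys: Sequence[int], lr: float = 0.1, epochs: int = 2000
) -> List[float]:
    """Return weights [bias, w1, ..., wd] that reduce the mean log-loss."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y  # derivative of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Unsupervised: k-means (Lloyd's algorithm) on unlabelled points.
def _assign(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid for each point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(
    points: Sequence[Point], k: int, iters: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Alternate assigning points to their nearest centroid and moving each
    centroid to the mean of its assigned points, until nothing changes."""
    centroids = [tuple(p) for p in random.Random(seed).sample(list(points), k)]
    for _ in range(iters):
        labels = _assign(points, centroids)
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # An empty cluster keeps its previous centroid.
            new_centroids.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            )
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, _assign(points, centroids)


if __name__ == "__main__":
    w = train_logistic_regression([(0.0,), (1.0,), (2.0,), (3.0,)], [0, 0, 1, 1])
    print(round(predict_proba(w, (3.0,)), 3))
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(pts, k=2)[1])
```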
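Slide 21 shows a neural network whose layers include bias units (+1) and whose outputs are four classes. A single forward pass through such a network can be sketched in plain Python: each layer computes a weighted sum plus a bias term, a ReLU follows the hidden layer, and a softmax turns the output scores into class probabilities. The layer sizes, weights and input features are made-up placeholders, not a trained model; the deck itself builds its models with TensorFlow and Keras.

```python
import math
from typing import Dict, List, Sequence

CLASSES = ["pedestrian", "car", "motorcycle", "truck"]

# Hypothetical, untrained parameters: 3 input features -> 2 hidden units -> 4 classes.
PARAMS: Dict[str, list] = {
    "w1": [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]],
    "b1": [0.1, -0.1],
    "w2": [[1.0, -1.0], [0.2, 0.9], [-0.5, 0.3], [0.1, 0.1]],
    "b2": [0.0, 0.1, -0.2, 0.05],
}
FEATURES = [1.0, 0.5, 2.0]  # hypothetical input vector


def dense(inputs: Sequence[float], weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: output_i = sum_j W[i][j] * x[j] + b[i],
    where b[i] is the weight on the constant +1 bias unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(features: Sequence[float], params: Dict[str, list]) -> List[float]:
    """Input -> hidden (ReLU) -> output (softmax over CLASSES)."""
    hidden = relu(dense(features, params["w1"], params["b1"]))
    return softmax(dense(hidden, params["w2"], params["b2"]))


if __name__ == "__main__":
    probs = forward(FEATURES, PARAMS)
    print(CLASSES[probs.index(max(probs))])  # car
```

Training would adjust `w1`, `b1`, `w2` and `b2` by backpropagation; frameworks such as TensorFlow compute those gradients automatically.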
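Slide 22 frames drug discovery as predicting a molecule's activity from its chemical structure. Assuming each molecule has already been encoded as a numeric descriptor vector, a k-nearest-neighbours regressor is one of the simplest possible baselines: average the measured activity of the most similar known molecules. This is not the deep learning model used in the demo, and the descriptor values and activities below are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Descriptor = Tuple[float, ...]

# Hypothetical training set: (descriptor vector, measured activity).
TRAINING: List[Tuple[Descriptor, float]] = [
    ((3, 1, 0, 2), 5.1),
    ((3, 1, 1, 2), 5.4),
    ((0, 4, 2, 0), 7.8),
    ((1, 4, 2, 0), 7.5),
    ((2, 2, 1, 1), 6.2),
]


def knn_predict(train: Sequence[Tuple[Descriptor, float]], query: Descriptor, k: int = 3) -> float:
    """Predict activity as the mean activity of the k training molecules whose
    descriptor vectors are closest (Euclidean distance) to the query."""
    if not 0 < k <= len(train):
        raise ValueError("k must be between 1 and the size of the training set")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return sum(activity for _, activity in nearest) / k


if __name__ == "__main__":
    new_molecule = (3, 1, 0, 1)  # hypothetical descriptors for an unseen molecule
    print(round(knn_predict(TRAINING, new_molecule), 2))  # 5.57
```

A baseline like this is useful mainly as a yardstick: a deep network trained on the same descriptors should beat it on held-out molecules before its extra complexity is worthwhile.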