SlideShare a Scribd company logo
Transitioning from
Traditional DW to Spark
in OR Predictive Modeling
Ayad Shammoutand Denny Lee
October 21st,2015
About Ayad Shammout
• Director of BusinessIntelligence,Beth IsraelDeaconess
Medical Center
• Helped build BusinessIntelligence, highlyavailable /
disaster recoveryinfrastructure for BIDMC
2
About Denny Lee
• TechnologyEvangelist, Databricks
• Former Sr. Director of Data SciencesEng, Concur
• Helped bring Hadoop onto Windows and Azure
3
We are Databricks, the company behind Spark
Founded by the creators of
Apache Spark in 2013
Share of Spark code
contributed by Databricks
in 2014
75%
4
Data Value
Created Databricks on top of Spark to make big data simple.
Why is Operating Room Scheduling
Predictive Modeling Important?
6
$15-$20 / minute for a
basic surgicalprocedure
Time is an OR's most valuable resource
Lack of OR availability
means loss of patient
OR efficiency differs depending on the
OR staffing and allocation (8, 10, 13, or 16h),
not the workload (i.e. cases)
7
“You are notgoing to getthe elephantto shrink or change itssize. You
need to face the fact that the elephantis8 OR tall and 11hrwide”
Steven Shafer, MD
8
Operating Room
Better utilization =
Better profit margins
Reduce support and
maintenance costs
Medical Staff
Better utilization =
Better profit margins
Better medical staff
efficiencies= Better
outcomes
Patients
Shorter wait times
and lesscancellations
Better medical staff
efficiencies= Better
outcomes
Develop Predictive Model
• Develop a predictive model that would identify
available OR time 15 business days in advance.
• Allowus to confirm wait list cases two weeks in
advance, instead of when the blocks normally release
four days out.
9
Forecast OR Schedule
• Case load 15 businessdays in advance
• Book more cases weeks in advance to prevent under-
utilization
• Reduce staff overtime and idle time
10
Background
• Threesurgical groups
• GYN, urology, generalsurgery,colorectal, surgical
oncology
• Eyes, plastics, ENT
• Orthopedics, podiatry
• Currentlybuilt using SQL ServerData Mining
11
Using Traditional Data Warehousing
Techniques
OR DW
SSAS Data
Mining
Data
Sources
OR
Reports
Traditional Data Warehousing & Data Mining
OR Predictive Model
Process mining model
every3 hours
OR Prediction DB
Data inserts every
3 hours
Predictionresults
14
Original Design
• Multiple data sourcespushing data into SQL Server and SQL
ServerAnalysis ServerData Mining
• Hand built 225 different DM modules (5 days, 15 businessdays
ahead, 3 differentgroups)
• Pipeline processhad to run225 times / day (3 pools x 75
modules)
15
Regression Calculations
SSAS Data Mining T-SQL Code
Intercept R2
Mean Adjusted R2
Coefficients Standard Deviation
Variance Standard Error
Taking advantage of Spark’s DW
Capabilities and MLlib
OR DWData
Sources
OR
Reports
OR Predictive Model in Spark
Data inserts every
3 hours
18
demoOR Block Scheduling
Extract History data and run linear regression with SGD
with multiple variables
19
21
22
23
24
25
26
27
OR Schedule Report (example)
28
Why the model is working
• Can coordinate waitlist schedulinglogistics with physicians and
patients within two weeks of the surgery
• Plan staff schedulingand resourcessothere are less last-minute
staffing issuesfor nursing and anesthesia
• Utilization metrics are showing us where we can maximize our
elective surgicalscheduleand level demand
Key Learnings when Migrating from
Traditional DW to Spark
30
Transitioning to the Cloud
Beth Israel DeaconessMedical Center is increasingly moving to cloud
infrastructure services with the hopesof closing itsdata center when the
hospital's lease is up in the next five years. CIO John Halamka says he's
decommissioning HP and Dell servers as he movesmore of hiscompute
workloads to Amazon Web Services, where he's currently using 30 virtual
machines to test and develop new applications. "It is no longer cost
effective to deal with server hosting ourselves because our challenge isn't
real estate, it's power and cooling," he says.
31
Transitioning to the Cloud
• Need time for engineers,analysts, and data scientists to learn
how to build for the cloud
• Build for security right from start – processheavy, a lot of
documentation, audits / reviews
• Differentiating data engineersand engineers(REST APIs,
services, elasticity, etc.)
32
Transitioning to Spark
• No more stored proceduresor indexes
• Good for Spark SQL, services design
• Prototype, prototype, prototype
• Leverage existing languagesand skill sets
• Leverage the MOOCsand other Spark training
• Break down the silos of data engineers,engineers,data
scientists, and analysts
33
Transitioning DW to Spark
• Understand Partitioning,Broadcast Joins, and Parquet
• Not all Hive functionsare available in Spark (99%of the time that is okay) due
to Hive context
• Don’t limit yourselfto build star-schemas / snowflake schemas
• Expand outside of traditional DW: machine learning,streaming
Thank you.
For more information, please contact
ayad.shammout@hotmail.com
denny@databricks.com

More Related Content

What's hot

Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Databricks
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
Databricks
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
Databricks
 
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Databricks
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Spark Summit
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
Databricks
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Databricks
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
Spark Summit
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
Databricks
 
Koalas: Pandas on Apache Spark
Koalas: Pandas on Apache SparkKoalas: Pandas on Apache Spark
Koalas: Pandas on Apache Spark
Databricks
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
Sri Ambati
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
Databricks
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
Modern Data Stack France
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Spark Summit
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
Jen Aman
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Spark Summit
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
Modern Data Stack France
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
Dremio Corporation
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWes McKinney
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 

What's hot (20)

Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
Spark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science LondonSpark Under the Hood - Meetup @ Data Science London
Spark Under the Hood - Meetup @ Data Science London
 
New Developments in Spark
New Developments in SparkNew Developments in Spark
New Developments in Spark
 
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
Not your Father's Database: Not Your Father’s Database: How to Use Apache® Sp...
 
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
Learnings Using Spark Streaming and DataFrames for Walmart Search: Spark Summ...
 
New Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit EastNew Directions for Spark in 2015 - Spark Summit East
New Directions for Spark in 2015 - Spark Summit East
 
Spark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with SparkSpark Summit 2015 keynote: Making Big Data Simple with Spark
Spark Summit 2015 keynote: Making Big Data Simple with Spark
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017 What to Expect for Big Data and Apache Spark in 2017
What to Expect for Big Data and Apache Spark in 2017
 
Koalas: Pandas on Apache Spark
Koalas: Pandas on Apache SparkKoalas: Pandas on Apache Spark
Koalas: Pandas on Apache Spark
 
H2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks CloudH2O World - H2O Rains with Databricks Cloud
H2O World - H2O Rains with Databricks Cloud
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
HUG France Feb 2016 - Migration de données structurées entre Hadoop et RDBMS ...
 
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
Apache Spark for Machine Learning with High Dimensional Labels: Spark Summit ...
 
RISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time DecisionsRISELab:Enabling Intelligent Real-Time Decisions
RISELab:Enabling Intelligent Real-Time Decisions
 
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald NowlingInsights into Customer Behavior from Clickstream Data by Ronald Nowling
Insights into Customer Behavior from Clickstream Data by Ronald Nowling
 
Hugfr SPARK & RIAK -20160114_hug_france
Hugfr  SPARK & RIAK -20160114_hug_franceHugfr  SPARK & RIAK -20160114_hug_france
Hugfr SPARK & RIAK -20160114_hug_france
 
Building a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache ArrowBuilding a Virtual Data Lake with Apache Arrow
Building a Virtual Data Lake with Apache Arrow
 
What's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial usersWhat's new in pandas and the SciPy stack for financial users
What's new in pandas and the SciPy stack for financial users
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 

Similar to Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predictive Modeling

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
DATAVERSITY
 
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingBest Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Eric Kavanagh
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
DATAVERSITY
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
Cloudera, Inc.
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummit
Ming Yuan
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Databricks
 
Cloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an EcosystemCloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an Ecosystem
Databricks
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
Toronto-Oracle-Users-Group
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
Precisely
 
Houd controle over uw data
Houd controle over uw dataHoud controle over uw data
Houd controle over uw data
ICT-Partners
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Precisely
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
South West Data Meetup
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoopDr. Wilfred Lin (Ph.D.)
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
DATAVERSITY
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
ClouderaUserGroups
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Denodo
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making Things
JC Davis
 

Similar to Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predictive Modeling (20)

Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal ForecastingBest Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
Best Laid Plans: Saving Time, Money and Trouble with Optimal Forecasting
 
Estimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics PlatformEstimating the Total Costs of Your Cloud Analytics Platform
Estimating the Total Costs of Your Cloud Analytics Platform
 
From Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your OrganizationFrom Insight to Action: Using Data Science to Transform Your Organization
From Insight to Action: Using Data Science to Transform Your Organization
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummit
 
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
Reimagining Devon Energy’s Data Estate with a Unified Approach to Integration...
 
Cloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an EcosystemCloud and Analytics - From Platforms to an Ecosystem
Cloud and Analytics - From Platforms to an Ecosystem
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Houd controle over uw data
Houd controle over uw dataHoud controle over uw data
Houd controle over uw data
 
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
Engineering Machine Learning Data Pipelines Series: Streaming New Data as It ...
 
Ask bigger questions
Ask bigger questionsAsk bigger questions
Ask bigger questions
 
6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop6 enriching your data warehouse with big data and hadoop
6 enriching your data warehouse with big data and hadoop
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data Hub
 
What it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready stateWhat it takes to bring Hadoop to a production-ready state
What it takes to bring Hadoop to a production-ready state
 
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
Quicker Insights and Sustainable Business Agility Powered By Data Virtualizat...
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making Things
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
Jayaprasanna4
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
AhmedHussein950959
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
ydteq
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
Kamal Acharya
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
Kerry Sado
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 

Recently uploaded (20)

ethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.pptethical hacking-mobile hacking methods.ppt
ethical hacking-mobile hacking methods.ppt
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 
MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
ASME IX(9) 2007 Full Version .pdf
ASME IX(9)  2007 Full Version       .pdfASME IX(9)  2007 Full Version       .pdf
ASME IX(9) 2007 Full Version .pdf
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
一比一原版(UofT毕业证)多伦多大学毕业证成绩单如何办理
 
Student information management system project report ii.pdf
Student information management system project report ii.pdfStudent information management system project report ii.pdf
Student information management system project report ii.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Hierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power SystemHierarchical Digital Twin of a Naval Power System
Hierarchical Digital Twin of a Naval Power System
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 

Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predictive Modeling

  • 1. Transitioning from Traditional DW to Spark in OR Predictive Modeling Ayad Shammoutand Denny Lee October 21st,2015
  • 2. About Ayad Shammout • Director of BusinessIntelligence,Beth IsraelDeaconess Medical Center • Helped build BusinessIntelligence, highlyavailable / disaster recoveryinfrastructure for BIDMC 2
  • 3. About Denny Lee • TechnologyEvangelist, Databricks • Former Sr. Director of Data SciencesEng, Concur • Helped bring Hadoop onto Windows and Azure 3
  • 4. We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Share of Spark code contributed by Databricks in 2014 75% 4 Data Value Created Databricks on top of Spark to make big data simple.
  • 5. Why is Operating Room Scheduling Predictive Modeling Important?
  • 6. 6 $15-$20 / minute for a basic surgicalprocedure Time is an OR's most valuable resource Lack of OR availability means loss of patient OR efficiency differs depending on the OR staffing and allocation (8, 10, 13, or 16h), not the workload (i.e. cases)
  • 7. 7 “You are notgoing to getthe elephantto shrink or change itssize. You need to face the fact that the elephantis8 OR tall and 11hrwide” Steven Shafer, MD
  • 8. 8 Operating Room Better utilization = Better profit margins Reduce support and maintenance costs Medical Staff Better utilization = Better profit margins Better medical staff efficiencies= Better outcomes Patients Shorter wait times and lesscancellations Better medical staff efficiencies= Better outcomes
  • 9. Develop Predictive Model • Develop a predictive model that would identify available OR time 15 business days in advance. • Allowus to confirm wait list cases two weeks in advance, instead of when the blocks normally release four days out. 9
  • 10. Forecast OR Schedule • Case load 15 businessdays in advance • Book more cases weeks in advance to prevent under- utilization • Reduce staff overtime and idle time 10
  • 11. Background • Threesurgical groups • GYN, urology, generalsurgery,colorectal, surgical oncology • Eyes, plastics, ENT • Orthopedics, podiatry • Currentlybuilt using SQL ServerData Mining 11
  • 12. Using Traditional Data Warehousing Techniques
  • 13. OR DW SSAS Data Mining Data Sources OR Reports Traditional Data Warehousing & Data Mining OR Predictive Model Process mining model every3 hours OR Prediction DB Data inserts every 3 hours Predictionresults
  • 14. 14 Original Design • Multiple data sourcespushing data into SQL Server and SQL ServerAnalysis ServerData Mining • Hand built 225 different DM modules (5 days, 15 businessdays ahead, 3 differentgroups) • Pipeline processhad to run225 times / day (3 pools x 75 modules)
  • 15. 15 Regression Calculations SSAS Data Mining T-SQL Code Intercept R2 Mean Adjusted R2 Coefficients Standard Deviation Variance Standard Error
  • 16. Taking advantage of Spark’s DW Capabilities and MLlib
  • 17. OR DWData Sources OR Reports OR Predictive Model in Spark Data inserts every 3 hours
  • 18. 18 demoOR Block Scheduling Extract History data and run linear regression with SGD with multiple variables
  • 19. 19
  • 20.
  • 21. 21
  • 22. 22
  • 23. 23
  • 24. 24
  • 25. 25
  • 26. 26
  • 28. 28 Why the model is working • Can coordinate waitlist schedulinglogistics with physicians and patients within two weeks of the surgery • Plan staff schedulingand resourcessothere are less last-minute staffing issuesfor nursing and anesthesia • Utilization metrics are showing us where we can maximize our elective surgicalscheduleand level demand
  • 29. Key Learnings when Migrating from Traditional DW to Spark
  • 30. 30 Transitioning to the Cloud Beth Israel DeaconessMedical Center is increasingly moving to cloud infrastructure services with the hopesof closing itsdata center when the hospital's lease is up in the next five years. CIO John Halamka says he's decommissioning HP and Dell servers as he movesmore of hiscompute workloads to Amazon Web Services, where he's currently using 30 virtual machines to test and develop new applications. "It is no longer cost effective to deal with server hosting ourselves because our challenge isn't real estate, it's power and cooling," he says.
  • 31. 31 Transitioning to the Cloud • Need time for engineers,analysts, and data scientists to learn how to build for the cloud • Build for security right from start – processheavy, a lot of documentation, audits / reviews • Differentiating data engineersand engineers(REST APIs, services, elasticity, etc.)
  • 32. 32 Transitioning to Spark • No more stored proceduresor indexes • Good for Spark SQL, services design • Prototype, prototype, prototype • Leverage existing languagesand skill sets • Leverage the MOOCsand other Spark training • Break down the silos of data engineers,engineers,data scientists, and analysts
  • 33. 33 Transitioning DW to Spark • Understand Partitioning,Broadcast Joins, and Parquet • Not all Hive functionsare available in Spark (99%of the time that is okay) due to Hive context • Don’t limit yourselfto build star-schemas / snowflake schemas • Expand outside of traditional DW: machine learning,streaming
  • 34. Thank you. For more information, please contact ayad.shammout@hotmail.com denny@databricks.com