SlideShare a Scribd company logo
1 of 36
Download to read offline
Unifying Data Warehousing
with Data Lakes
Ali Ghodsi, Co-Founder & CEO
Oct 25, 2017
Many enterprises are undergoing
a data transformation
Databricks Customers Across Industries
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Health care AI cloud dataset use case
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Correlate EMR of 50,000 patients
compared with their DNA
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
Enterprise AI use case
5
Provide recommendations to sales
using NLP and deep learning
Financial	Services Healthcare	&	Pharma Media	&	Entertainment Technology
Public	Sector Retail	&	CPG		 Consumer	Services Energy	&	Industrial	IoTMarketing	&	AdTech
Data	&	Analytics	Services
6
Real-time AI use-case
Curb abusive behavior
across gamers globally
Big Data was the Missing Link for AI
BIG DATA
Customer Data
Emails/Web pages
Click Streams
Sensor data (IoT)
Video/Speech
…
Most companies are Struggling with Big Data
GREAT RESULTS
Hardest part of AI isn’t AI
“Hidden Technical Debt in Machine Learning Systems", Google NIPS 2015
The hardest part of AI is Big Data
ML	
Code
Building Predictive Applications
is really Hard!
Unified Analytics Platform
UNIFIED
EXPERIENCE
ACROSS TEAMS
UNIFIED
PROCESSING
ENGINE
The Evolution
of Big Data
The Era of the
Data Warehouse
Data Warehouse (DW)
THE GOOD
• Pristine Data
• Fast Queries
• Transactional
THE BAD
• Expensive to Scale, not Elastic
• Requires ETL, Stale Data, No Real-Time
• No Predictions, No ML
• Closed formats (lock in)
Not Future Proof – Missing Predictions, Real-time, Scale
ETL	important	data	to	central	DW	and	get	Business	Intelligence	(BI)
The Era of the
Data Lake
THE BAD
• Inconsistent Data
• Unreliable for Analytics
• Lack of Schema
• Poor Performance
Hadoop Data Lake
Become a cheap messy data store with poor performance
ETL	all	data	to	central	scalable	open	lake	for	all	use	cases
THE GOOD
• Massive scale
• Inexpensive Storage
• Open Formats (Parquet, ORC)
• Promise of ML & Real Time
Streaming
The Current State
of Data Platforms
Info Sec at a Fortune 100 Company
DISADVANTAGES OF ARCHITECTURE
• Poor agility in responding to new threats
• Scale Limitations, no historical data
• 6 Months and twenty people to build
ENTERPRISE DATA WAREHOUSE
• Only 2 weeks of data
• Very expensive to scale
• Proprietary Formats
• No Predictions (ML)
Messy data not ready for analytics
Billions of records a day HADOOP
DATA LAKE
Complex ETL
EDW
EDW
EDW
Incidence
Response
Alerting
Reports
The Next Generation
Data Platform
First UNIFIED data management system that delivers:
The
SCALE
of data lake
The
LOW-LATENCY
of streaming
The
RELIABILITY &
PERFORMANCE
of data warehouse
Announcing Databricks Delta
The
SCALE
of data lake
The
LOW-LATENCY
of streaming
The
RELIABILITY &
PERFORMANCE
of data warehouse
Databricks Delta
Enables Predictions, Real-time and Ad Hoc
Analytics at Massive Scale
THE GOOD
OF DATA LAKES
• Massive scale on Amazon S3
• Open Formats (Parquet, ORC)
• Predictions (ML) & Real Time
Streaming
THE GOOD
OF DATA WAREHOUSES
• Pristine Data
• Transactional Reliability
• Fast Queries (10-100x)
Databricks Delta Under the Hood
• Decouple Compute & Storage
• ACID Transactions & Data Validation
• Data Indexing & Caching (10-100x)
• Real-Time Streaming Ingest
MASSIVE SCALE
RELIABILITY
PERFORMANCE
LOW-LATENCY
Info Sec with Databricks Delta
DATABRICKS RUNTIME
powered by
DATABRICKS
RUNTIME
Trillion Records a Day
DATABRICKS
DELTA
ETL, Schema Validation SQL , ML, Stream
ADVANTAGES
• AI capable data warehouse at the scale of a data lake
• Interactive analysis on 2 years of data
• 2 Weeks to build with a 5 person data platform team
Unified Analytics Platform
UNIFIED
EXPERIENCE
ACROSS TEAMS
Notebooks,Dashboards,Reports
Unified Analytics Platform
+
UNIFIED
DATA
MANAGEMENT
ReliableTransactions,Performance
UNIFIED
EXPERIENCE
ACROSS TEAMS
Notebooks,Dashboards,Reports
Demo by Michael Armbrust
Evolution of a Cutting-Edge Data Pipeline
Events
?
Reporting
Streaming
Analytics
Data Lake
Evolution of a Cutting-Edge Data Pipeline
Events
Reporting
Streaming
Analytics
Data Lake
Challenge #1: Historical Queries?
Data Lake
λ-arch
λ-arch
Streaming
Analytics
Reporting
Events
λ-arch1
1
1
Challenge #2: Messy Data?
Data Lake
λ-arch
λ-arch
Streaming
Analytics
Reporting
Events
Validation
λ-arch
Validation
1
21
1
2
Reprocessing
Challenge #3: Mistakes and Failures?
Data Lake
λ-arch
λ-arch
Streaming
Analytics
Reporting
Events
Validation
λ-arch
Validation
Reprocessing
Partitioned
1
2
3
1
1
3
2
Reprocessing
Challenge #4: Query Performance?
Data Lake
λ-arch
λ-arch
Streaming
Analytics
Reporting
Events
Validation
λ-arch
Validation
Reprocessing
Compaction
Partitioned
Compact
Small Files
Scheduled to
Avoid Compaction
1
2
3
1
1
2
4
4
4
2
Let’s try it instead with
DELTA
Reprocessing
The Canonical Data Pipeline
Data Lake
λ-arch
λ-arch
Streaming
Analytics
Reporting
Events
Validation
λ-arch
Validation
Reprocessing
Compaction
Partitioned
Compact
Small Files
Scheduled to
Avoid Compaction
1
2
3
1
1
2
4
4
4
2
Challenge
DELTA
DATA LAKE
Reporting
Streaming
Analytics
The
LOW-LATENCY
of streaming
The
RELIABILITY &
PERFORMANCE
of data warehouse
The
SCALE
of data lake
The Delta Architecture
Sign up for the Private Beta
visit databricks.

More Related Content

What's hot

Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
Databricks
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 

What's hot (20)

Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Delta lake and the delta architecture
Delta lake and the delta architectureDelta lake and the delta architecture
Delta lake and the delta architecture
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Databricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI InitiativesDatabricks + Snowflake: Catalyzing Data and AI Initiatives
Databricks + Snowflake: Catalyzing Data and AI Initiatives
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
The delta architecture
The delta architectureThe delta architecture
The delta architecture
 

Similar to Introducing Databricks Delta

4th Industrial Revolution
4th Industrial Revolution4th Industrial Revolution
4th Industrial Revolution
Rolando Rangel
 
State of Big Data Markets
State of Big Data MarketsState of Big Data Markets
State of Big Data Markets
Kyle Redinger
 
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
DATAVERSITY
 
Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013
IntelAPAC
 

Similar to Introducing Databricks Delta (20)

Unlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligenceUnlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location Intelligence
 
Refactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics ProductsRefactoring your EDW with Mobile Analytics Products
Refactoring your EDW with Mobile Analytics Products
 
There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?There are 250 Database products, are you running the right one?
There are 250 Database products, are you running the right one?
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
4th Industrial Revolution
4th Industrial Revolution4th Industrial Revolution
4th Industrial Revolution
 
Introduction to Harnessing Big Data
Introduction to Harnessing Big DataIntroduction to Harnessing Big Data
Introduction to Harnessing Big Data
 
State of Big Data Markets
State of Big Data MarketsState of Big Data Markets
State of Big Data Markets
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
SAP Forum Ankara 2017 - "Verinin Merkezine Seyahat"
 
Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)Data Virtualization. An Introduction (ASEAN)
Data Virtualization. An Introduction (ASEAN)
 
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Bigdata
Bigdata Bigdata
Bigdata
 
Dell Digital Transformation Through AI and Data Analytics Webinar
Dell Digital Transformation Through AI and  Data Analytics WebinarDell Digital Transformation Through AI and  Data Analytics Webinar
Dell Digital Transformation Through AI and Data Analytics Webinar
 
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
Slides: Case Study — How J.B. Hunt is Driving Efficiency with AI and Real-Tim...
 
Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013Ron Kasabian - Intel Big Data & Cloud Summit 2013
Ron Kasabian - Intel Big Data & Cloud Summit 2013
 
Big data data lake and beyond
Big data data lake and beyond Big data data lake and beyond
Big data data lake and beyond
 
IBM Solutions Connect 2013 - Getting started with Big Data
IBM Solutions Connect 2013 - Getting started with Big DataIBM Solutions Connect 2013 - Getting started with Big Data
IBM Solutions Connect 2013 - Getting started with Big Data
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
 
The Evolution of Data Architecture
The Evolution of Data ArchitectureThe Evolution of Data Architecture
The Evolution of Data Architecture
 

More from Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 

Recently uploaded

一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
pyhepag
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
pyhepag
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
pyhepag
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Valters Lauzums
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 

Recently uploaded (20)

Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
一比一原版加利福尼亚大学尔湾分校毕业证成绩单如何办理
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理一比一原版阿德莱德大学毕业证成绩单如何办理
一比一原版阿德莱德大学毕业证成绩单如何办理
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
一比一原版(Monash毕业证书)莫纳什大学毕业证成绩单如何办理
 
社内勉強会資料  Mamba - A new era or ephemeral
社内勉強会資料   Mamba - A new era or ephemeral社内勉強会資料   Mamba - A new era or ephemeral
社内勉強会資料  Mamba - A new era or ephemeral
 
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdfGenerative AI for Trailblazers_ Unlock the Future of AI.pdf
Generative AI for Trailblazers_ Unlock the Future of AI.pdf
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotecAbortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
Abortion pills in Dammam Saudi Arabia// +966572737505 // buy cytotec
 
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
Data Analytics for Digital Marketing Lecture for Advanced Digital & Social Me...
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 

Introducing Databricks Delta

  • 1. Unifying Data Warehousing with Data Lakes Ali Ghodsi, Co-Founder & CEO Oct 25, 2017
  • 2. Many enterprises are undergoing a data transformation
  • 3. Databricks Customers Across Industries Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services
  • 4. Health care AI cloud dataset use case Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services Correlate EMR of 50,000 patients compared with their DNA
  • 5. Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services Enterprise AI use case 5 Provide recommendations to sales using NLP and deep learning
  • 6. Financial Services Healthcare & Pharma Media & Entertainment Technology Public Sector Retail & CPG Consumer Services Energy & Industrial IoTMarketing & AdTech Data & Analytics Services 6 Real-time AI use-case Curb abusive behavior across gamers globally
  • 7. Big Data was the Missing Link for AI BIG DATA Customer Data Emails/Web pages Click Streams Sensor data (IoT) Video/Speech … Most companies are Struggling with Big Data GREAT RESULTS
  • 8. Hardest part of AI isn’t AI “Hidden Technical Debt in Machine Learning Systems", Google NIPS 2015 The hardest part of AI is Big Data ML Code
  • 10. Unified Analytics Platform UNIFIED EXPERIENCE ACROSS TEAMS UNIFIED PROCESSING ENGINE
  • 12. The Era of the Data Warehouse
  • 13. Data Warehouse (DW) THE GOOD • Pristine Data • Fast Queries • Transactional THE BAD • Expensive to Scale, not Elastic • Requires ETL, Stale Data, No Real-Time • No Predictions, No ML • Closed formats (lock in) Not Future Proof – Missing Predictions, Real-time, Scale ETL important data to central DW and get Business Intelligence (BI)
  • 14. The Era of the Data Lake
  • 15. THE BAD • Inconsistent Data • Unreliable for Analytics • Lack of Schema • Poor Performance Hadoop Data Lake Become a cheap messy data store with poor performance ETL all data to central scalable open lake for all use cases THE GOOD • Massive scale • Inexpensive Storage • Open Formats (Parquet, ORC) • Promise of ML & Real Time Streaming
  • 16. The Current State of Data Platforms
  • 17. Info Sec at a Fortune 100 Company DISADVANTAGES OF ARCHITECTURE • Poor agility in responding to new threats • Scale Limitations, no historical data • 6 Months and twenty people to build ENTERPRISE DATA WAREHOUSE • Only 2 weeks of data • Very expensive to scale • Proprietary Formats • No Predictions (ML) Messy data not ready for analytics Billions of records a day HADOOP DATA LAKE Complex ETL EDW EDW EDW Incidence Response Alerting Reports
  • 19. First UNIFIED data management system that delivers: The SCALE of data lake The LOW-LATENCY of streaming The RELIABILITY & PERFORMANCE of data warehouse Announcing Databricks Delta
  • 20. The SCALE of data lake The LOW-LATENCY of streaming The RELIABILITY & PERFORMANCE of data warehouse
  • 21. Databricks Delta Enables Predictions, Real-time and Ad Hoc Analytics at Massive Scale THE GOOD OF DATA LAKES • Massive scale on Amazon S3 • Open Formats (Parquet, ORC) • Predictions (ML) & Real Time Streaming THE GOOD OF DATA WAREHOUSES • Pristine Data • Transactional Reliability • Fast Queries (10-100x)
  • 22. Databricks Delta Under the Hood • Decouple Compute & Storage • ACID Transactions & Data Validation • Data Indexing & Caching (10-100x) • Real-Time Streaming Ingest MASSIVE SCALE RELIABILITY PERFORMANCE LOW-LATENCY
  • 23. Info Sec with Databricks Delta DATABRICKS RUNTIME powered by DATABRICKS RUNTIME Trillion Records a Day DATABRICKS DELTA ETL, Schema Validation SQL , ML, Stream ADVANTAGES • AI capable data warehouse at the scale of a data lake • Interactive analysis on 2 years of data • 2 Weeks to build with a 5 person data platform team
  • 24. Unified Analytics Platform UNIFIED EXPERIENCE ACROSS TEAMS Notebooks,Dashboards,Reports
  • 26. Demo by Michael Armbrust
  • 27. Evolution of a Cutting-Edge Data Pipeline Events ? Reporting Streaming Analytics Data Lake
  • 28. Evolution of a Cutting-Edge Data Pipeline Events Reporting Streaming Analytics Data Lake
  • 29. Challenge #1: Historical Queries? Data Lake λ-arch λ-arch Streaming Analytics Reporting Events λ-arch1 1 1
  • 30. Challenge #2: Messy Data? Data Lake λ-arch λ-arch Streaming Analytics Reporting Events Validation λ-arch Validation 1 21 1 2
  • 31. Reprocessing Challenge #3: Mistakes and Failures? Data Lake λ-arch λ-arch Streaming Analytics Reporting Events Validation λ-arch Validation Reprocessing Partitioned 1 2 3 1 1 3 2
  • 32. Reprocessing Challenge #4: Query Performance? Data Lake λ-arch λ-arch Streaming Analytics Reporting Events Validation λ-arch Validation Reprocessing Compaction Partitioned Compact Small Files Scheduled to Avoid Compaction 1 2 3 1 1 2 4 4 4 2
  • 33. Let’s try it instead with DELTA
  • 34. Reprocessing The Canonical Data Pipeline Data Lake λ-arch λ-arch Streaming Analytics Reporting Events Validation λ-arch Validation Reprocessing Compaction Partitioned Compact Small Files Scheduled to Avoid Compaction 1 2 3 1 1 2 4 4 4 2 Challenge
  • 35. DELTA DATA LAKE Reporting Streaming Analytics The LOW-LATENCY of streaming The RELIABILITY & PERFORMANCE of data warehouse The SCALE of data lake The Delta Architecture
  • 36. Sign up for the Private Beta visit databricks.