©2021 Databricks Inc. — All rights reserved
Modernize your Data Warehouse
Amit Kara, Director, Technical Product Marketing
Soham Bhatt, SME Lead, DW Migration
A migration journey to the Databricks Lakehouse Platform
Agenda
• Why lakehouse for data warehousing
• How does Databricks help with Data Warehousing
• Key differentiators when using the Databricks Lakehouse Platform
• Demo: Data warehousing on Databricks
• How to modernize your data warehouse to a Lakehouse
• Key takeaways for migrating to the Lakehouse
What’s the problem we’re solving?
Legacy Data Warehouses aren’t keeping up
• Data warehouses can’t keep up with data volume and variety
• Innovation hinges on integrating ML/AI and predictive insights
• Business agility requires reliable, real-time data
• Not cost effective, especially at scale
• Data is locked in to a vendor and duplicated
The problem with legacy CDW: a fragmented approach to modernizing your architecture
[Diagram labels: Structured, Semi-Structured, and Unstructured data; Cloud Data Warehouse; Data Lake (ADLS, AWS S3, GCP); BI Reports, Dashboards & SQL ELT/ETL; Data Science, Model Training, Model Scoring, Model Deployment]
• Limited support for streaming
• Limited support for unstructured data (audio/images/video)
• Complex, with many stages
• Data is duplicated
• Lock-in / proprietary format
• Compute cost for all data access
Disparate tooling decreases data team productivity
Why Data Warehousing on Databricks?
Why Data Warehousing on Databricks
Your tools of choice
Use your favorite tools like Fivetran, dbt, Power BI, Tableau, or Databricks to ingest, transform, and query all your data in place.
Serverless compute
Lower costs and eliminate the need to manage, configure, or scale cloud infrastructure with serverless, and get the best price/performance.
Unified governance
Simplify architecture, establish a single copy of all your data, and apply one unified governance layer across all data teams using standard SQL.
Break down silos
Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
[Diagram labels: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)]
Integrated out-of-the-box with Partner Connect
Discover and connect data, analytics, and AI tools to your lakehouse
• Connect your data, analytics, and AI tools to the Databricks Lakehouse
• Discover validated data and AI solutions for new use cases
• Set up in a few clicks with pre-built integrations
[Diagram labels: Partners; Business Intelligence; ML Tools; Data Preparation; Data Connectors; Solution Accelerators; Data Apps]
Databricks thrives within your modern data stack
[Diagram labels: Data Ingestion; Data Pipelines; Data Governance; BI and Dashboards; Data Science; Machine Learning; Data Warehousing, Data Engineering, Data Science and ML, Data Streaming; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)]
First-class SQL development experience
Collaboratively query, explore, and transform data in-place
Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards.
Elastic, instant compute decoupled from storage
No resource management needed with Serverless
• Quickly set up optimized compute resources with SQL endpoints (powered by Photon, the vectorized engine)
• High concurrency built in, with automatic load balancing
• Intelligent workload management and faster reads from cloud storage
• Instant startup and greater availability
• Available in Databricks Serverless (preview)
Built from the ground up for best price/performance
Source: Performance Benchmark with Barcelona Supercomputing Center
Query and analyze your most complete and freshest data with
up to 12x better price/performance than traditional cloud data warehouses.
Lightning fast analytics
Unity Catalog
Fine-grained governance on the Lakehouse
● Centralized metadata and user management
● Centralized data access controls
● Data lineage (Private Preview)
● Data access auditing
● Data search and discovery (Coming Soon)
● Secure data sharing with Delta Sharing
● Standard SQL
Key considerations for Modern Analytics & DW
❏ Empower Business Units for Self-service and Advanced Analytics
❏ Simple, Collaborative, Agile Cross-Functional teams
❏ Machine Learning and Artificial Intelligence - CIO level initiatives
❏ Platform that supports all data types - structured and unstructured
❏ Cloud - choose best of breed - open tech stack vs. proprietary
Demo
Modern Data Warehousing on Databricks
Select the ingestion, ETL, presentation layer, and governance ecosystem on the Databricks Platform
• Ingestion: batch ingestion and stream ingestion
• ETL Partners
• Databricks Notebooks, Delta Live Tables
• Curated data:
    • BRONZE: raw ingestion and history
    • SILVER: filtered, cleaned, augmented
    • GOLD: business aggregates and data models
• Enterprise reporting and BI: Databricks SQL (DBSQL endpoints)
• Data science and machine learning: Databricks Machine Learning
• Data governance powered by Databricks Unity Catalog
• EDC
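The bronze, silver, and gold layers on this slide can be illustrated with a small, framework-free sketch. The record fields (`order_id`, `region`, `amount`) and the function names `to_silver` and `to_gold` are hypothetical and chosen only for illustration; on Databricks each layer would typically be a Delta table rather than a Python list.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as ingested, including bad rows.
bronze = [
    {"order_id": "1", "region": "EMEA", "amount": "10.50"},
    {"order_id": "2", "region": "emea", "amount": "4.50"},
    {"order_id": "3", "region": "APAC", "amount": "not-a-number"},
    {"order_id": "2", "region": "emea", "amount": "4.50"},  # duplicate
]


def to_silver(rows):
    """Filter, clean, and de-duplicate raw rows (hypothetical rules)."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # drop duplicates by order_id
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].upper(),  # normalize casing
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate cleaned rows into a business-level summary per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # {'EMEA': 15.0}
```

Keeping the bronze data untouched means the silver and gold layers can be rebuilt whenever the cleaning or aggregation rules change.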
Building your Lakehouse
Comprehensive investment into your success
• Supported by 24/7/365 global production operations at scale
• Solution Accelerators
• In-person and virtual training
• Co-located Professional Services
Migration Methodology
Phase 1, Discovery: Migration-specific discovery and consultation
Phase 2, Assessment: Assessment, design, tooling, accelerators, sizing, partners
Phase 3, Strategy: Technology mapping, migration workshop, migration planning
Databricks Migration Team with/without Partner
Phase 4, Production Pilot: Reference implementation of a production use case, overall migration implementation plan
Phase 5, Execution: Migration execution and support
Databricks PS Driven
Partner Driven
Migration Approach
Architecture/Infrastructure
● Establish deployment architecture
● Implement security and governance framework
Data Migration
● Map data structures and layout
● Complete one-time load
● Implement incremental load approach
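The slides do not say how the incremental load should be implemented. One common option is a high-watermark pattern, sketched below under the assumption that each source row carries a hypothetical `updated_at` timestamp; the function name `incremental_batch` is also illustrative.

```python
from datetime import datetime


def incremental_batch(source_rows, last_watermark):
    """Return the rows changed after last_watermark, plus the new watermark."""
    changed = [r for r in source_rows if r["updated_at"] > last_watermark]
    # If nothing changed, keep the previous watermark.
    new_watermark = max((r["updated_at"] for r in changed), default=last_watermark)
    return changed, new_watermark


source = [
    {"id": 1, "updated_at": datetime(2022, 3, 1)},
    {"id": 2, "updated_at": datetime(2022, 3, 5)},
    {"id": 3, "updated_at": datetime(2022, 3, 9)},
]

# Assume the one-time load already covered everything up to 2 March 2022.
batch, watermark = incremental_batch(source, datetime(2022, 3, 2))
print([r["id"] for r in batch])  # [2, 3]

# Running again with the new watermark picks up nothing until the source changes.
next_batch, next_watermark = incremental_batch(source, watermark)
```

The watermark has to be persisted between runs (for example in a small control table) so that each catch-up load starts where the previous one stopped.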
ETL and Pipelines
● Migrate data transformation and pipeline code, orchestration, and jobs
● Speed up your migration using automation tools
● Validate: compare your results with on-prem data and expected results
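The validation step above can be sketched as a comparison of row counts and an order-independent checksum between the legacy and migrated tables. This is a minimal illustration, not a method prescribed by the deck; the function name `table_fingerprint` and the sample rows are hypothetical.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Per-row SHA-256 digests are summed modulo 2**256, so the result does not
    depend on row order and duplicate rows do not cancel each other out.
    """
    count, checksum = 0, 0
    for row in rows:
        digest = hashlib.sha256(repr(row).encode("utf-8")).digest()
        checksum = (checksum + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, checksum


legacy = [(1, "alice", 10.0), (2, "bob", 20.0)]
migrated = [(2, "bob", 20.0), (1, "alice", 10.0)]  # same rows, different order
drifted = [(1, "alice", 10.0), (2, "bob", 21.0)]   # one value changed

print(table_fingerprint(legacy) == table_fingerprint(migrated))  # True
print(table_fingerprint(legacy) == table_fingerprint(drifted))   # False
```

In practice the rows from both systems need to be normalized to the same types and formats first (for example decimals, timestamps, and trailing whitespace), otherwise equal data can produce different fingerprints.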
BI and Analytics
● Re-point reports and analytics for business analysts and business outcomes
● Semantic layer/OLAP cube repointing
● Connect to reporting and analytics applications
Data Science/ML
● Establish connectivity to ML tools
● Onboard data science teams
Strategies for Data Migration
One-time loads, catch-up loads, real-time vs. batch ingestion
1. Extract from databases through JDBC/ODBC connectors, for example with spark.read.jdbc (parallel ingestion)
2. Extract to cloud storage and use Databricks Auto Loader for streaming ingest
3. ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, ...)
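For the first strategy, a parallel JDBC read in Spark is driven by four options: `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. The sketch below only builds that option map in plain Python so it can run anywhere; the helper name `jdbc_read_options`, the connection URL, and the table name are assumptions for illustration, and credentials are left out.

```python
def jdbc_read_options(url, table, partition_column, lower, upper, num_partitions):
    """Build the option map for a partitioned (parallel) Spark JDBC read."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower >= upper:
        raise ValueError("lower must be smaller than upper")
    return {
        "url": url,
        "dbtable": table,
        "partitionColumn": partition_column,  # numeric, date, or timestamp column
        "lowerBound": str(lower),
        "upperBound": str(upper),
        "numPartitions": str(num_partitions),
    }


opts = jdbc_read_options(
    url="jdbc:postgresql://legacy-dw.example.com:5432/sales",  # hypothetical source
    table="public.orders",                                     # hypothetical table
    partition_column="order_id",
    lower=1,
    upper=1_000_000,
    num_partitions=8,
)
print(opts["numPartitions"])  # 8

# On a cluster with a SparkSession named `spark`, the map could be used as:
# df = spark.read.format("jdbc").options(**opts).load()
```

The bounds only determine how the partition ranges are split; they do not filter rows, so values outside the range still land in the first or last partition.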
Strategies for ETL/Code Migration
Using automated tools or frameworks can reduce your timelines by over 50%!
Migration of stored procedures and/or ETL mappings
• For Databricks Notebooks-based ETL:
    • Delta Live Tables or Databricks Notebook-based ETL
    • Metadata-driven ingestion frameworks
• ETL tool partners:
    • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks, and many more
• Auto code converters accelerate migrations!
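A metadata-driven ingestion framework, as mentioned above, keeps the list of tables in configuration and generates the load logic from it. The sketch below is one hypothetical shape for such a framework; `TableSpec`, `build_extract_query`, and the table names are illustrative and not part of any Databricks API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    source: str                 # table name in the legacy warehouse
    target: str                 # table name in the lakehouse
    load_type: str              # "full" or "incremental"
    watermark_column: str = ""  # only used for incremental loads


def build_extract_query(spec, last_watermark=None):
    """Turn one metadata entry into the SQL used to extract from the source."""
    query = f"SELECT * FROM {spec.source}"
    if spec.load_type == "incremental" and last_watermark is not None:
        # Illustration only: a real framework should bind the value as a parameter.
        query += f" WHERE {spec.watermark_column} > '{last_watermark}'"
    return query


specs = [
    TableSpec("dw.customers", "silver.customers", "full"),
    TableSpec("dw.orders", "silver.orders", "incremental", "updated_at"),
]

for spec in specs:
    print(spec.target, "<-", build_extract_query(spec, last_watermark="2022-03-01"))
```

Adding a table to the migration then means adding one `TableSpec` entry rather than writing a new pipeline.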
Repoint Cubes and Reports to Databricks
Unleash Self-service Analytics with a Semantic Lakehouse
• As easy as repointing your reports to DBSQL JDBC/ODBC drivers (Photon and our newest Cloud Fetch ODBC drivers)
• Key integrations
    • Power BI Premium (semantic layers, composite models, up to 400 GB caching)
    • Tableau Hyper Extracts
    • Looker
    • OLAP cube partners like MicroStrategy
    • AtScale: universal semantic layer (aggs built in Databricks)
Key Takeaways
Migration is a team sport
● Data Warehousing on Lakehouse is simple
● Migrations can be accelerated using automation tools
● Extensive partner ecosystem around the Databricks Modern Data Stack
● Huge set of joint offerings to accelerate migrations with SI/consulting partners
Next Steps
1. Learn more about the inner workings of the Lakehouse
2. Schedule a Data Warehouse migration workshop
3. Schedule a Databricks SQL hands-on workshop
Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop
©2021 Databricks Inc. — All rights reserved

More Related Content

What's hot

Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
James Serra
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
DATAVERSITY
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptx
Kshitija(KJ) Gupte
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
Kujambu Murugesan
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
Dalibor Wijas
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
Databricks
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
DATAVERSITY
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
DATAVERSITY
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
James Serra
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
DATAVERSITY
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
Databricks
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
Databricks
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
BigID Inc
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
Databricks
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
Kent Graziano
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data Quality
DATAVERSITY
 

What's hot (20)

Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Improving Data Literacy Around Data Architecture
Improving Data Literacy Around Data ArchitectureImproving Data Literacy Around Data Architecture
Improving Data Literacy Around Data Architecture
 
Migration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptxMigration to Databricks - On-prem HDFS.pptx
Migration to Databricks - On-prem HDFS.pptx
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
Collibra Data Citizen '19 - Bridging Data Privacy with Data Governance
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Data Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data QualityData Modeling, Data Governance, & Data Quality
Data Modeling, Data Governance, & Data Quality
 

Similar to DW Migration Webinar-March 2022.pptx

Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
CalvinSim10
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
HostedbyConfluent
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
Ilham31574
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
DATAVERSITY
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
Torsten Steinbach
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
HostedbyConfluent
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
DATAVERSITY
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Denodo
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
MarketingArrowECS_CZ
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
DATAVERSITY
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
Kent Graziano
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
DATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
DATAVERSITY
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
Leonid Nekhymchuk
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
MarketingArrowECS_CZ
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
Cloudera, Inc.
 
The new big data
The new big dataThe new big data
The new big data
Adam Doyle
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
HostedbyConfluent
 

Similar to DW Migration Webinar-March 2022.pptx (20)

Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
The new big data
The new big dataThe new big data
The new big data
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
Databricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
Databricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Databricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
Databricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 

DW Migration Webinar-March 2022.pptx

  • 1. Modernize your Data Warehouse: A migration journey to the Databricks Lakehouse Platform
      Amit Kara, Director, Technical Product Marketing
      Soham Bhatt, SME Lead, DW Migration
      ©2021 Databricks Inc. — All rights reserved
  • 2. Agenda
      • Why lakehouse for data warehousing
      • How does Databricks help with Data Warehousing
      • Key differentiators when using the Databricks Lakehouse Platform
      • Demo: Data warehousing on Databricks
      • How to modernize your data warehouse to a Lakehouse
      • Key takeaways for migrating to the Lakehouse
  • 3. What’s the problem we’re solving?
  • 4. Legacy Data Warehouses aren’t keeping up
      • Data Warehouses can’t keep up with data volume and variety
      • Innovation hinges on integrating ML/AI and predictive insights
      • Business agility requires reliable, real-time data
      • Not cost effective, especially with scale
      • Data is vendor locked-in and duplicated
  • 5. The problem with legacy CDW: a fragmented approach to modernizing your architecture
      Diagram labels: Structured; Semi-Structured; Unstructured; Cloud Data Warehouse; DATA LAKE (ADLS, AWS S3, GCP); ELT/ETL; BI Reports, Dashboards & SQL; Data Science (Model Training, Model Scoring, Model Deployment)
      Callouts:
      • Limited support for streaming
      • Limited support for unstructured data (audio/images/video)
      • Complex & many stages. Data is duplicated
      • Lock-in / proprietary format
      • Compute cost for all data access
      Disparate tooling decreases data team productivity
  • 6. Why Data Warehousing on Databricks?
  • 7. Why Data Warehousing on Databricks
      • Your tools of choice: Use your favorite tools like Fivetran, dbt, Power BI, Tableau or Databricks to ingest, transform and query all your data in-place.
      • Serverless compute: Lower costs and eliminate the need to manage, configure or scale cloud infrastructure with serverless and get the best price/performance.
      • Unified governance: Simplify architecture, establish one single copy for all your data, and one unified governance layer across all data teams using standard SQL.
      • Break down silos: Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
      Diagram labels: Data Warehousing; Data Engineering; Data Science and ML; Data Streaming; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)
  • 8. Connect your data, analytics and AI tools to the Databricks Lakehouse
      Integrated out-of-the-box with Partner Connect
      • Discover validated data and AI solutions for new use cases
      • Set up in a few clicks with pre-built integrations
      Partner categories: Business Intelligence; ML Tools; Data Preparation; Data Connectors; Solution Accelerators; Data Apps; Partners
      Discover and connect data, analytics, and AI tools to your lakehouse
  • 9. Databricks thrives within your modern data stack
      Platform labels: Data Warehousing; Data Engineering; Data Science and ML; Data Streaming; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)
      Ecosystem labels: BI and Dashboards; Data Science; Data Pipelines; Data Governance; Machine Learning; Data Ingestion
  • 10. First-class SQL development experience
      Collaboratively query, explore, and transform data in-place
      Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards.
  • 11. Elastic, instant compute decoupled from storage
      • Quickly set up optimized compute resources with SQL endpoints (powered by the vectorized engine Photon)
      • High concurrency built-in with automatic load balancing
      • Intelligent workload management and faster reads from cloud storage
      • Instant startup and greater availability
      • Available in Databricks Serverless (preview)
      No resource management needed with Serverless
  • 12. Built from the ground up for best price/performance
      Lightning fast analytics
      Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses.
      Source: Performance Benchmark with Barcelona Supercomputing Center
  • 13. Unity Catalog: Fine-grained governance on the Lakehouse
      ● Centralized metadata and user management
      ● Centralized data access controls
      ● Data lineage (Private Preview)
      ● Data access auditing
      ● Data search and discovery (Coming Soon)
      ● Secure data sharing with Delta Sharing
      ● Standard SQL
  • 14. Key considerations for Modern Analytics & DW
      ❏ Empower Business Units for Self-service and Advanced Analytics
      ❏ Simple, Collaborative, Agile Cross-Functional teams
      ❏ Machine Learning and Artificial Intelligence as CIO-level initiatives
      ❏ A platform that supports all data types, structured and unstructured
      ❏ Cloud: choose best of breed, Open Tech Stack vs Proprietary
  • 15. Demo
  • 16. Modern Data Warehousing on Databricks
      Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform
      Ingestion labels: Batch Ingestion; Stream Ingestion; ETL Partners
      Curated Data layers (Databricks Notebooks, Delta Live Tables):
      • BRONZE: Raw Ingestion and History
      • SILVER: Filtered, Cleaned, Augmented
      • GOLD: Business Aggregates & Data Models
      Consumption labels: Enterprise Reporting and BI (Databricks SQL, DBSQL Endpoints); Data Science and Machine Learning (Databricks Machine Learning)
      Governance labels: Data Governance powered by Databricks Unity Catalog; EDC
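The bronze, silver and gold layers on this slide can be illustrated with a minimal, plain-Python sketch. It does not use Delta Lake or Delta Live Tables; the sample rows, the field names and the helper functions `to_silver` and `to_gold` are all hypothetical and only show the shape of each step (raw history, then filtered and cleaned rows, then business aggregates).

```python
from collections import defaultdict

# Hypothetical raw events as they might land, unmodified, in the bronze layer.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.5"},
    {"order_id": "2", "region": "US", "amount": "n/a"},   # malformed amount
    {"order_id": "1", "region": "EU", "amount": "10.5"},  # duplicate of order 1
    {"order_id": "3", "region": "EU", "amount": "4.5"},
]


def to_silver(rows):
    """Filter and clean: drop unparseable amounts and duplicate order ids, cast types."""
    seen = set()
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        cleaned.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return cleaned


def to_gold(rows):
    """Business aggregate: total order amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(silver)
print(gold)
```

On the platform each layer would typically be a table rather than an in-memory list, so downstream steps can be re-run from the preserved bronze history.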
  • 17. Building your Lakehouse: Comprehensive investment into your success
      • Solution Accelerators
      • In-person and Virtual Training
      • Co-located Professional Services
      Supported by 24/7/365 global, production operations at scale
  • 18. Migration Methodology
      • Phase 1, Discovery: Migration-specific discovery and consultation
      • Phase 2, Assessment: Assessment, Design, Tooling, Accelerators, Sizing, Partners
      • Phase 3, Strategy: Technology mapping, migration workshop, migration planning
      • Phase 4, Production Pilot: Reference implementation of a production use case, overall migration implementation plan
      • Phase 5, Execution: Migration execution and support
      Delivery labels: Databricks Migration Team with/without Partner; Databricks PS Driven; Partner Driven
  • 19. Migration Approach
      Architecture/Infrastructure
      ● Establish deployment Architecture
      ● Implement Security and Governance framework
      Data Migration
      ● Map Data Structures and Layout
      ● Complete One-time load
      ● Implement incremental load approach
      ETL and Pipelines
      ● Migrate Data transformation and pipeline code, orchestration and jobs
      ● Speed up your migration using Automation tools
      ● Validate: Compare your results with On-Prem data and expected results
      BI and Analytics
      ● Re-point reports and analytics for Business Analysts and Business Outcomes
      ● Semantic Layer/OLAP cube repointing
      ● Connect to reporting and analytics applications
      Data Science/ML
      ● Establish connectivity to ML Tools
      ● Onboard Data Science teams
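The "Validate" step above (compare migrated results with the on-prem data) could be approached, as one possible sketch, by comparing row counts and an order-independent content fingerprint of the source and target tables. The functions `table_fingerprint` and `validate_migration` are hypothetical helpers, not part of any Databricks tooling, and the sketch assumes both sides yield rows as tuples with identical value types.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, digest) for an iterable of row tuples, ignoring row order."""
    # Hash every row on its own, then sort the hashes so ordering does not matter.
    row_digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(row_digests).encode("utf-8")).hexdigest()
    return len(row_digests), combined


def validate_migration(source_rows, target_rows):
    """Compare a source extract with its migrated copy."""
    source_count, source_digest = table_fingerprint(source_rows)
    target_count, target_digest = table_fingerprint(target_rows)
    return {
        "row_count_match": source_count == target_count,
        "content_match": (source_count, source_digest) == (target_count, target_digest),
    }


# Hypothetical sample data: same rows in a different order, then one changed value.
on_prem = [(1, "alice", 10), (2, "bob", 20)]
migrated = [(2, "bob", 20), (1, "alice", 10)]
drifted = [(1, "alice", 10), (2, "bob", 21)]

print(validate_migration(on_prem, migrated))
print(validate_migration(on_prem, drifted))
```

Because the fingerprint relies on `repr`, a value that changes type during migration (for example an integer read back as a float) counts as a mismatch, which may or may not be desirable for a given table.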
  • 20. Strategies for Data Migration
      One-time loads, catch-up loads, Real-time vs Batch Ingestion
      1. Extract from databases via JDBC/ODBC connectors, for example with spark.read.jdbc (parallel ingestion)
      2. Extract to Cloud Storage and use Databricks Auto Loader for streaming ingest
      3. ISV Partners for Real-Time CDC Ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, and others)
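As a rough sketch of the first two strategies above, the helpers below build the option dictionaries one might pass to Spark's JDBC reader for a partitioned (parallel) read and to Auto Loader for file ingestion. The option keys (`partitionColumn`, `lowerBound`, `upperBound`, `numPartitions`, `cloudFiles.format`, `cloudFiles.schemaLocation`) are standard Spark and Auto Loader options; the helper functions, the connection URL, the table name and the paths are hypothetical placeholders, and the Spark calls appear only as comments because they need a running cluster.

```python
def jdbc_parallel_read_options(url, table, partition_column, lower_bound, upper_bound, num_partitions):
    """Build options for a partitioned JDBC read (hypothetical helper)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if lower_bound >= upper_bound:
        raise ValueError("lower_bound must be smaller than upper_bound")
    return {
        "url": url,
        "dbtable": table,
        # Spark splits the partition column's range into numPartitions strides
        # and issues one query per stride, which is what makes the read parallel.
        "partitionColumn": partition_column,
        "lowerBound": str(lower_bound),
        "upperBound": str(upper_bound),
        "numPartitions": str(num_partitions),
    }


def auto_loader_options(file_format, schema_location):
    """Build options for an Auto Loader ("cloudFiles") stream (hypothetical helper)."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


# Placeholder values, not taken from the slides.
jdbc_opts = jdbc_parallel_read_options(
    url="jdbc:postgresql://legacy-dw.example.com:5432/sales",
    table="public.orders",
    partition_column="order_id",
    lower_bound=1,
    upper_bound=1_000_000,
    num_partitions=8,
)
loader_opts = auto_loader_options("json", "/tmp/schemas/orders")

# On a cluster with an active SparkSession named `spark`, usage would look like:
#   df = spark.read.format("jdbc").options(**jdbc_opts).load()
#   stream = spark.readStream.format("cloudFiles").options(**loader_opts).load("<landing path>")
print(jdbc_opts)
print(loader_opts)
```

The bounds only control how the column range is split across partitions; rows outside the range are still read, so they do not act as a filter.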
  • 21. Strategies for ETL/Code Migration
      Use of automated tools or frameworks can reduce your timelines by over 50%!
      Migration of Stored Procedures and/or ETL Mappings
      • For Databricks Notebooks-based ETL:
        • Delta Live Tables or Databricks Notebook-based ETL
        • Metadata-driven Ingestion Frameworks
      • ETL tool Partners:
        • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks, and many more
      • Auto code converters accelerate migrations!
  • 22. Repoint Cubes and Reports to Databricks
      Unleash Self-service Analytics with a Semantic Lakehouse
      • As easy as repointing your reports to the DBSQL JDBC/ODBC drivers (Photon and our newest Cloud Fetch ODBC drivers)
      • Key Integrations
        • Power BI Premium (semantic layers, composite models, up to 400 GB caching)
        • Tableau Hyper Extracts
        • Looker
        • OLAP cube partners like MicroStrategy
        • AtScale: Universal Semantic layer (aggregates built in Databricks)
  • 23. Key Takeaways: Migration is a team sport
      ● Data Warehousing on Lakehouse is simple
      ● Migrations can be accelerated using automation tools
      ● Extensive Partner Ecosystem around Databricks Modern Data Stack
      ● Huge set of joint offerings to accelerate migrations with SI/Consulting Partners
  • 24. Next Steps
      1. Learn more about the Inner Workings of the Lakehouse
      2. Schedule a Data Warehouse migration workshop
      3. Schedule a Databricks SQL Hands-on workshop
      Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop