©2021 Databricks Inc. — All rights reserved
Modernize your Data Warehouse
A migration journey to the Databricks Lakehouse Platform
Amit Kara, Director, Technical Product Marketing
Soham Bhatt, SME Lead, DW Migration
Agenda
• Why lakehouse for data warehousing
• How does Databricks help with Data Warehousing
• Key differentiators when using the Databricks Lakehouse Platform
• Demo: Data warehousing on Databricks
• How to modernize your data warehouse to a Lakehouse
• Key takeaways for migrating to the Lakehouse
What’s the problem we’re solving?
Legacy Data Warehouses aren’t keeping up
• Data Warehouses can’t keep up with data volume and variety
• Innovation hinges on integrating ML/AI and predictive insights
• Business agility requires reliable, real-time data
• Not cost effective, especially with scale
• Data is vendor locked-in and duplicated
The problem with legacy CDW: a fragmented approach to modernizing your architecture
[Diagram labels: Structured, Semi-Structured, Unstructured; Cloud Data Warehouse; Data Lake (ADLS, AWS S3, GCP); BI Reports, Dashboards & SQL ELT/ETL; Data Science, Model Training, Model Scoring, Model Deployment]
• Limited support for streaming
• Limited support for unstructured data (audio/images/video)
• Complex & many stages
• Data is duplicated
• Lock-in / proprietary format
• Compute cost for all data access
Disparate tooling decreases data team productivity
Why Data Warehousing on Databricks?
Why Data Warehousing on Databricks
Your tools of choice
Use your favorite tools like Fivetran, dbt, Power BI, Tableau or Databricks to ingest, transform and query all your data in-place.
Serverless compute
Lower costs and eliminate the need to manage, configure or scale cloud infrastructure with serverless and get the best price/performance.
Unified governance
Simplify architecture, establish one single copy for all your data, and one unified governance layer across all data teams using standard SQL.
Break down silos
Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
[Diagram labels: Data Warehousing, Data Engineering, Data Streaming, Data Science and ML; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)]
Integrated out-of-the-box with Partner Connect
Discover and connect data, analytics, and AI tools to your lakehouse
• Connect your data, analytics and AI tools to the Databricks Lakehouse
• Discover validated data and AI solutions for new use cases
• Set up in a few clicks with pre-built integrations
[Diagram labels: Business Intelligence, ML Tools, Data Preparation, Data Connectors, Solution Accelerators, Data Apps, Partners]
Databricks thrives within your modern data stack
[Diagram labels: Data Ingestion, Data Pipelines, Data Governance, BI and Dashboards, Data Science, Machine Learning; Data Warehousing, Data Engineering, Data Streaming, Data Science and ML; Unity Catalog; Delta Lake; Cloud Data Lake (all structured and unstructured data)]
First-class SQL development experience
Collaboratively query, explore, and transform data in-place
Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards.
Elastic, instant compute decoupled from storage
No resource management needed with Serverless
• Quickly set up optimized compute resources with SQL endpoints (powered by the vectorized engine Photon)
• High concurrency built-in with automatic load balancing
• Intelligent workload management and faster reads from cloud storage
• Instant startup and greater availability
• Available in Databricks Serverless (preview)
Built from the ground up for best price/performance
Lightning fast analytics
Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses.
Source: Performance Benchmark with Barcelona Supercomputing Center
Unity Catalog
Fine-grained governance on the Lakehouse
● Centralized metadata and user management
● Centralized data access controls
● Data lineage (Private Preview)
● Data access auditing
● Data search and discovery (Coming Soon)
● Secure data sharing with Delta Sharing
● Standard SQL
Key considerations for Modern Analytics & DW
❏ Empower Business Units for Self-service and Advanced Analytics
❏ Simple, Collaborative, Agile Cross-Functional teams
❏ Machine Learning and Artificial Intelligence - CIO level initiatives
❏ A platform that supports all data types, structured and unstructured
❏ Cloud: choose best of breed, open tech stack vs. proprietary
Demo
Modern Data Warehousing on Databricks
Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform
• Ingestion: Batch Ingestion, Stream Ingestion
• ETL: Databricks Notebooks, Delta Live Tables, ETL Partners
• Curated Data: BRONZE (Raw Ingestion and History), SILVER (Filtered, Cleaned, Augmented), GOLD (Business Aggregates & Data Models)
• Enterprise Reporting and BI: Databricks SQL, DBSQL Endpoints
• Data Science and Machine Learning: Databricks Machine Learning
• Data Governance powered by Databricks Unity Catalog; EDC
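The Bronze, Silver, and Gold layers on this slide can be illustrated with a small, engine-agnostic sketch. This is plain Python over in-memory records, not Databricks or Delta Lake code; the record fields (`order_id`, `region`, `amount`) and the function names `to_silver` and `to_gold` are hypothetical and chosen only to show what each layer is responsible for.

```python
from collections import defaultdict

# Hypothetical raw events as they might land in the Bronze layer:
# kept as-is, including duplicates and malformed rows, to preserve history.
bronze = [
    {"order_id": "1", "region": "emea", "amount": "10.50"},
    {"order_id": "1", "region": "emea", "amount": "10.50"},         # duplicate
    {"order_id": "2", "region": "AMER", "amount": "not-a-number"},  # malformed
    {"order_id": "3", "region": "amer", "amount": "4.00"},
]


def to_silver(rows):
    """Filter, clean, and de-duplicate Bronze rows (the Silver step)."""
    seen, silver = set(), []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if row["order_id"] in seen:
            continue  # drop repeated order ids
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].upper(),  # normalize casing
            "amount": amount,
        })
    return silver


def to_gold(rows):
    """Aggregate Silver rows into a business-level summary (the Gold step)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

On the platform itself these steps would typically be tables maintained by notebooks or Delta Live Tables rather than Python lists; the sketch only shows the division of responsibilities between the layers.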
©2022 Databricks Inc. — All rights reserved
Building your Lakehouse
Comprehensive investment into your success
Your success:
• Solution Accelerators
• In-person and Virtual Training
• Co-located Professional Services
Supported by 24/7/365 global, production operations at scale
Migration Methodology
Phase 1, Discovery: Migration-specific discovery and consultation
Phase 2, Assessment: Assessment, Design, Tooling, Accelerators, Sizing, Partners
Phase 3, Strategy: Technology mapping, migration workshop, migration planning
Databricks Migration Team with/without Partner
Phase 4, Production Pilot: Reference implementation of a production use case, overall migration implementation plan
Phase 5, Execution: Migration execution and support
Databricks PS Driven
Partner Driven
Migration Approach
Architecture/Infrastructure
● Establish deployment architecture
● Implement security and governance framework
Data Migration
● Map data structures and layout
● Complete one-time load
● Implement incremental load approach
ETL and Pipelines
● Migrate data transformation and pipeline code, orchestration and jobs
● Speed up your migration using automation tools
● Validate: compare your results with on-prem data and expected results
BI and Analytics
● Re-point reports and analytics for Business Analysts and Business Outcomes
● Semantic layer/OLAP cube repointing
● Connect to reporting and analytics applications
Data Science/ML
● Establish connectivity to ML tools
● Onboard data science teams
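The "Validate" step in the ETL and Pipelines column can be sketched as a comparison of per-table summary metrics collected from both systems. This is a minimal illustration in plain Python, assuming the metrics (row counts, column sums) have already been queried from the legacy warehouse and from the lakehouse; the function name `compare_summaries`, the metric names, and the sample numbers are hypothetical.

```python
def compare_summaries(source, target, tolerance=1e-6):
    """Return a list of mismatches between two sets of table summaries.

    Both arguments map table name -> {metric name: numeric value}.
    Every metric present in `source` is expected to match in `target`.
    """
    issues = []
    for table, expected in sorted(source.items()):
        actual = target.get(table)
        if actual is None:
            issues.append(f"{table}: missing in target")
            continue
        for metric, want in sorted(expected.items()):
            got = actual.get(metric)
            # A missing metric or a difference beyond the tolerance is a mismatch.
            if got is None or abs(got - want) > tolerance:
                issues.append(f"{table}.{metric}: source={want} target={got}")
    return issues


# Hypothetical summaries gathered from each system after a load.
legacy = {
    "orders": {"row_count": 1000, "amount_sum": 52340.75},
    "customers": {"row_count": 120, "amount_sum": 0.0},
}
lakehouse = {
    "orders": {"row_count": 998, "amount_sum": 52340.75},
    "customers": {"row_count": 120, "amount_sum": 0.0},
}

issues = compare_summaries(legacy, lakehouse)
print(issues)
```

Summary metrics catch missing or duplicated rows cheaply; they do not prove row-level equality, so a real validation plan would likely add sampled or hashed row comparisons for critical tables.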
Strategies for Data Migration
One-time loads, catch-up loads, real-time vs. batch ingestion
1. Extract from databases via JDBC/ODBC connectors using spark.read.jdbc (parallel ingestion)
2. Extract to cloud storage and use Databricks Auto Loader for streaming ingest
3. ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets, ...)
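The first strategy, parallel extraction with `spark.read.jdbc`, can be sketched as below. This is an illustration under assumptions rather than a reference implementation: the wrapper name `read_table_in_parallel`, the JDBC URL, and the table and column names are hypothetical, and the function expects an existing `SparkSession` plus a JDBC driver for the source database available on the cluster.

```python
def read_table_in_parallel(spark, jdbc_url, table, partition_column,
                           lower_bound, upper_bound, num_partitions,
                           user, password):
    """Read one source table over JDBC using several concurrent partitions.

    `spark` is an existing SparkSession. Spark divides the range between
    `lower_bound` and `upper_bound` of the numeric `partition_column` into
    `num_partitions` strides and issues one query per stride. The bounds
    only decide the stride; they do not filter rows out of the result.
    """
    return spark.read.jdbc(
        url=jdbc_url,
        table=table,
        column=partition_column,
        lowerBound=lower_bound,
        upperBound=upper_bound,
        numPartitions=num_partitions,
        properties={"user": user, "password": password},
    )


# Usage on a cluster (hypothetical URL, table, and target names):
# df = read_table_in_parallel(
#     spark, "jdbc:postgresql://legacy-dw:5432/sales", "public.orders",
#     "order_id", 1, 10_000_000, 16, user, password)
# df.write.format("delta").mode("overwrite").saveAsTable("bronze.orders")
```

Choosing an evenly distributed numeric key as the partition column matters here: a skewed column leaves most partitions idle while one does the bulk of the read.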
Strategies for ETL/Code Migration
Use of Automated tools or frameworks can reduce your timelines by over 50%!
Migration of Stored Procedures and/or ETL Mappings
• For Databricks Notebooks based ETL:
  • Delta Live Tables or Databricks Notebook-based ETL
  • Metadata-driven Ingestion Frameworks
• ETL tool Partners:
  • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks, and many more
• Auto code converters accelerate migrations!
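One way to picture a metadata-driven ingestion framework is a small table of source definitions from which load statements are generated, so adding a feed means adding a row rather than writing a pipeline. The sketch below is an assumption-laden illustration: the slide does not say which load mechanism such frameworks use, so Databricks SQL's `COPY INTO` is picked here as one plausible option, and the `SourceSpec` fields, table names, and bucket paths are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """One row of ingestion metadata (all fields are hypothetical)."""
    target_table: str
    source_path: str
    file_format: str  # e.g. "parquet", "csv", "json"


def build_copy_statement(spec):
    """Render a COPY INTO statement for one metadata row.

    Values are interpolated directly, so the metadata must come from a
    trusted source.
    """
    return (
        f"COPY INTO {spec.target_table}\n"
        f"FROM '{spec.source_path}'\n"
        f"FILEFORMAT = {spec.file_format.upper()}"
    )


specs = [
    SourceSpec("bronze.orders", "s3://example-bucket/exports/orders/", "parquet"),
    SourceSpec("bronze.customers", "s3://example-bucket/exports/customers/", "csv"),
]

statements = [build_copy_statement(spec) for spec in specs]
for statement in statements:
    print(statement, end="\n\n")
```

In practice the metadata would usually live in a table or configuration file, and each generated statement would be submitted to a SQL endpoint or notebook by the orchestrator.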
Repoint Cubes and Reports to Databricks
Unleash Self-service Analytics with a Semantic Lakehouse
• As easy as repointing your reports to DBSQL JDBC/ODBC drivers (Photon and our newest Cloud Fetch ODBC drivers)
• Key Integrations
  • Power BI Premium (semantic layers, composite models, up to 400 GB caching)
  • Tableau Hyper Extracts
  • Looker
  • OLAP cube partners like MicroStrategy
  • AtScale: Universal Semantic Layer (aggs built in Databricks)
Key Takeaways
Migration is a team sport
● Data Warehousing on Lakehouse is simple
● Migrations can be accelerated using automation tools
● Extensive Partner Ecosystem around Databricks Modern Data Stack
● Huge set of joint offerings to accelerate migrations with SI/Consulting Partners
Next Steps
1. Learn more about the Inner Workings of the Lakehouse
2. Schedule a Data Warehouse migration workshop
3. Schedule a Databricks SQL Hands-on workshop
Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop
©2021 Databricks Inc. — All rights reserved

More Related Content

What's hot

Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksKnoldus Inc.
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseData Con LA
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureDatabricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsKhalid Salama
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta LakeDatabricks
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPDatabricks
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)James Serra
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxCalvinSim10
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptxWasm1953
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 

What's hot (20)

Getting Started with Delta Lake on Databricks
Getting Started with Delta Lake on DatabricksGetting Started with Delta Lake on Databricks
Getting Started with Delta Lake on Databricks
 
Owning Your Own (Data) Lake House
Owning Your Own (Data) Lake HouseOwning Your Own (Data) Lake House
Owning Your Own (Data) Lake House
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Building the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake AnalyticsBuilding the Data Lake with Azure Data Factory and Data Lake Analytics
Building the Data Lake with Azure Data Factory and Data Lake Analytics
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Intro to Delta Lake
Intro to Delta LakeIntro to Delta Lake
Intro to Delta Lake
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
Building End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCPBuilding End-to-End Delta Pipelines on GCP
Building End-to-End Delta Pipelines on GCP
 
Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)Azure Synapse Analytics Overview (r1)
Azure Synapse Analytics Overview (r1)
 
Data platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptxData platform modernization with Databricks.pptx
Data platform modernization with Databricks.pptx
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Databricks on AWS.pptx
Databricks on AWS.pptxDatabricks on AWS.pptx
Databricks on AWS.pptx
 
Azure Synapse Analytics
Azure Synapse AnalyticsAzure Synapse Analytics
Azure Synapse Analytics
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 

Similar to DW Migration Webinar-March 2022.pptx

Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...HostedbyConfluent
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfIlham31574
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesDATAVERSITY
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeTorsten Steinbach
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...HostedbyConfluent
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Denodo
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaMarketingArrowECS_CZ
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeDATAVERSITY
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyLeonid Nekhymchuk
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Denodo
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?MarketingArrowECS_CZ
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartchCloudera, Inc.
 
The new big data
The new big dataThe new big data
The new big dataAdam Doyle
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022HostedbyConfluent
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 

Similar to DW Migration Webinar-March 2022.pptx (20)

Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
Standing on the Shoulders of Open-Source Giants: The Serverless Realtime Lake...
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data LakesADV Slides: Building and Growing Organizational Analytics with Data Lakes
ADV Slides: Building and Growing Organizational Analytics with Data Lakes
 
IBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lakeIBM Cloud Day January 2021 - A well architected data lake
IBM Cloud Day January 2021 - A well architected data lake
 
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ra...
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
Self Service Analytics and a Modern Data Architecture with Data Virtualizatio...
 
Oracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management PlatformaOracle databáze – Konsolidovaná Data Management Platforma
Oracle databáze – Konsolidovaná Data Management Platforma
 
Unlocking the Value of Your Data Lake
Unlocking the Value of Your Data LakeUnlocking the Value of Your Data Lake
Unlocking the Value of Your Data Lake
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
VisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case studyVisiQuate: Azure cloud migration case study
VisiQuate: Azure cloud migration case study
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
 
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?Jak konsolidovat Vaše databáze s využitím Cloud služeb?
Jak konsolidovat Vaše databáze s využitím Cloud služeb?
 
Big data journey to the cloud 5.30.18 asher bartch
Big data journey to the cloud 5.30.18   asher bartchBig data journey to the cloud 5.30.18   asher bartch
Big data journey to the cloud 5.30.18 asher bartch
 
The new big data
The new big dataThe new big data
The new big data
 
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
Streaming Data Into Your Lakehouse With Frank Munz | Current 2022
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 

More from Databricks (20)




DW Migration Webinar-March 2022.pptx

  • 1. Modernize your Data Warehouse: A migration journey to the Databricks Lakehouse Platform
      • Amit Kara, Director, Technical Product Marketing
      • Soham Bhatt, SME Lead, DW Migration
  • 2. Agenda
      • Why lakehouse for data warehousing
      • How does Databricks help with Data Warehousing
      • Key differentiators when using the Databricks Lakehouse Platform
      • Demo: Data warehousing on Databricks
      • How to modernize your data warehouse to a Lakehouse
      • Key takeaways for migrating to the Lakehouse
  • 3. What’s the problem we’re solving?
  • 4. Legacy Data Warehouses aren’t keeping up
      • Data Warehouses can’t keep up with data volume and variety
      • Innovation hinges on integrating ML/AI and predictive insights
      • Business agility requires reliable, real-time data
      • Not cost effective, especially with scale
      • Data is vendor locked-in and duplicated
  • 5. The problem with legacy CDW: a fragmented approach to modernizing your architecture
      • Diagram labels: Structured; Cloud Data Warehouse; Unstructured; Semi-Structured; Data Lake (ADLS, AWS S3, GCP); BI Reports, Dashboards & SQL; ELT/ETL; Data Science; Model Training; Model Scoring; Model Deployment
      • Limited support for streaming
      • Limited support for unstructured data (audio/images/video)
      • Complex, with many stages
      • Data is duplicated
      • Lock-in / proprietary format
      • Compute cost for all data access
      • Disparate tooling decreases data team productivity
  • 6. Why Data Warehousing on Databricks?
  • 7. Why Data Warehousing on Databricks
      • Your tools of choice: Use your favorite tools like Fivetran, dbt, Power BI, Tableau or Databricks to ingest, transform and query all your data in place.
      • Serverless compute: Lower costs and eliminate the need to manage, configure or scale cloud infrastructure with serverless, and get the best price/performance.
      • Unified governance: Simplify architecture, establish one single copy for all your data, and one unified governance layer across all data teams using standard SQL.
      • Break down silos: Empower data scientists and analysts to access the most complete and freshest data faster, and uncover new insights together.
      • Diagram labels: Unity Catalog; Delta Lake; All structured and unstructured data; Cloud Data Lake; Data Warehousing; Data Engineering; Data Science and ML; Data Streaming
  • 8. Connect your data, analytics and AI tools to the Databricks Lakehouse
      • Discover validated data and AI solutions for new use cases
      • Set up in a few clicks with pre-built integrations
      • Integrated out of the box with Partner Connect
      • Partner categories: Business Intelligence; ML Tools; Data Preparation; Data Connectors; Solution Accelerators; Data Apps
      • Discover, connect, and process data, analytics, and AI tools to your lakehouse
  • 9. Databricks thrives within your modern data stack
      • Diagram labels: Unity Catalog; Delta Lake; All structured and unstructured data; Cloud Data Lake; Data Warehousing; Data Engineering; Data Science and ML; Data Streaming; BI and Dashboards; Data Science; Data Pipelines; Data Governance; Machine Learning; Data Ingestion
  • 10. First-class SQL development experience: Collaboratively query, explore, and transform data in place
      • Query data lake data using familiar ANSI SQL, and collaboratively find and share new insights faster with the built-in SQL query editor, alerts, visualizations, and interactive dashboards.
  • 11. Elastic, instant compute decoupled from storage
      • Quickly set up optimized compute resources with SQL endpoints (powered by the vectorized engine Photon)
      • High concurrency built in with automatic load balancing
      • Intelligent workload management and faster reads from cloud storage
      • Instant startup and greater availability
      • Available in Databricks Serverless (preview)
      • No resource management needed with Serverless
  • 12. Built from the ground up for best price/performance
      • Lightning fast analytics: Query and analyze your most complete and freshest data with up to 12x better price/performance than traditional cloud data warehouses.
      • Source: Performance Benchmark with Barcelona Supercomputing Center
  • 13. Unity Catalog: Fine-grained governance on the Lakehouse
      • Centralized metadata and user management
      • Centralized data access controls
      • Data lineage (Private Preview)
      • Data access auditing
      • Data search and discovery (Coming Soon)
      • Secure data sharing with Delta Sharing
      • Standard SQL
  • 14. Key considerations for Modern Analytics & DW
      • Empower Business Units for Self-service and Advanced Analytics
      • Simple, Collaborative, Agile Cross-Functional teams
      • Machine Learning and Artificial Intelligence: CIO-level initiatives
      • A platform that supports all data types, structured and unstructured
      • Cloud: choose best of breed, and weigh an open tech stack against a proprietary one
  • 15. Demo
  • 16. Modern Data Warehousing on Databricks: Select the Ingestion, ETL, Presentation Layer and Governance Ecosystem on the Databricks Platform
      • Ingestion: Batch Ingestion; Stream Ingestion
      • Curated Data: BRONZE (Raw Ingestion and History); SILVER (Filtered, Cleaned, Augmented); GOLD (Business Aggregates & Data Models)
      • ETL: Databricks Notebooks, Delta Live Tables; ETL Partners
      • Enterprise Reporting and BI: DBSQL Endpoints; Databricks SQL
      • Data Science and Machine Learning: Databricks Machine Learning
      • Data Governance powered by Databricks Unity Catalog; EDC
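The bronze, silver and gold layers named on slide 16 can be illustrated with a toy, in-memory sketch. This is plain Python, not Delta Lake or Delta Live Tables code, and the record fields (`order_id`, `region`, `amount`) and the cleaning rules are assumptions invented for illustration; the webinar does not show them.

```python
from collections import defaultdict

# BRONZE: raw ingested records, kept as they arrived (hypothetical sample data).
bronze = [
    {"order_id": "1", "region": " emea ", "amount": "120.50"},
    {"order_id": "2", "region": "AMER", "amount": "80"},
    {"order_id": "2", "region": "AMER", "amount": "80"},   # duplicate delivery
    {"order_id": "3", "region": "emea", "amount": "bad"},  # malformed amount
]


def to_silver(records):
    """Filter, clean and de-duplicate bronze records (assumed rules)."""
    seen_ids = set()
    cleaned = []
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if record["order_id"] in seen_ids:
            continue  # drop repeated deliveries of the same order
        seen_ids.add(record["order_id"])
        cleaned.append({
            "order_id": int(record["order_id"]),
            "region": record["region"].strip().upper(),
            "amount": amount,
        })
    return cleaned


def to_gold(records):
    """Build a business aggregate: total amount per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

On a real lakehouse each layer would typically be a table rather than a Python list, with the same raw-to-cleaned-to-aggregated flow.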
  • 17. Building your Lakehouse: Comprehensive investment into your success
      • Solution Accelerators
      • In-person and Virtual Training
      • Co-located Professional Services
      • Supported by 24/7/365 global, production operations at scale
  • 18. Migration Methodology
      • Phase 1, Discovery: Migration-specific discovery and consultation
      • Phase 2, Assessment: Assessment, Design, Tooling, Accelerators, Sizing, Partners
      • Phase 3, Strategy: Technology mapping, migration workshop, migration planning
      • Phase 4, Production Pilot: Reference implementation of a production use case; overall migration implementation plan
      • Phase 5, Execution: Migration execution and support
      • Diagram labels: Databricks Migration Team with/without Partner; Databricks PS Driven; Partner Driven
  • 19. Migration Approach
      • Architecture/Infrastructure
          • Establish deployment architecture
          • Implement security and governance framework
      • Data Migration
          • Map data structures and layout
          • Complete one-time load
          • Implement incremental load approach
      • ETL and Pipelines
          • Migrate data transformation and pipeline code, orchestration and jobs
          • Speed up your migration using automation tools
          • Validate: compare your results with on-prem data and expected results
      • BI and Analytics
          • Re-point reports and analytics for Business Analysts and Business Outcomes
          • Semantic layer/OLAP cube repointing
          • Connect to reporting and analytics applications
      • Data Science/ML
          • Establish connectivity to ML tools
          • Onboard Data Science teams
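The "Validate" step on slide 19 (comparing migrated results with the on-prem data) is often done with row counts plus an order-independent checksum. The following is a minimal sketch of that idea in plain Python; the function names, the row format and the `\N` marker for nulls are assumptions for illustration and are not taken from the webinar or from any Databricks tool.

```python
import hashlib


def table_fingerprint(rows):
    """Return (row_count, checksum); the checksum ignores row order."""
    count = 0
    checksum = 0
    for row in rows:
        # Render each row as a stable string; None is written as \N so it
        # cannot be confused with an empty string.
        canonical = "|".join("\\N" if value is None else str(value) for value in row)
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        # Summing (mod 2**64) keeps duplicates visible, unlike XOR.
        checksum = (checksum + int.from_bytes(digest[:8], "big")) % 2**64
        count += 1
    return count, checksum


def compare_tables(source_rows, target_rows):
    """Compare a source extract with its migrated copy."""
    source_count, source_sum = table_fingerprint(source_rows)
    target_count, target_sum = table_fingerprint(target_rows)
    return {
        "source_count": source_count,
        "target_count": target_count,
        "counts_match": source_count == target_count,
        "checksums_match": source_sum == target_sum,
    }


# Hypothetical sample data: the target holds the same rows in another order.
source = [(1, "a", 10.0), (2, "b", None)]
target = list(reversed(source))
report = compare_tables(source, target)
print(report)
```

In practice the two systems may render the same value differently (for example `10` versus `10.0`), so values usually need to be normalised to a common format before hashing.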
  • 20. Strategies for Data Migration: One-time loads, catch-up loads, real-time vs batch ingestion
      1. Extract from databases via JDBC/ODBC connectors using spark.read.jdbc.. (parallel ingestion)
      2. Extract to cloud storage and use Databricks Auto Loader for streaming ingest
      3. ISV partners for real-time CDC ingestion (Arcion, Fivetran, Qlik, Rivery, StreamSets..)
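One way to get the parallel JDBC ingestion mentioned on slide 20 is to split a numeric key range into non-overlapping predicates, each of which Spark reads as its own partition. The sketch below only builds the predicate strings in plain Python; the helper name, the column `order_id`, the table name and the bounds are hypothetical, and the commented-out call assumes an existing SparkSession plus a JDBC URL and connection properties of your own.

```python
def build_range_predicates(column, lower, upper, num_partitions):
    """Split [lower, upper) into contiguous half-open ranges, one predicate each.

    Rows outside the range, and rows where the column is NULL, match no
    predicate and would therefore not be read.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    span = upper - lower
    # Never create more partitions than there are distinct integer keys.
    num_partitions = min(num_partitions, span)
    predicates = []
    for i in range(num_partitions):
        start = lower + (span * i) // num_partitions
        end = lower + (span * (i + 1)) // num_partitions
        predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


predicates = build_range_predicates("order_id", 0, 1000, 4)
for predicate in predicates:
    print(predicate)

# With a SparkSession available, the list could be passed to the JDBC reader:
# df = spark.read.jdbc(
#     url=jdbc_url,                  # assumed: your source database URL
#     table="sales.orders",          # hypothetical table name
#     predicates=predicates,
#     properties=connection_props,   # assumed: user, password, driver
# )
```

`spark.read.jdbc` can also compute similar ranges itself when given `column`, `lowerBound`, `upperBound` and `numPartitions`; explicit predicates are mainly useful when the key distribution is skewed or not numeric.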
  • 21. Strategies for ETL/Code Migration: Migration of Stored Procedures and/or ETL Mappings
      • Use of automated tools or frameworks can reduce your timelines by over 50%!
      • For Databricks Notebooks based ETL:
          • Delta Live Tables or Databricks Notebook-based ETL
          • Metadata-driven ingestion frameworks
      • ETL tool partners:
          • Matillion, Prophecy, dbt, Informatica, Talend, Infoworks.. many more
      • Auto code converters accelerate migrations!
  • 22. Repoint Cubes and Reports to Databricks: Unleash Self-service Analytics with a Semantic Lakehouse
      • As easy as repointing your reports to DBSQL JDBC/ODBC drivers (Photon and our newest Cloud Fetch ODBC drivers)
      • Key integrations
          • Power BI Premium (semantic layers, composite models, up to 400 GB caching)
          • Tableau Hyper Extracts
          • Looker
          • OLAP cube partners like MicroStrategy
          • AtScale: universal semantic layer (aggs built in Databricks)
  • 23. Key Takeaways: Migration is a team sport
      • Data Warehousing on Lakehouse is simple
      • Migrations can be accelerated using automation tools
      • Extensive Partner Ecosystem around Databricks Modern Data Stack
      • Huge set of joint offerings to accelerate migrations with SI/Consulting Partners
  • 24. Next Steps: Customize your EDW/ETL Migration Success Plan with an Expert-led Migration Assessment Workshop
      1. Learn more about the Inner Workings of the Lakehouse
      2. Schedule a Data Warehouse migration workshop
      3. Schedule a Databricks SQL Hands-on workshop
  • 25. ©2021 Databricks Inc. — All rights reserved