Modernizing to a
cloud data
architecture
Guido Oswald, Solutions Architect, Databricks
Matt Graves, VP of Enterprise Data & Analytics,
GCI Communication Corp
Agenda
• Top reasons to modernize from Hadoop to Databricks
• Success stories, technical and business benefits
• Fast migrations with low costs & low risk
• Fireside Chat: Matt Graves
Digital transformation is
accelerating
E-Commerce
Wearables, medical IoT
Streaming
Mobile payments, food
service, grocery deliveries…
The data surge is placing
tremendous pressure on
traditional data and analytics
infrastructure
Cloud adoption is accelerating by $100B from 2021 to 2023
Source: Gartner, cited by Battery Ventures - Open Cloud report
Today, most enterprises struggle with data
Siloed stacks increase data architecture complexity
Four siloed stacks, each with its own tools:
• Data Warehousing (structured data; extract, transform, load; data warehouse, data marts, analytics and BI): Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
• Data Engineering (structured, semi-structured and unstructured data; data lake, data prep): Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
• Data Science & Machine Learning (structured, semi-structured and unstructured data; data lake): Jupyter, Amazon SageMaker, Azure ML Studio, MATLAB, Domino Data Labs, SAS, TensorFlow, PyTorch
• Streaming (streaming data sources, streaming data engine, real-time database): Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, Tibco Spotfire, Google Dataflow, Confluent
Disconnected systems and proprietary data formats make integration difficult
Siloed data teams (data analysts, data engineers, data scientists) decrease productivity
Is your architecture enabling growth?
Legacy on-premises data and analytics architectures are not keeping up
Hadoop costs rising when
costs need to be cut
Innovation hinges on ML
and predictive insights
Business agility requires
real-time data
Hadoop is costly, complex and ineffective
• DEVOPS INTENSIVE (low productivity): The Hadoop ecosystem is complex, hard to manage, and prone to failures
• RIGID AND INELASTIC (cost prohibitive): 24/7 HDFS clusters need to be built for peak usage and are costly to upgrade
• LACKS AI CAPABILITIES (slow innovation): No out-of-box support for ML/AI, and separate data and AI environments
Enterprises need a modern data and analytics
architecture
CRITICAL REQUIREMENTS
Cost-effective scale and performance in the cloud
Easy to manage and highly reliable for diverse data
Predictive and real-time insights to drive
innovation
Modernization delivers business value
Forrester TEI study finds 417% ROI for
companies switching to Databricks
• 47% cost savings from retiring legacy infrastructure
• 5% increase in revenue
• 25% increase in data team productivity
Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
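To make the 417% figure concrete: ROI is conventionally defined as net benefits divided by costs. The sketch below applies that definition to hypothetical dollar amounts. They are not taken from the Forrester study and were chosen only so the result comes out at 417%; the function name `roi_percent` is also just illustrative.

```python
def roi_percent(total_benefits: float, total_costs: float) -> float:
    """Return ROI as a percentage: (benefits - costs) / costs * 100."""
    if total_costs <= 0:
        raise ValueError("total_costs must be positive")
    return (total_benefits - total_costs) / total_costs * 100.0


# Hypothetical figures for illustration only (NOT from the Forrester study).
costs = 1_000_000.0
benefits = 5_170_000.0

print(f"ROI: {roi_percent(benefits, costs):.0f}%")
```

Read this way, a 417% ROI means that each dollar spent returns about $4.17 in net benefit on top of recovering the dollar itself.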
The Databricks Lakehouse Platform is one simple platform to unify all
your data, analytics, and AI workloads
Original creators of popular data and machine learning open-source projects
Global company with over 5,000 customers and more than 450 partners
Data
Warehouse
Lakehouse
One platform to unify all of
your data, analytics, and AI workloads
Data
Lake
Lakehouse Platform
• Workloads: Data Engineering, BI & SQL Analytics, Real-time Data Applications, Data Science & Machine Learning
• Data Management & Governance
• Open Data Lake: structured, semi-structured, unstructured and streaming data
SIMPLE, OPEN, COLLABORATIVE
From BI to AI
All your data,
analytics and
AI on one
Lakehouse
platform
Technology mapping: deliver better outcomes
• Data Eng, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop-shop)
• Scalable apps on columnar store (HBase) → Databricks Spark integrates w/ HBase on cloud (alternatively: use cloud data stores well integrated with Databricks)
• ETL, SQL (Hive / Impala) → Databricks jobs / Delta Lake / SparkSQL (highly tuned Spark engine: faster, less compute, one-stop-shop)
• Batch Process (MapReduce) → Databricks Spark jobs (orders of magnitude faster - but may need manual work)
• Real-time Event Processing (Storm / Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming + batch ingest)
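One way to use this mapping during an assessment is as a simple lookup from Hadoop workload type to its Databricks target. The sketch below is illustrative only: the pairings are as read from the slide, while the dictionary keys, the `TECHNOLOGY_MAP` name and the `target_for` helper are assumptions made up for this example.

```python
# Workload type -> (Hadoop technology, Databricks target), as read from the slide.
# Key names are hypothetical labels chosen for this sketch.
TECHNOLOGY_MAP = {
    "data_eng_ml": ("Spark", "Databricks jobs / Delta Lake"),
    "columnar_apps": ("HBase", "Databricks Spark with HBase on cloud"),
    "etl_sql": ("Hive / Impala", "Databricks jobs / Delta Lake / Spark SQL"),
    "batch": ("MapReduce", "Databricks Spark jobs"),
    "realtime_events": ("Storm / Spark", "Databricks Structured Streaming"),
}

# Per the slide, MapReduce conversions may need manual work.
NEEDS_MANUAL_REVIEW = {"batch"}


def target_for(workload: str) -> str:
    """Return a one-line migration note for a workload type."""
    if workload not in TECHNOLOGY_MAP:
        return f"{workload}: not covered by the mapping, assess separately"
    source, target = TECHNOLOGY_MAP[workload]
    note = " (may need manual work)" if workload in NEEDS_MANUAL_REVIEW else ""
    return f"{source} -> {target}{note}"


for workload in ("etl_sql", "batch", "graph_processing"):
    print(target_for(workload))
```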
Automation for most workload types
• Data Migration: HDFS → Azure ADLS Gen 2, AWS S3, GCS
• Metastore Migration: Hive Databases / Tables / Views → Databricks Tables; Impala Databases / Tables / Views → Databricks Tables
• SQL Migration: HDFS → Spark SQL Databricks Notebooks; Hive Queries → Spark SQL Databricks Notebooks; Spark Queries → Databricks Notebooks
• Security: Sentry permissions / Ranger policies → Databricks permissions; HDFS access permissions → AWS IAM, ADLS ACLs
• Scheduled Data pulls: Sqoop statements → Databricks-compatible PySpark code
• Orchestration: Oozie Jobs → Airflow DAGs & Databricks Jobs
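For the HDFS to cloud object storage mapping listed above, part of the work is rewriting path references in jobs and table definitions. The following is a minimal sketch of that step, not the behavior of any migration tool mentioned in this deck. The function name `to_cloud_uri`, the host `namenode`, and the bucket and container names are all hypothetical.

```python
from urllib.parse import urlparse


def to_cloud_uri(hdfs_uri: str, cloud_root: str) -> str:
    """Rewrite an hdfs:// URI so that its path sits under a cloud storage root."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs":
        raise ValueError(f"not an HDFS URI: {hdfs_uri}")
    # Drop the namenode host and port; keep only the directory path.
    return cloud_root.rstrip("/") + "/" + parsed.path.lstrip("/")


# Hypothetical source path and target roots, for illustration only.
source = "hdfs://namenode:8020/warehouse/sales/2021"
for root in (
    "s3://example-bucket/lake",
    "abfss://lake@exampleaccount.dfs.core.windows.net",
    "gs://example-bucket/lake",
):
    print(to_cloud_uri(source, root))
```

Keeping the directory layout unchanged under the new root is one simple design choice; a real migration may also restructure paths, which this sketch does not attempt.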
55-66% reduction in costs and 2-3x reduction in timelines by using automation tools
• Manual migration (17-20 weeks): Assessment & Design, Data Migration, Workloads Migration, Validation, Cutover, Operations
• Using automation (8 weeks): Accelerated Assessment & Design, Accelerated Data & Workloads Migration, Validation, Cutover, Operations
• The same tool is used for the pre-migration assessment
* Typical implementation scenario: ~4 PB of data and 3000 jobs with mixed workloads considered
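As a quick sanity check on the timeline claim, the durations quoted on this slide (17-20 weeks manual, 8 weeks with automation, as read from the transcript) can be turned into a speedup range. This is plain arithmetic on the slide's own numbers for its stated scenario; the helper name `speedup_range` is an assumption for this sketch.

```python
def speedup_range(manual_weeks: tuple, automated_weeks: float) -> tuple:
    """Return (low, high) speedup factors of automated vs. manual migration."""
    low, high = manual_weeks
    return (low / automated_weeks, high / automated_weeks)


# Durations as quoted on the slide.
low, high = speedup_range((17, 20), 8)
print(f"Speedup: {low:.1f}x to {high:.1f}x")
```

For this scenario the quoted durations work out to a little over 2x, which falls within the lower half of the "2-3x" range in the slide title.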
Our partner ecosystem accelerates migrations
ISV Partners and Migration Tools
Security
Governance
Consulting & SI Partners
Databricks Migration SWAT team + CS Packaged Services for Migration
Cloud
Modernization with Databricks - recap
Why - costs, productivity, innovation → business
impact
Your competitors and market leaders are doing it
NOW
Databricks experts and automation strategy can
help you migrate faster, with much lower cost and
risk
Visit databricks.com/migration to learn more
Fireside chat
Matt Graves
VP of Enterprise Data & Analytics
GCI Communication Corp
Backup
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

More Related Content

What's hot

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...DataScienceConferenc1
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsDATAVERSITY
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshJeffrey T. Pollock
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMatei Zaharia
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)James Serra
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Databricks
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & DeltaDatabricks
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseDatabricks
 

What's hot (20)

Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced AnalyticsData Architecture Best Practices for Advanced Analytics
Data Architecture Best Practices for Advanced Analytics
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Making Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse TechnologyMaking Data Timelier and More Reliable with Lakehouse Technology
Making Data Timelier and More Reliable with Lakehouse Technology
 
Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)Data Lakehouse, Data Mesh, and Data Fabric (r2)
Data Lakehouse, Data Mesh, and Data Fabric (r2)
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Snowflake Datawarehouse Architecturing
Snowflake Datawarehouse ArchitecturingSnowflake Datawarehouse Architecturing
Snowflake Datawarehouse Architecturing
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 

Similar to Modernizing to a Cloud Data Architecture

SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingAmazon Web Services
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationDatabricks
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Abhimanyu Singhal
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightDataWorks Summit/Hadoop Summit
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsStreamsets Inc.
 
Streaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesStreaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesPrecisely
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
Unlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligenceUnlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligencePrecisely
 
Best Bigquery ETL Tool
Best Bigquery ETL ToolBest Bigquery ETL Tool
Best Bigquery ETL ToolLyftron Data
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Denodo
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...DataWorks Summit
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceSalesforce Developers
 

Similar to Modernizing to a Cloud Data Architecture (20)

SendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data WarehousingSendGrid Improves Email Delivery with Hybrid Data Warehousing
SendGrid Improves Email Delivery with Hybrid Data Warehousing
 
The Hidden Value of Hadoop Migration
The Hidden Value of Hadoop MigrationThe Hidden Value of Hadoop Migration
The Hidden Value of Hadoop Migration
 
Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure Opportunity: Data, Analytic & Azure
Opportunity: Data, Analytic & Azure
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Big Data in Azure
Big Data in AzureBig Data in Azure
Big Data in Azure
 
Build Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsightBuild Big Data Enterprise Solutions Faster on Azure HDInsight
Build Big Data Enterprise Solutions Faster on Azure HDInsight
 
Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Talend introduction v1
Talend introduction v1Talend introduction v1
Talend introduction v1
 
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSetsEnabling Next Gen Analytics with Azure Data Lake and StreamSets
Enabling Next Gen Analytics with Azure Data Lake and StreamSets
 
Streaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use CasesStreaming IBM i to Kafka for Next-Gen Use Cases
Streaming IBM i to Kafka for Next-Gen Use Cases
 
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Unlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location IntelligenceUnlock Data-driven Insights in Databricks Using Location Intelligence
Unlock Data-driven Insights in Databricks Using Location Intelligence
 
Best Bigquery ETL Tool
Best Bigquery ETL ToolBest Bigquery ETL Tool
Best Bigquery ETL Tool
 
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
Accelerate Self-Service Analytics with Virtualization and Visualisation (Thai)
 
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization
 
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
Journey to the Data Lake: How Progressive Paved a Faster, Smoother Path to In...
 
Bringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to SalesforceBringing the Power of Big Data Computation to Salesforce
Bringing the Power of Big Data Computation to Salesforce
 

More from Databricks

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityDatabricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueDatabricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 

More from Databricks (20)

Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 

Modernizing to a Cloud Data Architecture

  • 1. Modernizing to a cloud data architecture Guido Oswald, Solutions Architect, Databricks Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
  • 2. Agenda • Top reasons to modernize from Hadoop to Databricks • Success stories, technical and business benefits • Fast migrations with low costs & low risk • Fireside Chat: Matt Graves
  • 3. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries…
  • 4. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries… The data surge is placing tremendous pressure on traditional data and analytics infrastructure
  • 5. Digital transformation is accelerating E-Commerce Wearables, medical IoT Streaming Mobile payments, food service, grocery deliveries… The data surge is placing tremendous pressure on traditional data and analytics infrastructure Source: Gartner cited by Battery Ventures - Open Cloud report Cloud adoption is accelerating by $100B from 2021 - 2023
  • 6. Today, most enterprises struggle with data. Siloed stacks increase data architecture complexity:
      • Data Warehousing (structured data; extract, load, transform; data warehouse; data marts; analytics and BI): Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
      • Data Engineering (structured, semi-structured and unstructured data; data lake; data prep): Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
      • Streaming (streaming data sources; streaming data engine; real-time database): Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, Tibco Spotfire, Google Dataflow, Confluent
      • Data Science & Machine Learning (structured, semi-structured and unstructured data; data lake; machine learning; data science): Jupyter, Amazon SageMaker, Azure ML Studio, MatLAB, Domino Data Labs, SAS, TensorFlow, PyTorch
      • Disconnected systems and proprietary data formats make integration difficult
      • Siloed data teams (data analysts, data engineers, data scientists) decrease productivity
  • 7. Is your architecture enabling growth? Legacy on-premise data and analytics architectures are not keeping up Hadoop costs rising when costs need to be cut Innovation hinges on ML and predictive insights Business agility requires real-time data
  • 8. Hadoop is costly, complex and ineffective
      • DEVOPS INTENSIVE: Hadoop ecosystem is complex, hard to manage, and prone to failures. Result: low productivity
      • RIGID AND INELASTIC: 24/7 HDFS clusters that need to be built for peak usage and are costly to upgrade. Result: cost prohibitive
      • LACKS AI CAPABILITIES: No out-of-box support for ML/AI, and separate data and AI environments. Result: slow innovation
  • 9. Enterprises need a modern data and analytics architecture CRITICAL REQUIREMENTS Cost-effective scale and performance in the cloud Easy to manage and highly reliable for diverse data Predictive and real-time insights to drive innovation
  • 10. Modernization delivers business value. Forrester TEI study finds 417% ROI for companies switching to Databricks: 47% cost savings from retiring legacy infrastructure; 5% increase in revenue; 25% increase in data team productivity. Source: Forrester TEI: The total economic impact of the Databricks Unified Analytics Platform
  • 11. The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads Original creators of popular data and machine learning open-source projects Global company with over 5,000 customers and more than 450 partners
  • 12. Data Warehouse Lakehouse One platform to unify all of your data, analytics, and AI workloads Data Lake
  • 13. Structured Semi-structured Unstructured Streaming Lakehouse Platform Data Engineering BI & SQL Analytics Real-time Data Applications Data Science & Machine Learning Data Management & Governance Open Data Lake SIMPLE OPEN COLLABORATIVE From BI to AI All your data, analytics and AI on one Lakehouse platform
  • 14. Technology mapping: deliver better outcomes
      • ETL, SQL (Hive/Impala) → Databricks jobs / Delta Lake / SparkSQL (highly tuned Spark engine: faster, less compute, one-stop-shop)
      • Batch Process (MapReduce) → Databricks Spark jobs (orders of magnitude faster, but may need manual work)
      • Real-time Event Processing (Storm/Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming + batch ingest)
      • Data Eng, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop-shop)
      • Scalable apps on columnar store (HBase) → Databricks Spark integrates with HBase on cloud (alternatively: use cloud data stores well integrated with Databricks)
  • 15. Automation for most workload types
      • Data Migration: HDFS → Azure ADLS Gen 2, AWS S3, GCS
      • Metastore Migration: Hive databases / tables / views → Databricks tables; Impala databases / tables / views → Databricks tables
      • SQL Migration: HDFS → Spark SQL, Databricks notebooks; Hive queries → Spark SQL, Databricks notebooks; Spark queries → Databricks notebooks
      • Security: Sentry permissions / Ranger policies → Databricks permissions; HDFS access permissions → AWS IAM, ADLS ACLs
      • Scheduled Data Pulls: Sqoop statements → Databricks-compatible PySpark code
      • Orchestration: Oozie jobs → Airflow DAGs & Databricks Jobs
  • 16. 55-66% reduction in costs and 2-3x reduction in timelines by using automation tools*
      • Manual migration (17-20 weeks): Assessment & Design; Data Migration; Workloads Migration, Validation; Cutover; Operations
      • Using automation (8 weeks): Accelerated Assessment & Design; Accelerated Data & Workloads Migration, Validation; Cutover; Operations
      • Same tool used for pre-migration assessment
      • * Typical implementation scenario: ~4 PB of data and 3000 jobs with mixed workloads considered
  • 17. Our partner ecosystem accelerates migrations ISV Partners and Migration Tools Security Governance Consulting & SI Partners Databricks Migration SWAT team + CS Packaged Services For Migration Cloud
  • 18. Modernization with Databricks - recap Why - costs, productivity, innovation → business impact Your competitors and market leaders are doing it NOW Databricks experts and automation strategy can help you migrate faster, with much lower cost and risk
  • 20. Fireside chat Matt Graves VP of Enterprise Data & Analytics GCI Communication Corp
  • 22. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.
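
As a rough check on the timeline claim in slide 16, the quoted durations (17-20 weeks manual, 8 weeks with automation) can be turned into a speedup ratio. The sketch below is illustrative only: the function names are hypothetical, and it assumes the quoted week counts cover the same phases for both approaches.

```python
# Durations quoted on slide 16 for the ~4 PB / 3000-job scenario.
MANUAL_WEEKS = (17, 20)   # manual migration: low and high estimate
AUTOMATED_WEEKS = 8       # migration using automation tools


def speedup_range(manual_weeks, automated_weeks):
    """Return the (low, high) ratio of manual to automated duration."""
    low, high = manual_weeks
    return (low / automated_weeks, high / automated_weeks)


def weeks_saved(manual_weeks, automated_weeks):
    """Return the (low, high) number of calendar weeks saved."""
    low, high = manual_weeks
    return (low - automated_weeks, high - automated_weeks)


if __name__ == "__main__":
    low, high = speedup_range(MANUAL_WEEKS, AUTOMATED_WEEKS)
    saved_low, saved_high = weeks_saved(MANUAL_WEEKS, AUTOMATED_WEEKS)
    print(f"Timeline ratio: {low} to {high}")
    print(f"Weeks saved: {saved_low} to {saved_high}")
```

On these figures the ratio comes out between 2.125 and 2.5, which sits at the lower end of the "2-3x" range stated on the slide.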