Modernizing to a Cloud Data Architecture

Organizations with on-premises Hadoop infrastructure are bogged down by system complexity, unscalable infrastructure, and the growing DevOps burden of managing legacy architectures. Costs and resource utilization keep rising while innovation has flatlined. In this session, you will learn why, now more than ever, enterprises are looking for cloud alternatives to Hadoop and are migrating off it in large numbers. You will also learn how one customer used elastic compute to scale its analytics and AI workloads, and hear best practices from its successful migration of data and workloads to the cloud.

  1. Modernizing to a cloud data architecture. Guido Oswald, Solutions Architect, Databricks; Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
  2. Agenda
     • Top reasons to modernize from Hadoop to Databricks
     • Success stories: technical and business benefits
     • Fast migrations with low cost and low risk
     • Fireside chat: Matt Graves
  5. Digital transformation is accelerating: e-commerce; wearables and medical IoT; streaming; mobile payments, food service, grocery deliveries…
     • The data surge is placing tremendous pressure on traditional data and analytics infrastructure.
     • Cloud adoption is accelerating: +$100B from 2021 to 2023 (Source: Gartner, cited in Battery Ventures' Open Cloud report).
  6. Today, most enterprises struggle with data: siloed stacks increase data architecture complexity
     • Data warehousing (data analysts): structured data is extracted, transformed, and loaded into a data warehouse and data marts for analytics and BI. Tools: Amazon Redshift, Teradata, Azure Synapse, Google BigQuery, Snowflake, IBM Db2, SAP, Oracle Autonomous Data Warehouse
     • Data engineering (data engineers): structured, semi-structured, and unstructured data lands in a data lake for data prep. Tools: Hadoop, Apache Airflow, Amazon EMR, Apache Spark, Google Dataproc, Cloudera
     • Streaming (data engineers): streaming data sources feed a streaming data engine and a real-time database. Tools: Apache Kafka, Apache Spark, Apache Flink, Amazon Kinesis, Azure Stream Analytics, Tibco Spotfire, Google Dataflow, Confluent
     • Data science & machine learning (data scientists): structured, semi-structured, and unstructured data in a data lake feeds data science and machine learning. Tools: Jupyter, Amazon SageMaker, Azure ML Studio, MATLAB, Domino Data Labs, SAS, TensorFlow, PyTorch
     Disconnected systems and proprietary data formats make integration difficult, and siloed data teams decrease productivity.
  7. Is your architecture enabling growth? Legacy on-premises data and analytics architectures are not keeping up:
     • Hadoop costs are rising just when costs need to be cut
     • Innovation hinges on ML and predictive insights
     • Business agility requires real-time data
  8. Hadoop is costly, complex, and ineffective
     • Rigid and inelastic (cost prohibitive): HDFS clusters run 24/7, must be built for peak usage, and are costly to upgrade
     • DevOps intensive (low productivity): the Hadoop ecosystem is complex, hard to manage, and prone to failures
     • Lacks AI capabilities (slow innovation): no out-of-the-box support for ML/AI, and separate data and AI environments
  9. Enterprises need a modern data and analytics architecture. Critical requirements:
     • Cost-effective scale and performance in the cloud
     • Easy to manage and highly reliable for diverse data
     • Predictive and real-time insights to drive innovation
  10. Modernization delivers business value: a Forrester TEI study finds 417% ROI for companies switching to Databricks
     • 47% cost savings from retiring legacy infrastructure
     • 5% increase in revenue
     • 25% increase in data team productivity
     Source: Forrester TEI, "The Total Economic Impact of the Databricks Unified Analytics Platform"
  11. The Databricks Lakehouse Platform is one simple platform to unify all your data, analytics, and AI workloads
     • Original creators of popular open-source data and machine learning projects
     • Global company with over 5,000 customers and more than 450 partners
  12. Data warehouse + data lake = lakehouse: one platform to unify all of your data, analytics, and AI workloads
  13. From BI to AI: all your data, analytics, and AI on one Lakehouse platform
     • Data: structured, semi-structured, unstructured, streaming
     • Workloads: data engineering; BI & SQL analytics; real-time data applications; data science & machine learning
     • Foundation: data management & governance on an open data lake
     • Simple, open, collaborative
  14. Technology mapping: deliver better outcomes
     • Batch processing (MapReduce) → Databricks Spark jobs (orders of magnitude faster, but may need manual work)
     • ETL, SQL (Hive/Impala) → Databricks jobs / Delta Lake / Spark SQL (highly tuned Spark engine: faster, less compute, one-stop shop)
     • Data engineering, ML (Spark) → Databricks jobs / Delta Lake (highly tuned Spark engine: faster, less compute, one-stop shop)
     • Real-time event processing (Storm/Spark) → Databricks Structured Streaming (Spark Structured Streaming + Delta Lake: streaming and batch ingest)
     • Scalable apps on a columnar store (HBase) → Databricks Spark integrates with HBase in the cloud (alternatively, use cloud data stores that are well integrated with Databricks)
  15. Automation for most workload types
     • Data migration: HDFS → Azure ADLS Gen2, AWS S3, GCS
     • Metastore migration: Hive and Impala databases/tables/views → Databricks tables
     • SQL migration: Hive queries and Spark queries → Spark SQL, Databricks notebooks
     • Security: Sentry permissions/Ranger policies → Databricks permissions; HDFS access permissions → AWS IAM, ADLS ACLs
     • Scheduled data pulls: Sqoop statements → Databricks-compatible PySpark code
     • Orchestration: Oozie jobs → Airflow DAGs and Databricks Jobs
  16. 55-66% reduction in costs and 2-3x reduction in timelines by using automation tools
     • Manual migration: assessment & design, data migration, workload migration and validation, cutover, operations: 17-20 weeks
     • Using automation: accelerated assessment & design, accelerated data & workload migration and validation, cutover, operations: 8 weeks
     • The same tool is used for pre-migration assessment
     * Typical implementation scenario: ~4 PB of data and 3,000 jobs with mixed workloads
  17. Our partner ecosystem accelerates migrations
     • ISV partners and migration tools (security, governance)
     • Consulting & SI partners
     • Cloud providers
     • Databricks Migration SWAT team + CS packaged services for migration
  18. Modernization with Databricks: recap
     • Why: costs, productivity, innovation → business impact
     • Your competitors and market leaders are doing it now
     • Databricks experts and an automation strategy can help you migrate faster, with much lower cost and risk
  19. Visit databricks.com/migration to learn more
  20. Fireside chat: Matt Graves, VP of Enterprise Data & Analytics, GCI Communication Corp
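Slide 8's cost argument rests on the fact that an always-on cluster must be sized for its busiest hour. A minimal sketch of that reasoning, assuming a purely hypothetical 24-hour demand profile and an idealized elastic cluster that tracks demand exactly (no start-up time, no minimum size), compares node-hours for the two models:

```python
from typing import Sequence


def node_hours(demand: Sequence[int], *, elastic: bool) -> int:
    """Node-hours consumed over an hourly demand profile (nodes needed each hour)."""
    if elastic:
        # Idealized elastic compute: pay only for the nodes each hour needs
        # (ignores cluster start-up time and minimum cluster sizes).
        return sum(demand)
    # Fixed on-premises cluster: provisioned for the peak hour, always on.
    return max(demand) * len(demand)


# Hypothetical 24-hour profile: quiet overnight, steady daytime analytics,
# a short nightly batch spike. These numbers are illustrative only.
demand = [2] * 8 + [6] * 10 + [4] * 4 + [20] * 2

fixed = node_hours(demand, elastic=False)
elastic = node_hours(demand, elastic=True)
print(f"fixed: {fixed} node-hours, elastic: {elastic} node-hours")
# → fixed: 480 node-hours, elastic: 132 node-hours
```

The gap comes entirely from the short spike: the fixed cluster pays for 20 nodes in every hour, while the elastic one pays for them only during the two peak hours. Real savings depend on the actual load shape and pricing.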
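To read the 417% figure on slide 10, assume the study uses Forrester's usual TEI definition of ROI: net benefits divided by costs, both risk-adjusted, with B as total benefits and C as total costs. Under that assumption:

```latex
\mathrm{ROI} = \frac{B - C}{C} = 4.17
\quad\Longrightarrow\quad
B = 5.17\,C
```

In other words, each dollar of cost would return roughly $5.17 in benefits over the study period.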
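Slide 15 lists "Sqoop statements → Databricks-compatible PySpark code" and "HDFS → cloud storage" as automated steps. The sketch below is a hypothetical illustration, not the migration tool's actual behavior. It translates a simple `sqoop import` command into PySpark source that reads the table over Spark's JDBC data source and writes it to Delta Lake, re-rooting the HDFS target directory under a cloud prefix. The helper names, the `PASSWORD` placeholder, and the example hosts and bucket are all assumptions. The sketch handles only `--connect`, `--table`, `--username`, and `--target-dir`.

```python
import shlex
from typing import Dict, List, Optional


def hdfs_to_cloud(path: str, cloud_base: str) -> str:
    """Re-root an HDFS path (plain or hdfs:// URI) under a cloud storage prefix."""
    if path.startswith("hdfs://"):
        # Drop the scheme and namenode authority: hdfs://nn:8020/data/x -> data/x
        _, _, path = path[len("hdfs://"):].partition("/")
    return cloud_base.rstrip("/") + "/" + path.lstrip("/")


def parse_options(args: List[str]) -> Dict[str, Optional[str]]:
    """Collect '--key value', '--key=value' and '-k value' pairs; bare flags map to None."""
    opts: Dict[str, Optional[str]] = {}
    i = 0
    while i < len(args):
        tok = args[i]
        if tok.startswith("--") and "=" in tok:
            key, value = tok[2:].split("=", 1)
            opts[key] = value
            i += 1
        elif i + 1 < len(args) and not args[i + 1].startswith("-"):
            opts[tok.lstrip("-")] = args[i + 1]
            i += 2
        else:
            opts[tok.lstrip("-")] = None
            i += 1
    return opts


def sqoop_to_pyspark(command: str, cloud_base: str) -> str:
    """Emit PySpark source that reads the Sqoop source table over JDBC and writes Delta."""
    args = shlex.split(command)
    if args[:2] != ["sqoop", "import"]:
        raise ValueError("this sketch only handles 'sqoop import'")
    opts = parse_options(args[2:])
    # Assumes --connect and --table are present (no --query imports).
    url, table = opts["connect"], opts["table"]
    # Sqoop's default target directory is named after the table; simplified here.
    target = hdfs_to_cloud(opts.get("target-dir") or table, cloud_base)
    lines = [
        'df = (spark.read.format("jdbc")',
        f'      .option("url", "{url}")',
        f'      .option("dbtable", "{table}")',
    ]
    if opts.get("username"):
        user = opts["username"]
        lines.append(f'      .option("user", "{user}")')
        # PASSWORD is a hypothetical placeholder: load it from a secret store.
        lines.append('      .option("password", PASSWORD)')
    lines.append("      .load())")
    # Parallelism (-m, --split-by) is not translated; Spark's JDBC reader would
    # need partitionColumn, lowerBound and upperBound for parallel reads.
    lines.append(f'df.write.format("delta").mode("overwrite").save("{target}")')
    return "\n".join(lines)


cmd = ("sqoop import --connect jdbc:mysql://db.example.com:3306/sales "
       "--username etl --table orders --target-dir hdfs://nn:8020/data/orders -m 4")
print(sqoop_to_pyspark(cmd, "s3://my-bucket/migrated"))
```

The generated code is a string to review, not something this sketch executes. Emitting source rather than running the copy directly mirrors the slide's framing, where Sqoop jobs become PySpark code that can then be scheduled as Databricks jobs.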
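The 2-3x timeline claim on slide 16 can be checked against the slide's own durations. The manual path takes 17-20 weeks and the automated path takes 8 weeks, which gives a speed-up of:

```latex
\frac{17\ \text{weeks}}{8\ \text{weeks}} \approx 2.1,
\qquad
\frac{20\ \text{weeks}}{8\ \text{weeks}} = 2.5
```

So the typical scenario (~4 PB, 3,000 jobs) falls at the low-to-middle end of the stated 2-3x range.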