Modern Data Warehouse Fundamentals Part 1

Explore new trends and use cases in data warehousing, including exploration and discovery, self-service ad hoc analysis, predictive analytics, and other ways to gain deeper business insight. Modern Data Warehouse Fundamentals shows how to modernize your data warehouse architecture and infrastructure to benefit both traditional analytics practitioners and data scientists and engineers.

  1. © Cloudera, Inc. All rights reserved.
  2. MODERN DATA WAREHOUSE FUNDAMENTALS. Part I: Introducing the Modern Data Warehouse (Challenges, Use Cases, and Opportunities), December 2018
  3. SPEAKERS: Eva Nahari, Director, Product Management; David Dichmann, Director, Product Marketing
  4. Why Modernize Your Data Warehouse? The Case for a Modern Data Warehouse
  5. LARGE NORTH AMERICAN BANK (FRAUD PREVENTION) • Line-of-business (LoB) data analysts access all data • Saved $4M+ in deposit fraud • Metrics shown: terabytes, users, databases, queries per month
  6. GLOBAL PHARMACEUTICAL (NEW PRODUCT DEVELOPMENT) • Curated use and agile discovery with HIPAA compliance • Accelerated new drug development • Metrics shown: use cases, users, fewer silos, diverse data
  7. MAJOR TELCO MANUFACTURER (BUSINESS OPTIMIZATION) • $10M new revenue from optimized marketing • $30M+ from price optimization • $100K+ from weather correlation • Metrics shown: query responses, new sources, Min., data sets, users
  8. NEW TRENDS IN DATA WAREHOUSING: Deeper Business Insights at Extreme Speed and Scale While Managing Cost • DEEPER business insights • EXTREME speed & scale • CONTROLLED resources & costs
  9. NEW TRENDS IN DATA WAREHOUSING: Deeper Business Insights
     ● Protect (proactive fraud prevention; keeping up with regulatory compliance; preempting cyberthreats): real-time response on massive data volume and variety
     ● Optimize (improved operational efficiency; support for the Internet of Things (IoT)): new analytics techniques democratized to all users
     ● Grow (customer sentiment; fault prevention; improved product quality; new revenue streams): experimentation and collaboration at scale
  10. NEW TRENDS IN DATA WAREHOUSING: Extreme Speed and Scale
     ● More data (massive amounts handled faster at scale; more variety from new sources such as social media and IoT; insight within minutes of new data arrival): performance and flexibility at scale
     ● More workloads (hundreds of production-grade deployments; enterprise-grade dependability; strict security and governance): on-demand scale-out, discovery, and collaboration
     ● More people (thousands of new users and new user types; thousands of new use cases; all skill levels across analytics, data science, and machine learning): all workloads with a shared data experience
  11. NEW TRENDS IN DATA WAREHOUSING: Managing Resources and Costs
     ● Optimize core processes (automation to relieve organizational bottlenecks; a consistent user experience): broaden data reach without increasing IT burden or costs
     ● Self-service everything (resource provisioning; workload development; optimizing and troubleshooting): deliver on increased SLA pressures without runaway cost
     ● Dynamic consumption (transient, short-lived, and permanent workloads; public, private, and hybrid cloud): environmental flexibility with adaptive compute and storage
  12. Quickly enable business analytics by sharing petabytes of verified data across thousands of users while surpassing SLA and cost demands
  13. TRADITIONAL DATA WAREHOUSE (diagram). Components: structured data sources (ERP, CRM, SCM), ETL, staging, ODS, transformations, master schema, EDW, data marts, and consumers (advanced analytics, dashboards, ad hoc queries, canned reports). Time to value: many months. Pain points: struggles to handle volume and variety; limited access.
  14. WHAT CONCEPTS SURVIVE? • Data modeling • Security & governance • Reports & dashboards
  15. WHAT HAS CHANGED? (Traditional DW → Modern DW)
     ● Data & analytics: supporting role → foundational role
     ● Users: primarily internal → internal & external
     ● Data exploration: constrained, structured → freeform, multi-structured
     ● Data curation: planned ETLs → on-demand pipelines
  16. WHAT IS NEW? • Experimentation & collaboration • Dynamic consumption • Self-service everything
  17. MODERN DATA WAREHOUSE (diagram). Components: a variety of data sources and types; a data store that ingests and stores all data at scale; self-serve, on-demand data marts; consumers (advanced analytics, dashboards, ad hoc queries, canned reports). Time to value: within days.
  18. CLOUDERA MODERN DATA WAREHOUSE: The modern platform for machine learning and analytics, optimized for the cloud
     ● Extensible services: data engineering, data science, analytics, operational database
     ● Core services: security, governance, workload management, ingest & replication, data catalog
     ● Storage services: HDFS, Kudu, Amazon S3, Microsoft ADLS
  19. A MODERN DATA WAREHOUSE SOLUTION (diagram), organized into hybrid controls, hybrid compute, and hybrid storage. Components: preferred BI & ELT tools; SQL Workbench; Workload XM; Navigator & Sentry; Impala MPP query engine; Hive-on-Spark / Spark MPP ELT processing; Solr MPP search analytics; Kudu | HDFS local storage; AWS S3 | ADLS object storage; optimized file formats (Parquet, Avro); Shared Data Experience (SDX); Cloudera Manager; Altus
  20. PROACTIVELY OPTIMIZE WORKLOADS: Workload XM for self-service diagnostics and optimizations; a self-service analytics workbench • Move faster • Serve more users • Reduce IT pressure
  21. EXTREME SPEED & SCALE • Fastest ELT at scale for data engineers • Fastest self-service BI at scale for analysts & developers (Impala) • Flexibility at scale • Thousands of users • On-demand scale-out • Speed to insight
  22. ON-DEMAND SCALING & MULTI-TENANCY
     ● Tenant workloads: Explore (discovery on raw data); Experiment (exploration on curated data); Emerging LoB (prep for a new report); Sales (BI / new reporting); Experiment (model build/test); Dev & Test (prep, known); Finance (regular reporting)
     ● Data zones: landing zone, experimental zone, refined zone, archived zone
     ● Shared storage (HDFS, Kudu, S3, ADLS) with shared metadata, security, and governance
  23. STATEFUL CONTEXT, SHARED EXPERIENCE: Enables full flexibility and dynamic consumption
  24. CLOUD NATIVE OPTION: ALTUS DW
     ● Quick time to value: no software or clusters to manage
     ● Bring the warehouse to the data with zero-copy simplicity
     ● Use your security policies with your data: no proprietary stacks
     ● Apply enterprise governance to transient workloads
     ● Shared data experience with SDX
     ● Optimized for Azure & AWS
     ● Diagram (multi-cloud PaaS solution): the Altus control plane with data warehouse, security, governance, lifecycle management, and multi-cloud components over Amazon S3 and Microsoft ADLS
  25. FROM ANALYTICS TO MACHINE LEARNING: Moving from known questions on known data to unknown questions on unknown data
     ● Data engineering + data warehouse: Run ETL with Spark or partner tools to ingest and process data at any scale. Assign permissions and classifications once. The data, along with all of its context, is immediately available in the data warehouse for analytical processing and BI use cases.
     ● Data warehouse + data science: Run data science and machine learning analysis to blend, augment, and score data. The blended and augmented data, along with all of its context, is immediately available to business teams and analysts with unified security and governance.
     ● BETTER TOGETHER: Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set with consistent data context.
  26. TOOLS & FRAMEWORKS FOR SUCCESS
     ● Evaluate: identify use cases; impact analysis; set objectives; prioritized plan
     ● Plan: estimate effort; risk analysis; schema design; test & validate
     ● Offload (optional): initial POC; identify suitable workloads; offload actions; capacity planning
     ● Optimize: fine-tune the data model on Hadoop; optimize queries for performance; validate ROI and cost
  27. TD BANK: Delivering “Legendary Customer Experience”
     ● Challenges: Significantly improve customer experience with sentiment analysis, behavioral patterns, and predictive modeling. The current system couldn’t handle centralizing data from thousands of sources, demands from increased users and use cases, or data cost and manageability at scale.
     ● Solution: A modern data warehouse for customer marketing, fraud analytics, and cybersecurity • Ingest data from 100+ corporate systems • Centralized data into “the hands of those that need it much more quickly” • Significantly reduce storage and management costs
     ● Results: 30% reduction in repeat customer complaints • 90% productivity improvement for analytics projects • 60% decrease in data management costs • 98% decrease in per-TB storage costs
  28. DEUTSCHE TELEKOM: Fraud Reduction and Customer Retention
     ● Challenges: Improve fraud detection speed to near real time and respond to network service quality issues before customers notice. The current system couldn’t handle massive volumes of network data at higher granularity, an enterprise view of data with machine learning at scale, or near-real-time fraud detection on incoming data.
     ● Solution: A modern data warehouse to detect fraud patterns and network problems in real time, before business impact • Quickly analyze massive streaming data sets • Enterprise-grade reliability and stability with a shared data experience (no silos) • Real-time machine learning and fast analytics
     ● Results: 10-20% reduction in revenue loss through increased fraud detection • 5-10% decrease in customer churn with improved network quality • 50% increase in overall operational efficiency with faster analytics
  29. KOMATSU MINING: Optimize Machine Performance
     ● Challenges: Create an Industrial IoT (IIoT) solution for optimizing mining equipment utility and building better next-generation products. The current system couldn’t handle the scale of IoT data, demand from new users and use cases, or 30TB/month of data growth.
     ● Solution: Cloud-based IIoT analytics for a full view of mining operations • Quickly and easily analyze a huge volume and variety of data (time-series, sensor, event, and more) • More use cases and users: “democratizing analytics for different user groups” • Scale quickly and easily in the cloud
     ● Results: 2x increase in production hours on key equipment • Design next-generation equipment that is environmentally smarter, more productive, and lower cost • Meet or exceed all KPIs: “Deliver all of the data with less complexity and significant cost savings”
  30. CLOUDERA DW: PARTING THOUGHTS • Hybrid • Optimized • Performance @ scale • Shared Data Experience • Shared data: exponential use cases, successful outcomes
  31. THANK YOU