The new big data

  1. The New Big Data, Scott Shaw
  2. © 2020 Cloudera, Inc. All rights reserved.
      DATA MANAGEMENT IS SPREAD ALL OVER
      • Chart categories: On-premises, Single cloud, Multi cloud, Hybrid cloud, Private cloud
      • Chart values: 47%, 21%, 24%, 26%, 32%
      Gartner recently warned that “Data and analytics leaders must prepare for the complexities of multi cloud and intercloud deployments to avoid potential performance issues… unplanned cost overruns and ... difficulties with integration efforts.” (HBR, June 2019)
  3. “Enterprise IT doesn’t operate at the speed of business. Your IT group needs to perform better than shadow IT.” (CIO Magazine)
      Chart: Shadow IT as a % of overall IT spend
  4. HOW TIMES HAVE CHANGED
      • 2008: scale 1 job to 1000s of servers
      • 2020: scale 1 platform to 1000s of users
  5. CLOUDERA: THE ENTERPRISE DATA CLOUD COMPANY
      Manage and secure the data lifecycle in any cloud or datacenter:
      • 01 Collect: Streaming & Data Flow
      • 02 Curate: Data Engineering
      • 03 Report: Data Warehouse
      • 04 Serve: Operational Database
      • 05 Predict: Machine Learning & AI
      Across all stages: Security | Governance | Lineage | Management | Automation
  6. BUSINESS USE CASES REQUIRE THE DATA LIFECYCLE
      An integrated lifecycle is easier to use, manage and secure.
      • Enterprise use cases: connected products, connected production, connected supply chain, connected consumer
      • Data lifecycle use cases: supply chain optimization, computer vision for QA, predictive maintenance, process monitoring dashboards, throughput optimization
      • Data: real-time & transactional data, enterprise data; platform: enterprise data cloud
      • Foundation: security | governance | lineage | management | automation
  7. CLOUDERA DATA PLATFORM
  8. COMPONENT ARCHITECTURE
  9. THE ENTERPRISE DATA CLOUD COMPONENTS
      Traditional platform consumption:
      • Data Hub clusters
      New analytic experiences:
      • Data Warehouse
      • Machine Learning
      • Data Engineering
      • Operational Database
      • More to come
      Control Plane services:
      • Workload Manager
      • Replication Manager
      • Data Catalog
      • Management Console
  10. KEY CONCEPTS & COMPONENTS: ENVIRONMENTS
      Environment
      • 1 template
      • 1 region
      • 1 VPC
      • Multiple roles/buckets
      Data Lake (1:1 with the environment)
      • SDX: Atlas, Ranger, Knox, IdBroker, CM
      • Associated with groups/users
      Data Hub clusters / experiences (1:N per environment)
      • DH templates
      • ML env
      • DW database catalogs / virtual compute
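A minimal Python sketch of the relationships on this slide, assuming a simple in-memory model: one environment owns exactly one Data Lake (1:1), and any number of Data Hub clusters or experiences (1:N) share that Data Lake. The class and field names (`Environment`, `DataLake`, `Workload`, `add_workload`) and the example template, region and VPC values are hypothetical illustrations, not a Cloudera SDK or CDP API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical model of the CDP concepts on the slide; not a Cloudera API.

@dataclass
class DataLake:
    # SDX services named on the slide.
    services: Tuple[str, ...] = ("Atlas", "Ranger", "Knox", "IdBroker", "CM")
    groups: List[str] = field(default_factory=list)  # associated groups/users

@dataclass
class Workload:
    name: str
    kind: str            # e.g. "Data Hub", "ML", "DW"
    data_lake: DataLake  # the shared SDX services this workload uses

@dataclass
class Environment:
    template: str
    region: str          # 1 region per environment
    vpc: str             # 1 VPC per environment
    data_lake: DataLake = field(default_factory=DataLake)    # 1:1
    workloads: List[Workload] = field(default_factory=list)  # 1:N

    def add_workload(self, name: str, kind: str) -> Workload:
        # Each new cluster or experience is wired to the environment's
        # single Data Lake instead of getting its own copy of SDX.
        workload = Workload(name, kind, self.data_lake)
        self.workloads.append(workload)
        return workload

# Usage with made-up values.
env = Environment(template="default", region="example-region", vpc="example-vpc")
bi = env.add_workload("bi-team-cluster", "Data Hub")
ml = env.add_workload("ml-env", "ML")
print(len(env.workloads))            # 2
print(bi.data_lake is ml.data_lake)  # True
```

The identity check (`is`) is the point of the sketch: both workloads see the same security, governance and catalog services rather than separate copies.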
  11. KEY CONCEPTS & COMPONENTS: TYPICAL USER FLOW
      • Step 1: The user connects to CDP with their enterprise identity, through the Management Console in the CDP Control Plane.
      • Step 2: They create an environment and a Data Lake for their enterprise. The Data Lake runs Atlas, Ranger, Knox, IdBroker, FreeIPA, CM and HMS on the enterprise's cloud resources (IAM, network, VMs, buckets, etc.).
      • Step 3: They create Data Hub clusters for traditional workloads, for example a BI team cluster and an ETL team cluster.
      • Step 4: They create access points for containerized analytic experiences, such as the Data Warehouse and Machine Learning experiences.
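The user flow's ordering can be read as a set of prerequisites. The sketch below assumes, as the slide implies, that Data Hub clusters (step 3) and experiences (step 4) both depend on an existing environment and Data Lake (step 2) but not on each other. The step names, `PREREQS` and `run_flow` are hypothetical; they are not CDP CLI commands or API calls.

```python
from typing import Dict, List, Set

# Hypothetical prerequisites for the four steps on the slide.
PREREQS: Dict[str, List[str]] = {
    "connect": [],                                                    # step 1
    "create_environment_and_data_lake": ["connect"],                  # step 2
    "create_data_hub_cluster": ["create_environment_and_data_lake"],  # step 3
    "create_experience": ["create_environment_and_data_lake"],        # step 4
}

def run_flow(steps: List[str]) -> Set[str]:
    """Apply steps in order, failing if any prerequisite is not yet done."""
    done: Set[str] = set()
    for step in steps:
        missing = [p for p in PREREQS[step] if p not in done]
        if missing:
            raise RuntimeError(f"{step!r} needs {missing} first")
        done.add(step)
    return done

# An experience can be created without any Data Hub cluster...
done = run_flow(["connect", "create_environment_and_data_lake", "create_experience"])
print(sorted(done))

# ...but a Data Hub cluster cannot be created before the environment exists.
try:
    run_flow(["connect", "create_data_hub_cluster"])
except RuntimeError as err:
    print("rejected:", err)
```

Modeling the steps as a dependency map rather than a strict sequence reflects that steps 3 and 4 are independent of each other.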
  12. ENVIRONMENT
      What is an environment? It defines where CDP creates resources in the customer's cloud infrastructure. A long-running, permanent cluster called a Data Lake is created in it.
  13. DATA LAKE
      What is a Data Lake? A common set of services (SDX) within an environment, shared across multiple clusters/experiences. These include services for:
      • Security
      • Auditing
      • Governance
      • Data discovery
  14. DATA HUB CLUSTERS AND EXPERIENCES
      What are the consumption options?
      • A Data Hub cluster is a customizable environment that runs like a traditional Hadoop cluster but is designed to use cloud storage.
      • An experience is a container-based compute environment for a specific purpose: ML (Machine Learning), DW (Data Warehouse), DE (Data Engineering), OD (Operational Database) or DF (Data Flow).
  15. CONTROL PLANE
      What is the Control Plane? The common set of tools for management, workload analysis, data movement and data discovery across multiple environments.
  16. PRODUCT WALKTHROUGH
  17. HYBRID ARCHITECTURE
  18. TARGET ARCHITECTURE: THE ENTERPRISE DATA CLOUD
      • CDP Public Cloud (platform-as-a-service) on AWS, Azure and GCP
      • CDP On-Prem (installable software): Private Cloud and CDP Datacenter
      • Both offer Data Hub, virtual private clusters and self-serve experiences (DW, ML, DE, …) on Cloudera Runtime, with a common Control Plane
  19. OpenShift 101
      ➔ OpenShift → Kubernetes++
      ➔ K8s → system to deploy, scale and manage apps
      ➔ Applications → exposed through services
      ➔ Service → collection of pods
      ➔ Pods → collection of containers
      ➔ Containers → runtime environment
      Diagram: master nodes manage several worker nodes; each worker node has its own CPU, RAM and disk, runs a kubelet, and hosts pods that hold containers.
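The slide's "Service → collection of pods" is shorthand: in Kubernetes, a Service does not own pods but selects them by matching labels. Below is a toy Python model of that selection, assuming labels are plain dictionaries. The classes only mirror Kubernetes concepts and are not the Kubernetes client library; real Services also take pod readiness into account, which this sketch omits. Pod names, labels and the container image are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model of the container -> pod -> service hierarchy on the slide.

@dataclass
class Container:
    name: str
    image: str

@dataclass
class Pod:
    name: str
    labels: Dict[str, str]
    containers: List[Container] = field(default_factory=list)

@dataclass
class Service:
    name: str
    selector: Dict[str, str]

    def endpoints(self, pods: List[Pod]) -> List[str]:
        # A pod matches when every selector key/value pair is in its labels.
        return [p.name for p in pods
                if all(p.labels.get(k) == v for k, v in self.selector.items())]

pods = [
    Pod("web-1", {"app": "web"}, [Container("nginx", "nginx:1.25")]),
    Pod("web-2", {"app": "web"}),
    Pod("db-1", {"app": "db"}),
]
svc = Service("web", {"app": "web"})
print(svc.endpoints(pods))  # ['web-1', 'web-2']
```

Because selection is by label, pods can be added or replaced without changing the Service that exposes the application.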
  20. THANK YOU
