Cloudera SDX

In this webinar, we’ll show you how Cloudera SDX reduces the complexity in your data management environment and lets you deliver diverse analytics with consistent security, governance, and lifecycle management against a shared data catalog.

Published in: Technology

  1. CLOUDERA SDX: CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY FOR ALL WORKLOADS
     Wim Stoop | Senior Product Marketing Manager
     Santosh Kumar | Senior Product Manager
  2. © Cloudera, Inc. All rights reserved. MULTI-DISCIPLINARY ANALYTICS
  3. WE ALL HAVE BAGGAGE
  4. TRADITIONAL APPLICATION SILOS
     [Diagram] Layers: CONTEXT, STORAGE, APPLICATION
     Applications: DATA SCIENCE; SQL ANALYTIC DATABASE; NOSQL & RT DATABASE; ETL & DATA ENGINEERING; DATA WAREHOUSE/MART
     Storage: FS, RDBMS
     Context, repeated for each of the five silos: SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG
  5. A STRUGGLE AS OLD AS TIME: IT VS. BUSINESS
     For IT infrastructure & ops
     • Single use, inflexible data sources
     • Redundancy and fragmentation
     For users
     • Can’t find data, waiting on IT
     • Doing prep work, not finding insights
     For head of data & analytics
     • Administrative, not innovative
     • Can’t meet business requirements
  6. ON-PREMISES DEPLOYMENT
     [Diagram] Applications: DATA WAREHOUSE/MART; ETL & DATA ENGINEERING; DATA SCIENCE; SQL ANALYTIC DATABASE; NOSQL & RT DATABASE
     Storage: HDFS, KUDU
     Context: SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG
  7. CLOUD RE-INTRODUCES SILOS
     [Diagram] Applications: DATA WAREHOUSE/MART; ETL & DATA ENGINEERING; DATA SCIENCE; SQL ANALYTIC DATABASE; NOSQL & RT DATABASE
     Context, shown five times: SECURITY, GOVERNANCE, LIFECYCLE, CONTROL, CATALOG
     Storage: Microsoft ADLS, Amazon S3, HDFS, KUDU, Google CP
     Diagram label: CLOUD
  8. CHALLENGES: SECURITY & GOVERNANCE
     • Sharing data across workloads
     • Requires multiple copies of data to be created
     • Each with its own set of data context
     • Burdensome admin effort
     • Multiple clusters = multiple places to administer
     • One missing permission in one copy of the data can lead to significant financial and reputational risk
     • Difficult to share data safely for new analyses
     • Heavy new regulation such as GDPR makes the challenges even greater
  9. NEGATIVE BUSINESS IMPACT
     • Increased operational costs: many distinct environments to buy and build
     • Increased staff overhead: many distinct tools to learn and support
     • Increased security risks: many distinct frameworks to enforce
     • Decreased business insights: narrow data sets and analytics rigidity
     • Decreased business agility: outdated and limiting for applications
     • Decreased governance capability: no common visibility across stores
  10. DATA CONTEXT CHALLENGE
      Data: stateful. Compute: stateless. Context: stateless.
  11. ENABLING STATEFUL AND CONSISTENT CONTEXT
  12. CLOUDERA ENTERPRISE WITH SDX
      Benefits for IT infra & ops
      ● Central control and security
      ● Focus on curating not firefighting
      Benefits for users
      ● Find value from one source of truth
      ● Bring the best tools for each job
      [Diagram] WORKLOADS: 3RD PARTY SERVICES; DATA ENGINEERING; DATA SCIENCE; DATA WAREHOUSE; OPERATIONAL DATABASE
      Context and services labels: DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT, CONTROL PLANE, COMMON SERVICES
      STORAGE: Microsoft ADLS, HDFS, Amazon S3, KUDU
  13. SHARED DATA CONTEXT SERVICES
      Built for multi-function analytics anywhere
      • Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured. Includes technical schemas from the Hive metastore, as well as business glossary definitions, classifications, and usage guidance
      • Security: role-based access control applied consistently across the platform using Apache Sentry. Also includes full stack encryption and key management
      • Governance: enterprise-grade auditing, lineage, and other governance capabilities applied universally across the platform with rich extensibility for partner integrations
      • Lifecycle Management: comprehensive ingest-to-purge management of data set lifecycle activities
      • Control Plane: multi-environment cluster provisioning, deployment, management, and troubleshooting
  14. EXAMPLE CLOUDERA SDX USE CASES
      DATA ENGINEERING + DATA WAREHOUSE
      ● Run ETL with Spark, MapReduce, or any number of partner tools
      ● Assign permissions and classifications once
      ● Data, along with all data context, is immediately available in the analytics database
      DATA ENGINEERING + DATA ENGINEERING
      ● Run specialized transient workloads for security profiling, data preparation, ETL, etc.
      ● Partner tools can have dedicated clusters
      ● Data, along with all data context, is immediately available to all partner tools
      DATA ENGINEERING + DATA SCIENCE
      ● Run ETL with Spark, MapReduce, or any number of partner tools
      ● Assign permissions and classifications once
      ● Data, along with all data context, is immediately available for data science and machine learning
      Cloudera SDX makes it easy for administrators, BI users, and data scientists to work together on a common data set, with consistent data context
      Partner tools can use and enrich data context automatically
  15. BASED ON COMMON CLOUDERA COMPONENTS
      Apache open source and Cloudera unique innovations
      [Diagram] DATA CATALOG: HIVE METASTORE
      GOVERNANCE: NAVIGATOR
      SECURITY: SENTRY, KERBEROS
      LIFECYCLE MANAGEMENT: BDR, NAVIGATOR
      COMMON SERVICES / CONTROL PLANE: HUE, ALTUS, MANAGER, DIRECTOR
      Also shown: Microsoft ADLS, Amazon S3, Impala
  16. WITH YEARS OF EXPERIENCE
      [Timeline] Axis: 2010, 2012, 2014, 2016, 2018
      Components: HIVE METASTORE, SENTRY, HUE, KERBEROS, ALTUS, BDR, DIRECTOR, MANAGER, NAVIGATOR
  17. CLOUDERA ALTUS PAAS
      • Simple
      • Self-service
      • Auto-elastic
      • Role specific
      [Diagram] Workloads: DATA ENGINEERING, DATA WAREHOUSE, DATA SCIENCE
      Context: DATA CATALOG, GOVERNANCE, SECURITY, CONTROL PLANE, LIFECYCLE MANAGEMENT
      Storage and status labels, in transcript order: soon; Amazon S3; Microsoft ADLS; beta
  18. CLOUDERA SDX
      Available for all workloads that share data across clusters
      • Configured SDX: self-managed clusters in the cloud, available as of C5.13
      • Cloudera Altus SDX: Altus PaaS clusters, available where Altus is
  19. CLOUDERA SDX: MOTIVATION
      • 1970-2010, OMIT: compute, context, data. Self-contained appliances with compute, data and data context
      • 2010-2017, Big Data Analytics: Cloudera EDH with Spark, Impala, Hive, data, and data context. Unified platform, multiple engines, shared storage, shared data context
      • 2017-Onward, Big Data Analytics and Cloud: Cloudera EDH with Hive, Impala, Spark, data, and data context. Simplified multi-tenant environment, multiple compute engines, shared storage, shared and persistent data context
  20. Of course! We have our internal EDH cluster. That would be easy!
      With increased focus on … business insights.. dashboard … FAST...
      Speakers: Charles, SVP, Emerging Businesses; Mulyadi, Data Scientist
  21. Pipelines! Workloads! Queries! More pipelines. More workloads! More queries! Even more….
      Speakers: Mulyadi, Data Scientist; Alan, Internal EDH Data Platform Manager
      Adding more workloads to Internal EDH clusters is risky and adds uncertainty to existing SLA-sensitive workloads.
  22. ALAN’S PROBLEM
      Databases, Tables, Columns, Partitions, Views, Data Size
  23. BACK TO CLOUDERA’S WORLD...
      Tables: Sales (SFDC/386 tables), Support (Clusterstats/340)
  24. Maybe separate cluster with “required” data?
      Why not!!
      Speakers: Mulyadi, Data Scientist; Alan, Internal EDH Data Platform Manager
  25. OUR CUSTOMERS’ PROBLEMS
      Databases, Tables, Views, Partitions, Data, Columns
  26. ALAN AND MULYADI IN THE CLOUD WORLD
      [Diagram labels] Data Migration Runtime; Server Procurement; Additional pipelines; Data Migration Cost only; Data Migration Dev Scripts; EC2 Hours for Data Migration only
  27. DATA MIGRATION COSTS GROW EXPONENTIALLY
      [Diagram] Environments: Internal EDH, Emerging Businesses Analytics, Sales Analytics, Support, Product, Training, Finance
      Values shown: 37, 15, 47, 27, 27, 15
      • No single source of truth
      • Synchronization overhead
      • Stale data
  28. EMBRACE UNIFICATION OF DATA & CONTEXT VIA SDX
      [Diagram] Environments: Support, Emerging Businesses Analytics, Sales Analytics, Product, Training, Finance, Internal EDH
  29. SDX RECAP
      • A differentiated capability for sharing of data and data context persistently
      • Enables sharing schema, security, governance, audit artifacts
      • Akin to linear scalability of Apache Hadoop itself
  30. SDX DEMO
  31. CONSISTENT SECURITY, GOVERNANCE AND FLEXIBILITY
  32. DATA-DRIVEN JOURNEY
      [Diagram labels] USE CASES, VISIBILITY
      Use cases: Preventive & Proactive Maintenance; IoT Hub for Industry 4.0; Advanced Threat Detection; Risk Modelling & Analysis; Marketing Systems Integration; Customer 360 Insights; Exploratory Data Science; Data Warehouse; Applied Machine Learning
      Categories: GROW (Sales & Marketing); CONNECT (Operations & Product); PROTECT (Security & Compliance); MODERNIZE (IT, Tech, Data Science & Analytics)
  33. CUSTOMER SUCCESSES FOR EDH & SDX
      Couldn’t solve predictive maintenance goals
      EDH delivers:
      • Ingest telematics in real-time
      • Machine learning to predict failures
      • Analytics to minimize service downtime
      • Protect sensitive and regulated data
      • Consistent security and governance
      • “SDX is the key to making that happen” - CIO
      Drug R&D too slow and expensive
      EDH delivers:
      • Self-service analytics
      • Meet HIPAA regulations
      • >5 petabytes from 2100 silos
      • Using Spark, Impala, & Search side-by-side
      • With Anaconda, AtScale, Cloudwick, Kinetica, StreamSets, Tamr, Trifacta, & Zoomdata
  34. POSITIVE BUSINESS OUTCOMES
      • Increased business insights: diverse data together with analytics flexibility
      • Increased business agility: modern and nimble application innovation
      • Increased governance capability: one common viewpoint and store
      • Decreased operational costs: one environment for all needs
      • Decreased staff overhead: one set of controls for everything
      • Decreased security risks: comprehensive controls everywhere
  35. YOUR OWN CONSISTENT DATA CONTEXT
      • Altus, powered by SDX. Free trial: https://cloudera.com/altus
      • Configured SDX for C5.13+: http://bit.ly/2Ms5OPO
  36. THANK YOU
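
The shared data context idea from the slides on SDX services and use cases (permissions assigned once, then visible to every workload) can be sketched roughly in Python. This is only an illustrative model under stated assumptions: the names `SharedContext` and `Workload` are hypothetical and do not correspond to Apache Sentry, the Hive metastore, or any Cloudera API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedContext:
    """Hypothetical stand-in for a shared catalog with role-based grants."""
    grants: dict = field(default_factory=dict)  # table name -> set of roles

    def grant(self, role: str, table: str) -> None:
        # Record the permission once, in the one shared place.
        self.grants.setdefault(table, set()).add(role)

    def can_read(self, role: str, table: str) -> bool:
        return role in self.grants.get(table, set())


@dataclass
class Workload:
    """Hypothetical compute cluster that holds no context of its own."""
    name: str
    context: SharedContext

    def query(self, role: str, table: str) -> str:
        # Every workload consults the same shared context.
        if not self.context.can_read(role, table):
            return f"{self.name}: access denied"
        return f"{self.name}: reading {table}"


shared = SharedContext()
shared.grant("analyst", "sales")  # assigned once

warehouse = Workload("data_warehouse", shared)
science = Workload("data_science", shared)

print(warehouse.query("analyst", "sales"))
print(science.query("analyst", "sales"))
print(science.query("intern", "sales"))
```

In the siloed picture from the earlier slides, each workload would carry its own copy of `grants`, and a grant missing from one copy is the kind of inconsistency the deck warns about.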
