Cloudera Enterprise 6.0 Update GA and Beyond 9.25.18

Cloudera Enterprise 6.0 provides a major upgrade to our modern platform for machine learning and analytics with significant advances in productivity and enterprise quality. We have tuned compute resources to maximize performance and minimize total cost of ownership (TCO).

Published in: Technology

  1. 1. CLOUDERA ENTERPRISE 6.0 UPDATE: GA AND BEYOND
  2. 2. 2 © Cloudera, Inc. All rights reserved. TODAY’S SPEAKERS Matthew Schumpert Product Management Director mschumpert@cloudera.com John Kennedy Senior Manager john.kennedy@cloudera.com
  3. 3. SUPPORTING BUSINESS OBJECTIVES CONNECT PRODUCTS & SERVICES (IoT) GROW BUSINESS PROTECT BUSINESS
  4. 4. CLOUDERA ENTERPRISE DATA PLATFORM The modern platform for machine learning & analytics optimized for the cloud WORKLOADS 3RD PARTY SERVICES DATA ENGINEERING DATA SCIENCE DATA WAREHOUSE OPERATIONAL DATABASE DATA CATALOG GOVERNANCE SECURITY LIFECYCLE MANAGEMENT STORAGE Microsoft ADLS COMMON SERVICES HDFS Amazon S3 CONTROL PLANE KUDU
  5. 5. MODERN PLATFORM CAPABILITIES UNIFIED • Diverse analytics • Shared experience • Any environment ENTERPRISE GRADE • Secure • Scalable • Compliant HYBRID • Storage • Compute • Control
  6. 6. DEPLOYMENT FLEXIBILITY PRIVATE CLOUD BARE METAL SDX in EDH clusters VIA CLOUDERA MANAGER HDFS, KUDU DATA ENGINEERING DATA WAREHOUSE DATA SCIENCE OPERATIONAL DATABASE HDFS, KUDU, S3, ADLS S3, ADLS SDX Reference Architecture Altus SDX VIA CLOUDERA ALTUS INFRASTRUCTURE SERVICES
  7. 7. SHARED DATA CONTEXT SERVICES Built for multi-function analytics anywhere WORKLOADS 3RD PARTY SERVICES DATA ENGINEERING DATA SCIENCE DATA WAREHOUSE OPERATIONAL DATABASE DATA CATALOG GOVERNANCE SECURITY LIFECYCLE MANAGEMENT STORAGE Microsoft ADLS COMMON SERVICES HDFS Amazon S3 CONTROL PLANE KUDU • Data Catalog: a comprehensive catalog of all data sets, spanning on-premises, cloud object stores, structured, unstructured, and semi-structured. Includes technical schemas from the Hive metastore, as well as business glossary definitions, classifications, and usage guidance • Security: role-based access control applied consistently across the platform using Apache Sentry. Also includes full stack encryption and key management • Governance: enterprise-grade auditing, lineage, and other governance capabilities applied universally across the platform with rich extensibility for partner integrations • Lifecycle Management: comprehensive ingest-to-purge management of data set lifecycle activities • Control Plane: multi-environment cluster provisioning, deployment, management, and troubleshooting
  8. 8. CLOUDERA 6 HIGHLIGHTS INNOVATION Building unified analytics applications is easier than ever by bringing together the most capable and stable versions of open-source tools in our integrated, multi-disciplinary distribution. ENTERPRISE QUALITY Rather than wrangling purely open source projects, Cloudera’s enterprise customers trust the quality control and safety that only a complete platform can offer. PRODUCTIVITY Enable the business to get answers more quickly, which improves data scientist and business analyst productivity and optimizes resource utilization to accelerate analytics.
  9. 9. CLOUDERA 6 IS NOW GENERALLY AVAILABLE A giant leap forward in our open source core CLOUDERA MANAGER 6.0 • CLOUDERA NAVIGATOR 6.0 • CLOUDERA DIRECTOR 6.0 • HADOOP 3.0 • HBASE 2.0 • HIVE 2.1 • PARQUET 1.9 • SPARK 2.2 • SOLR 7.0 • SENTRY 2.0 • OOZIE 5.0 • AVRO 1.8 • KAFKA 1.0 • FLUME 1.8 • HUE 4.2 • SQOOP 1.4
  10. 10. PARTNERS CERTIFIED ON CDH6 Arcadia Data provides the first native visual analytics software that runs within modern data platforms for optimal scale, performance, and security. Syncsort organizes data everywhere, to keep the world working – the same data that powers machine learning, AI and predictive analytics. Zoomdata enables the fastest visual analytics for big data. Immerse yourself in dynamic visualizations that unfold the story in front of you.
  11. 11. CLOUDERA MANAGER 6 Fine-grained Admin Controls Assign isolated administrative privileges for each cluster under management in order to improve efficiency and reduce risk Automated Wire Encryption Reduce risk and administrative effort by automatically configuring TLS wire encryption for a wide variety of CDH components Scale Manage up to 2,500 nodes with a single Cloudera Manager instance Upgrade from C5 Simplify upgrades from CDH5 with pre-upgrade validations and environment-specific upgrade docs • Improve scale • Improve efficiency • Reduce risk • Upgrade simplicity
  12. 12. SOLR 7 JSON Facet API • Richer analytics capabilities & finer-grained partitions lead to deeper insights on unstructured data Streaming Expressions • A new approach to processing queries and indexes • More powerful compute on the entire matching data set: time series, math functions, NLP and much more
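
The JSON Facet API named on the Solr 7 slide takes a nested JSON description of the facets to compute. The following is a minimal sketch of such a request body, not Cloudera's own example: the field names `category` and `price` and the helper `build_facet_request` are hypothetical, and the body is only built and printed, not sent to a server.

```python
import json


def build_facet_request(query, facet_field, metric_field, limit=5):
    """Build a Solr JSON request body with one terms facet and a nested average.

    facet_field and metric_field are hypothetical field names; substitute
    fields that exist in your own collection schema.
    """
    return {
        "query": query,
        "limit": 0,  # return no documents, only the facet buckets
        "facet": {
            "by_" + facet_field: {
                "type": "terms",       # one bucket per distinct value
                "field": facet_field,
                "limit": limit,        # keep only the top buckets
                "facet": {
                    # nested aggregation computed inside every bucket
                    "avg_" + metric_field: f"avg({metric_field})",
                },
            }
        },
    }


body = build_facet_request("*:*", "category", "price")
print(json.dumps(body, indent=2))
```

A body like this would typically be POSTed to a collection's query handler. Nesting the `avg(...)` aggregation inside the terms facet is what yields a per-bucket statistic rather than a single global one.
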
  13. 13. HBASE 2.0 Manageability • New assignment manager • Simpler replication configuration • New CLI commands • New compaction tool • Improved metrics Reliability • Over 2,000 bug fixes • Operational simplicity Performance • Avoid Java heap for caching and read paths • Multi-threaded old file cleanup • Concurrent prefetch of data
  14. 14. HIVE 2.1 Better Debugging • Faster surfacing of issues leads to tighter controls and enhanced cluster stability Parquet Vectorization • 20% to 80% performance increase API Standardization • Elimination of costly app rewrites increases developer trust and efficiency • Increased productivity • Improved performance • Enterprise readiness
  15. 15. HUE 4.2 Self Service Analytics • Intelligent Table Discovery Wizards • Index Creation Designers • Query Design Assists & Hints Seamless Business User Experience • 360-degree insight for structured AND unstructured data • Optimized UI look and feel - shorter time to get started and get to answers for non-technical users
  16. 16. POLL 1: WHICH C6 UPDATES ARE YOU LOOKING FORWARD TO MOST? Multiple choice, multiple answer • Cloudera Manager 6 • Solr 7.0 • HBase 2.0 • Hive 2.1 • Hue 4.2 • Other
  17. 17. COMING SOON ...
  18. 18. HDFS ERASURE CODING Why • Cut storage costs in half Considerations • Relative data temperature • Relative availability of spare storage capacity vs. spare network capacity • Access to Intel CPUs with ISA-L Typical usage • Enable EC for cold directories • Migrate data from hot to cold directories over time using distcp • Update CDH services to read new directories (e.g. Hive Metastore) Relative job performance with EC • Write-only jobs are faster because there is less data to write • Read-only jobs are about the same • Typical job performance is slightly faster with Erasure Coding Relative reliability • Supports up to 2 node failures without data loss (just like 3x replication) • Parity can be increased via configuration
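
The "cut storage costs in half" claim on the erasure coding slide can be checked with simple arithmetic. The sketch below compares raw storage per logical byte for 3x replication against two Reed-Solomon layouts. The slide does not name a specific policy, so the RS(6,3) and RS(3,2) layouts, the 100 TB figure, and all class and function names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageScheme:
    name: str
    data_units: int        # units that hold the actual data
    redundancy_units: int  # extra replicas or parity units

    @property
    def raw_per_logical(self) -> float:
        """Raw bytes stored on disk per byte of logical data."""
        return (self.data_units + self.redundancy_units) / self.data_units

    @property
    def tolerated_losses(self) -> int:
        """How many units can be lost before data becomes unrecoverable."""
        return self.redundancy_units


def raw_storage_tb(logical_tb: float, scheme: StorageScheme) -> float:
    """Raw capacity consumed by logical_tb of data under the given scheme."""
    return logical_tb * scheme.raw_per_logical


# 3x replication: one original block plus two extra copies.
replication_3x = StorageScheme("3x replication", data_units=1, redundancy_units=2)
# Assumed erasure coding layouts (not named on the slide).
rs_6_3 = StorageScheme("RS(6,3)", data_units=6, redundancy_units=3)
rs_3_2 = StorageScheme("RS(3,2)", data_units=3, redundancy_units=2)

for scheme in (replication_3x, rs_6_3, rs_3_2):
    print(
        f"{scheme.name}: {raw_storage_tb(100, scheme):.1f} TB raw for 100 TB logical, "
        f"tolerates {scheme.tolerated_losses} lost units"
    )
```

Under these assumptions, 100 TB of logical data occupies 300 TB raw with 3x replication and 150 TB with RS(6,3), which is the halving the slide describes. Raising the parity count increases both the overhead and the number of losses tolerated.
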
  19. 19. YARN ENHANCEMENTS Resource Types* • Extend YARN’s view of consumable resources per node beyond vCores & Memory with custom resource types • Examples: GPUs, FPGAs • Example: “Node with R licenses” Oozie on YARN • Improve Oozie runtime performance • Simplify debugging * Roadmap
  20. 20. SOLR 7 SQL Interface • Enables searching Solr indexed data using SQL queries • Deeper insight over combined structured and unstructured data Graph Query • New execution framework allows more powerful processing CDCR • There is no need for this Solr feature in Cloudera Search as we have multiple other more scalable options to replicate data across DCs * Roadmap
  21. 21. PRACTICALITIES ...
  22. 22. C6 DEPRECATIONS Java versions • Oracle JDK 1.7 Operating Systems • Red Hat Enterprise Linux 5 • CentOS 5 • Oracle Linux 5 (both RHCK & UEK) • SLES 11 • All Debian versions (Ubuntu continues to be supported) Databases • Oracle 11g • MySQL 5.0, 5.1 • PostgreSQL 8.1, 8.4 Cloudera Enterprise • Cloudera’s Distribution of Kafka (CDK) 1.x (includes Apache Kafka 0.8.x) • Legacy Scala clients for Kafka • Flume Receiver in Spark • HBaseSink in Flume (replaced by HBase2Sink) • Multi Cloudera Manager Dashboard • Kite Dataset API • Crunch • Hive’s org.apache.hadoop.hive.ql.exec.UDF API
  23. 23. C6 REMOVALS • DataFu • Some Solr 4 features, data types, and APIs are no longer supported in Solr 7 (we have a scan tool to help you detect most common ones) • Management of Key Trustee Server without Cloudera Manager • YARN Capacity Scheduler • MapReduce Pipes • Hue 3 Old interface and editor • Sqoop 2 • Spark 1.x • Flume AsyncHBaseSink • All classes in com.cloudera.sqoop packages • Multi Cloudera Manager Dashboard • Llama • MapReduce 1 • Spark Standalone mode • Mahout • Whirr • Old NameNode UI • Navigator Encrypt File-Level Encryption using eCryptfs • Parquet Libraries under parquet.* Java package: Renamed to org.apache.parquet.* • CDH Tarball Distribution • CM Tarball Distribution • Sentry Policy Files
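
Two of the removals listed on this slide are visible directly in application source code: the `parquet.*` package rename and the `com.cloudera.sqoop` classes. The following is a rough sketch of how Java imports that use those packages could be flagged before an upgrade. It is not a Cloudera tool; the helper name, the regular expression, and the sample source are assumptions, and the slide names no replacement for `com.cloudera.sqoop`, so none is suggested here.

```python
import re

# Old package prefix -> replacement prefix (None when the slide names none).
REMOVED_PREFIXES = {
    "parquet.": "org.apache.parquet.",
    "com.cloudera.sqoop.": None,
}

# Matches "import x.y.Z;" and "import static x.y.Z.member;" at line start.
IMPORT_RE = re.compile(r"^\s*import\s+(?:static\s+)?([\w.]+)")


def find_removed_imports(java_source):
    """Return (line number, imported name, suggested replacement) tuples."""
    findings = []
    for lineno, line in enumerate(java_source.splitlines(), start=1):
        match = IMPORT_RE.match(line)
        if match is None:
            continue
        name = match.group(1)
        for old, new in REMOVED_PREFIXES.items():
            if name.startswith(old):
                suggestion = new + name[len(old):] if new else None
                findings.append((lineno, name, suggestion))
    return findings


# Illustrative input only; the class names are examples, not a full inventory.
sample = """\
package com.example;
import parquet.hadoop.ParquetReader;
import org.apache.parquet.schema.MessageType;
import com.cloudera.sqoop.SqoopOptions;
"""

for lineno, name, suggestion in find_removed_imports(sample):
    print(lineno, name, "->", suggestion or "no replacement listed")
```

Matching on the import prefix keeps already-migrated `org.apache.parquet.*` imports from being flagged. A scan like this only catches imports, so fully qualified class names used inline would need a separate check.
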
  24. 24. C6 UPGRADE: REQUIREMENTS • Upgrading to CM 6 requires no CDH downtime (rolling restart) • Upgrading to CDH 6 requires full cluster downtime • Manual rollback will be documented • Automated downgrade not possible • Upgrading from C6 Beta to C6 GA not supported
  25. 25. SUPPORTED PLATFORMS OS Specific
  26. 26. INFRASTRUCTURE UPGRADES Before C6 upgrade 1. Review your current versions of OS and JDK 2. Plan what your final state for OS versions and JDK needs to be a. You need to be on JDK 8. 3. Execute the upgrade of OS and JDK on all hosts 4. Begin planning and then execute your Cloudera Manager upgrade 5. Begin planning and then execute your Cloudera CDH upgrade.
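
Step 1 above (reviewing the current JDK on every host) can be partly automated. As a minimal sketch, assuming the output of `java -version` has already been collected per host by some other means, the helper below extracts the major version and lists hosts that are not on JDK 8. The host names and function names are hypothetical.

```python
import re


def java_major_version(version_output):
    """Extract the Java major version from `java -version` style output.

    Handles both the legacy "1.x" numbering (1.7, 1.8) and the newer
    scheme where the first number is the major version (9, 11, ...).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version string found")
    first = int(match.group(1))
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))  # "1.8.0_181" -> 8
    return first


def hosts_not_on_required_jdk(host_versions, required=8):
    """Return the sorted host names whose JDK major version differs from required."""
    return sorted(
        host
        for host, output in host_versions.items()
        if java_major_version(output) != required
    )


# Hypothetical inventory: host name -> captured `java -version` output.
inventory = {
    "worker-01": 'java version "1.7.0_80"',
    "worker-02": 'java version "1.8.0_181"',
    "worker-03": 'openjdk version "1.8.0_181"',
}
print(hosts_not_on_required_jdk(inventory))
```

Hosts returned by this check are the ones to address in step 3 before starting the Cloudera Manager upgrade.
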
  27. 27. KEY POINTS Where to get more information on Upgrading
  28. 28. KEY POINTS High-level guidance
  29. 29. KEY POINTS Producing specific Upgrade steps for your setup
  30. 30. KEY POINTS • Interactive UI produces specific technical steps for your upgrade path • In one place
  31. 31. POLL 2: WHEN DO YOU EXPECT TO UPGRADE TO C6? Multiple choice, single answer • This month • This quarter • Next quarter • Next year • Once 6.x is available • Don’t know
  32. 32. GET CLOUDERA ENTERPRISE 6 TODAY
  33. 33. THANK YOU