Machine Learning in the Enterprise 2019

Machine Learning in the Enterprise 2019. These are the slides for my upcoming demo on integrating Machine Learning and Streaming with Apache NiFi and Cloudera Data Science Workbench. This is for the February 12th, 2019 Future of Data Princeton meetup.

Published in: Data & Analytics

  1. 1. MACHINE LEARNING IN THE ENTERPRISE Timothy Spann | Senior Solutions Engineer @PaasDev
  2. 2. 2 © Cloudera, Inc. All rights reserved. DISCLAIMER DA
  3. 3. Introduction: Tim Spann has been running meetups in Princeton on Big Data technologies since 2015. Tim has spoken at several international conferences on Apache NiFi.
     • https://community.hortonworks.com/users/9304/tspann.html
     • https://dzone.com/users/297029/bunkertor.html
     • https://www.meetup.com/futureofdata-princeton/
     • https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
  4. 4. Hadoop {Submarine} Project: Running deep learning workloads on YARN, Tim Spann (Cloudera)
  5. 5. IOT EDGE PROCESSING WITH MINIFI AND MULTIPLE DEEP LEARNING LIBRARIES
  6. 6. 8 © Cloudera, Inc. All rights reserved.
  7. 7. The Industry’s First Enterprise Data Cloud: From the Edge to AI
  8. 8. WHY CLOUDERA?
     • One stop shop for analytics
     • Unified open architecture
     • Hybrid and multi-cloud
     INGEST & STREAMING | DATA SCIENCE | DATA WAREHOUSE | OPERATIONAL DATABASE | DATA ENGINEERING
  9. 9. CLOUDERA DATA FLOW (CDF)
  10. 10. 12© Cloudera, Inc. All rights reserved.
  11. 11. MACHINE LEARNING PHASES: Where to Connect to Apache NiFi
  12. 12. HANDS ON CDSW + NiFi: https://community.hortonworks.com/articles/239961/using-cloudera-data-science-workbench-with-apache.html
  13. 13. © Cloudera, Inc. All rights reserved.
  14. 14. CLOUDERA DATA SCIENCE
  15. 15. MACHINE LEARNING IS A GROWTH ENGINE
      • PROTECT business
      • CONNECT products & services (IoT)
      • DRIVE customer insights
      It’s enabling entirely new businesses, not just modernizing existing systems. Machine learning refers to algorithms and methods to extract useful patterns from data. When we say machine learning, we mean broad, transformational data capabilities.
  16. 16. MOVING FROM EXPLORATION TO PRODUCTION OF ML & AI: WE’RE WITNESSING THE INDUSTRIALIZATION OF AI, FROM THE LAB… TO THE FACTORY
  17. 17. ENTERPRISE-GRADE AI OPERATIONS, WHETHER YOU ARE A FORTUNE 100 OR A STARTUP: SECURITY, GOVERNANCE, COMPLIANCE | STRATEGY | PEOPLE & ORGANIZATION | TECHNOLOGY
  18. 18. AI | MACHINE LEARNING | DATA SCIENCE | ANALYTICS | "BIG DATA" | CLOUD
  19. 19. MACHINE LEARNING AT CLOUDERA: Our philosophy
  20. 20. OUR APPROACH: Modern enterprise platform, tools and expert guidance to help you unlock business value with ML/AI
      • Agile platform to build, train, and deploy many scalable ML applications
      • Enterprise data science tools to accelerate team productivity
      • Expert guidance, services & training to fast track value & scale
  21. 21. PLATFORM
  22. 22. AND ONE MORE THING…
  23. 23. MACHINE LEARNING IS BUILT ON DATA MANAGEMENT: Integrated data, workflows, metadata, security, governance, ...
      • ANALYTIC DATABASE | DATA SCIENCE | OPERATIONAL DATABASE | DATA ENGINEERING | EXTENSIBLE SERVICES
      • Core Services: SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT, INGEST & REPLICATION, DATA CATALOG
      • Storage Services: Amazon S3, Microsoft ADLS, HDFS, KUDU
  24. 24. CLOUDERA ENTERPRISE DATA PLATFORM: The modern platform for machine learning & analytics optimized for the cloud
      • WORKLOADS: DATA ENGINEERING, DATA SCIENCE, DATA WAREHOUSE, OPERATIONAL DATABASE, 3RD PARTY SERVICES
      • COMMON SERVICES: DATA CATALOG, GOVERNANCE, SECURITY, LIFECYCLE MANAGEMENT
      • STORAGE: Microsoft ADLS, HDFS, Amazon S3, KUDU
      • CONTROL PLANE
  25. 25. CLOUDERA DATA SCIENCE WORKBENCH
  26. 26. ACCELERATING THREE STAGES OF MACHINE LEARNING: Enterprise AI platform supporting model development, training, and deployment
      • DEVELOP: Explore data, Develop models, Share results
      • TRAIN: Optimize parameters, Track experiments, Compare performance
      • DEPLOY: Manage models, Deploy models, Monitor performance
  27. 27. A PLATFORM FOR MACHINE LEARNING
      • Open platform
      • Complete lifecycle
      • Team collaboration
      • Enterprise ready
      • Runs anywhere
      APPS: SOLUTIONS | USE CASES
      TOOLS: SELF-SERVICE
      ALGORITHMS: OPEN SOURCE ECOSYSTEM
      COMPUTE: LOCAL | SPARK | IMPALA/HIVE
      DEPLOYMENT: RESEARCH | PRODUCTION
      SHARED CONTEXT: CATALOG | SECURITY | GOVERNANCE
      CLOUD | ON-PREMISES: S3, ADLS, HDFS, KUDU
  28. 28. THE CHALLENGE: Balance these needs
      DATA SCIENCE
      • Access to granular data
      • Flexibility (preferred open source tools)
      • Elastic provisioning (compute, storage)
      • Reproducible research
      • Path to production
      DATA MANAGEMENT
      • Security
      • Governance
      • Standards
      • Low maintenance
      • Low cost
      • Self-service access
  29. 29. THE TYPICAL SOLUTION: “If I can’t use my favorite tools, I’ll…”
      • Copy data to my laptop
      • Copy data to a data science appliance
      • Copy data to a cloud service
      Why this is a problem:
      • Complicates security
      • Breaks data governance
      • Adds latency to process
      • Makes collaboration more difficult
      • Complicates model management and deployment
      • Creates infrastructure silos
  30. 30. CLOUDERA DATA SCIENCE WORKBENCH: Accelerate Machine Learning from Research to Production
  31. 31. CDSW ARCHITECTURE: Extends traditional clusters with new ML capabilities
      • Built with Docker and Kubernetes
      • Isolated, reproducible user environments
      • Supports both big and small data
      • Local Python, R, Scala runtimes
      • Schedule & share GPU resources
      • Scale to CDH/HDP with Spark, Impala, Hive
      • Secure and governed by default
      • Easy, audited access to Kerberized clusters
      • Leverages shared platform services
      • Deployed with Cloudera Manager or package install (Ambari)
      Diagram: CDH/HDP | Cloudera Manager/Ambari | gateway node(s) | CDH nodes | Hive/Impala, HDFS, ... | CDSW | Master | Engine
  32. 32. ACCELERATED DEEP LEARNING WITH GPUS: Multi-tenant GPU support on-premises or cloud
      • Extend CDSW to deep learning
      • Schedule & share GPU resources
      • Train on GPUs, deploy on CPUs
      • Works on-premises or cloud
      “Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
      Diagram: CDSW (GPU, CPU), single-node training | CDH/HDP (CPU), distributed training, scoring
  33. 33. A MODERN DATA SCIENCE ARCHITECTURE: Containerized environments with scalable, on-demand compute
      • Built with Docker and Kubernetes
      • Isolated, reproducible user environments
      • Supports both big and small data
      • Local Python, R, Scala runtimes
      • Schedule & share GPU resources
      • Run Spark, Impala, and other CDH services
      • Secure and governed by default
      • Easy, audited access to Kerberized clusters
      • Leverages SDX platform services
      • Deployed with Cloudera Manager
      Diagram: CDH | Cloudera Manager | gateway node(s) | CDH nodes | Hive, HDFS, ... | CDSW | Master | Engine
  34. 34. ACCELERATED DEEP LEARNING WITH GPUS: Multi-tenant GPU support on-premises or cloud
      • Extend CDSW to deep learning
      • Schedule & share GPU resources
      • Train on GPUs, deploy on CPUs
      • Works on-premises or cloud
      “Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
      Diagram: CDSW (GPU, CPU), single-node training | CDH (CPU), distributed training, scoring | GPU on CDH coming in C6
  35. 35. Confidential-Restricted – For Discussion Purposes Only. CDSW on HDP Architecture
      Diagram: Browser | Cloudera Data Science Workbench Nodes: HDP Edge Node (CDSW Master Node), HDP Edge Node (CDSW Worker Node), HDP Edge Node (CDSW Worker Node) | Ambari | HDP Nodes: HDFS, Hive, HBase, Spark, Phoenix…
  36. 36. CDSW 1.5.0 Support Matrix
      • CDH 5
      • CDH 6
      • HDP 2.6.5
      • HDP 3.1.0
  37. 37. THREE THINGS TO REMEMBER
      • Any tool or library
      • Built for teams
      • End-to-end self-service
  38. 38. INTRODUCING CLOUDERA MACHINE LEARNING: Cloud-native enterprise machine learning platform
      • Full capability of CDSW
      • Rapid cloud provisioning and elastic autoscaling
      • Unified data engineering and ML with seamless dependency management
      • Multi-cloud portability powered by Kubernetes
      • Connects to HDFS or cloud object storage and shared metadata
      • Accelerated deep learning with distributed GPU training
      * Initially targeted for cloud managed K8s services, then OpenShift
      DATA SCIENCE | DATA ENGINEERING | MODEL OPERATIONS
      Interactive Development | Batch Pipelines | Predictive APIs
      CLOUDERA ML RUNTIME: Python/R, Spark, TensorFlow, CPU/GPU-Optimized
      KUBERNETES: EKS, AKS, GKE, OpenShift
      DATA CATALOG | GOVERNANCE | SECURITY | LIFECYCLE | WORKLOAD XM
      STORAGE: Amazon S3, Microsoft ADLS, HDFS, KUDU
  39. 39. WHAT DATA SCIENCE TEAMS DO
      • PREPARE DATA: Ingest data at scale. Store and secure data. Clean and transform data for analysis.
      • BUILD MODELS: Explore data and build predictive models, offline. Evaluate and tune models. Develop and deliver a modeling pipeline.
      • DEPLOY MODELS: Test, verify, and approve model for deployment. Create and maintain batch/stream pipelines, embedded models, APIs. Update models in production.
  40. 40. NEW: CLOUDERA DATA SCIENCE WORKBENCH 1.5: Accelerate and simplify machine learning from research to production
      • ANALYZE DATA
      • TRAIN MODELS (NEW!)
      • DEPLOY APIs (NEW!)
      • MANAGE SHARED RESOURCES
  41. 41. INTRODUCING EXPERIMENTS: Versioned model training runs for evaluation and reproducibility. Data scientists can now...
      • Create a snapshot of model code, dependencies, and configuration necessary to train the model
      • Build and execute the training run in an isolated container
      • Track specified model metrics, performance, and model artifacts
      • Inspect, compare, or deploy prior models
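The experiment workflow on this slide (snapshot, run, track metrics, compare) can be illustrated with a small plain-Python sketch. This is not the CDSW Experiments API: `ExperimentRun`, `snapshot`, `run_experiment`, `best_run`, and `toy_train` are hypothetical names invented for illustration, and the isolated-container step is left out entirely.

```python
import hashlib
import json


class ExperimentRun:
    """One versioned training run: a snapshot id, its config, and tracked metrics."""

    def __init__(self, snapshot_id, config):
        self.snapshot_id = snapshot_id
        self.config = config
        self.metrics = {}

    def track_metric(self, name, value):
        # Record a named metric for later comparison across runs.
        self.metrics[name] = value


def snapshot(code, dependencies, config):
    """Hash code, dependencies, and configuration into a short, stable id."""
    payload = json.dumps(
        {"code": code, "dependencies": sorted(dependencies), "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def run_experiment(code, dependencies, config, train_fn):
    """Snapshot the inputs, run training, and track every returned metric."""
    run = ExperimentRun(snapshot(code, dependencies, config), config)
    for name, value in train_fn(config).items():
        run.track_metric(name, value)
    return run


def best_run(runs, metric):
    """Compare prior runs and return the one with the highest metric value."""
    return max(runs, key=lambda r: r.metrics[metric])


def toy_train(config):
    # Stand-in for real training: accuracy peaks at learning_rate == 0.1.
    return {"accuracy": 1.0 - abs(config["learning_rate"] - 0.1)}


runs = [
    run_experiment("train.py v1", ["numpy"], {"learning_rate": lr}, toy_train)
    for lr in (0.01, 0.1, 0.5)
]
winner = best_run(runs, "accuracy")
```

Hashing the sorted, serialized inputs means two runs with identical code, dependencies, and configuration share a snapshot id, which is one simple way to make runs reproducible and comparable.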
  42. 42. INTRODUCING MODELS: Machine learning models as one-click microservices (REST APIs). File: score.py, function: forecast
          import pickle

          # Load the trained model once, when the script starts
          with open('model.pk', 'rb') as f:
              model = pickle.load(f)

          def forecast(data):
              # Return the model's prediction for the input data
              return model.predict(data)
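To make the "model as a REST microservice" idea concrete, here is a self-contained sketch of the same pattern: unpickle a model once, expose a `forecast` function, and wrap it in a JSON-in, JSON-out handler. The linear weights, the `"x"` input field, and `handle_request` are assumptions made for illustration; the slide loads a real model from `model.pk`, and the actual request format used by CDSW is not shown in the deck.

```python
import json
import pickle

# Hypothetical stand-in for a trained model: a pickled dict of linear weights.
# The slide loads a real model object from 'model.pk' instead.
model_bytes = pickle.dumps({"slope": 2.0, "intercept": 1.0})
weights = pickle.loads(model_bytes)


def forecast(data):
    """Scoring function: takes a dict of inputs, returns a JSON-friendly dict."""
    xs = data["x"]
    return {"prediction": [weights["slope"] * x + weights["intercept"] for x in xs]}


def handle_request(body):
    """Simulate the REST wrapper: parse JSON, call forecast, serialize the result."""
    return json.dumps(forecast(json.loads(body)))


response = handle_request('{"x": [0, 1, 2.5]}')
```

Keeping `forecast` free of HTTP details means the same function can be tested locally and then exposed by whatever serving layer wraps it.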
  43. 43. MODEL MANAGEMENT: View, test, monitor, and update models by team or project
  44. 44. CLOUDERA FAST FORWARD LABS
  45. 45. CLOUDERA FAST FORWARD LABS: Expert guidance to accelerate value and scale
      ADVISING & RESEARCH | ML APPLICATION DEVELOPMENT | ML STRATEGY ENGAGEMENT
      ML application strategy prescription | ML expert advising | research reports and prototypes
  46. 46. AS NEW TECH CAPABILITIES EMERGE, BE READY
  47. 47. THANK YOU
