Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine Data

Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, Python, Cloudera CDF, Cloudera CDH, Cloudera Data Science Workbench, CDSW, MiNiFi, Cloudera EFM, Apache Zeppelin, YOLO, Computer Vision, REST API, JSON, Integration, Java, Big Data, Phoenix, Hive, HDFS, Parquet, AVRO. Cloudera. Hortonworks. John Kuchmek. Tim Spann.

Published in: Data & Analytics

  1. Edge to AI: Analytics from Edge to Cloud with Efficient Movement of Machine Data
     TIMOTHY SPANN, Field Engineer, Cloudera
     JOHN KUCHMEK, Solutions Engineer, Cloudera
  2. DISCLAIMER © Cloudera, Inc. All rights reserved. The information in this document is proprietary to Cloudera. No part of this document may be reproduced, copied or transmitted in any form for any purpose without the express prior written permission of Cloudera. This document is a preliminary version and not subject to your license agreement or any other agreement with Cloudera. This document contains only intended strategies, developments and functionalities of Cloudera products and is not intended to be binding upon Cloudera to any particular course of business, product strategy and/or development. Please note that this document is subject to change and may be changed by Cloudera at any time without notice. Cloudera assumes no responsibility for errors or omissions in this document. Cloudera does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose or non-infringement. Cloudera shall have no liability for damages of any kind including without limitation direct, special, indirect or consequential damages that may result from the use of these materials. The limitation shall not apply in cases of gross negligence.
  3. Introduction: Tim Spann has been running meetups in Princeton on Big Data technologies since 2015. Tim has spoken at many international conferences on Apache NiFi, Deep Learning and Streaming.
     https://community.hortonworks.com/users/9304/tspann.html
     https://dzone.com/users/297029/bunkertor.html
     https://www.meetup.com/futureofdata-princeton/
     https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
  4. Introduction: John Kuchmek recently joined Cloudera. Previously he was a data engineer and data scientist at American Water, where he worked extensively with both NiFi and Hadoop.
     https://dataworkssummit.com/san-jose-2018/session/bridging-the-gap-achieving-fast-data-synchronization-from-sap-hana-by-leveraging-hdp-hdf/
  5. DATAFLOW
  6. [Image slide; no text extracted]
  7. CLOUDERA FLOW MANAGEMENT
     ● Web-based user interface
     ● Highly configurable
     ● Out-of-the-box data provenance
     ● Designed for extensibility
     ● Secure
     ● NiFi Registry
       ○ DevOps support
       ○ FDLC
       ○ Versioning
       ○ Deployment
  8. 300+ PROCESSORS FOR DEEPER ECOSYSTEM INTEGRATION
     Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
  9. MINIFI EDGE AGENTS
     • Edge data collection powered by MiNiFi
     • MiNiFi – smaller footprint than NiFi
       • Guaranteed delivery
       • Data buffering
       • Prioritized queuing
       • Flow-specific QoS
       • Data provenance
       • Designed for extension
       • C++ / Java agents
       • TensorFlow support
     • Designed for IoT
  10. MACHINE LEARNING
  11. MACHINE LEARNING AT CLOUDERA
      Our philosophy: We empower our customers to run their business on data with an open platform:
      ● Your data
      ● Open algorithms
      ● Running anywhere
      We accelerate enterprise data science
      We help clients build their AI factory
  12. OUR APPROACH
      Modern enterprise platform, tools and expert guidance to help you unlock business value with ML/AI
      Agile platform to build, train, and deploy many scalable ML applications
      Enterprise data science tools to accelerate team productivity
      Expert guidance, services & training to fast track value & scale
  13. WE DELIVER AN ENTERPRISE DATA CLOUD
      IoT, Ingest & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
      Catalog | Schema | Migration | Security | Governance
      Hybrid Cloud Public Multi-Cloud Edge Datacenter
  14. MACHINE LEARNING IS BUILT ON DATA MANAGEMENT
      We deliver an Enterprise Data Cloud for any data, anywhere, from the edge to AI
      DataFlow & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
      Catalog | Schema | Migration | Security | Governance
      Hybrid Cloud Public Multi-Cloud Edge Datacenter
      Enterprise grade: Secure, performant and compliant
      Scalable: Elastic, cost-effective and lower TCO
      Runs anywhere: Public cloud, on-premises, multi, hybrid
  15. PLATFORMS FOR INDUSTRIALIZED AI
      End-to-end machine learning infrastructure for teams building at scale
      DEVELOP: Make teams more productive. Explore data; Develop reports, pipelines, models; Collaborate with peers
      TRAIN: Scale resources efficiently. Train models; Tune parameters; Track performance
      DEPLOY: Manage pipelines + models. Deploy models; Automate pipelines; Monitor performance
      MANAGE: Run anywhere with a common architecture. Manage access and resources; Scale cost with usage
  16. INDUSTRIALIZED AI REQUIRES LARGER DATA PLATFORM
      Inputs: Streaming Ingest, Batch Ingest
      Workloads: MACHINE LEARNING, DATA ENGINEERING, DATA WAREHOUSE, OPERATIONAL DATABASE
      Shared layer: DATA, METADATA, SECURITY, GOVERNANCE, WORKLOAD MANAGEMENT
      Outputs: Machine Learning Tools, BI Tools and SQL Editors, Data Products
  17. MACHINE LEARNING PHASES: Where to Connect to Apache NiFi
  18. Training, Scoring and Monitoring
      Columns: Speed of Data | Model Training | Model Scoring | Use Case
      Batch Batch Batch Batch Reporting, Analytics, Applications Online DS Applications/Interactive Dashboards Streaming In-stream Streaming Applications Incremental/Online In-stream Streaming Applications
  19. [Image slide; no text extracted]
  20. CLOUDERA DATA SCIENCE WORKBENCH
      Accelerate machine learning from research to production
      For data scientists
      • Experiment faster: Use R, Python, or Scala with on-demand compute and secure CDH/HDP data access
      • Work together: Share reproducible research with your whole team
      • Deploy with confidence: Get to production repeatedly and without recoding
      For IT professionals
      • Bring data science to the data: Give your data science team more freedom while reducing the risk and cost of silos
      • Secure by default: Leverage common security and governance across workloads
      • Run anywhere: On-premises or in the cloud
  21. ACCELERATED DEEP LEARNING WITH GPUS
      Multi-tenant GPU support on-premises or cloud
      • Extend CDSW to deep learning
      • Schedule & share GPU resources
      • Train on GPUs, deploy on CPUs
      • Works on-premises or cloud
      [Diagram: CDSW (GPU, CPU): single-node training; CDH (CPU): distributed training, scoring]
      “Our data scientists want GPUs, but we need multi-tenancy. If they go to the cloud on their own, it’s expensive and we lose governance.”
      GPU on CDH coming in C6
  22. DEMONSTRATION
  23. INTRODUCING MODELS
      Machine learning models as one-click microservices (REST APIs). Model APIs made easy!
      1. Choose Python/R file, e.g. score.py
      2. Choose function, e.g. forecast
           import pickle
           f = open('model.pk', 'rb')
           model = pickle.load(f)
           def forecast(data):
               return model.predict(data)
      3. Choose resources
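The score.py pattern on this slide (load a pickled model once, then expose a single function) can be tried end to end with a minimal sketch like the one below. Only the `model.pk` file name and the `forecast` function name come from the slide. The `LinearModel` class, its parameters, and the temporary directory are hypothetical stand-ins, and the sketch pickles plain parameters rather than a full model object to keep it self-contained.

```python
import os
import pickle
import tempfile


class LinearModel:
    """Hypothetical stand-in for a trained model: y = slope * x + intercept."""

    def __init__(self, slope, intercept):
        self.slope = slope
        self.intercept = intercept

    def predict(self, data):
        return [self.slope * x + self.intercept for x in data]


# "Training" side: persist fitted parameters (the values here are made up).
model_path = os.path.join(tempfile.mkdtemp(), "model.pk")
with open(model_path, "wb") as f:
    pickle.dump({"slope": 2.0, "intercept": 1.0}, f)

# score.py side: load the model once at import time, not on every request.
with open(model_path, "rb") as f:
    model = LinearModel(**pickle.load(f))


def forecast(data):
    """Function a model-serving layer would call with the request payload."""
    return model.predict(data)


print(forecast([0.0, 1.0, 2.5]))  # [1.0, 3.0, 6.0]
```

Loading the model at module level means each call to `forecast` only pays for prediction, which is usually what you want when the function sits behind a REST endpoint.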
  24. CLOUDERA DATA SCIENCE WORKBENCH: Select a Project, Create a Session, Load Libraries and Data
  25. CLOUDERA DATA SCIENCE WORKBENCH: Load a File and Run It
  26. CLOUDERA DATA SCIENCE WORKBENCH: Install Python Libraries for Python 2 or Python 3
  27. CLOUDERA DATA SCIENCE WORKBENCH: Test Your Function with an Argument
  28. CLOUDERA DATA SCIENCE WORKBENCH: Create a Model from That File and Function
  29. CLOUDERA DATA SCIENCE WORKBENCH: List All the Models
  30. CLOUDERA DATA SCIENCE WORKBENCH: Deploy the Model
  31. CLOUDERA DATA SCIENCE WORKBENCH: Check Out the Build
  32. CLOUDERA DATA SCIENCE WORKBENCH: Test the Model
  33. CLOUDERA DATA SCIENCE WORKBENCH: Validate the Model Results
  34. CLOUDERA DATA SCIENCE WORKBENCH: Monitor the Running Models
  35. CLOUDERA DATA SCIENCE WORKBENCH: Invoke the Model from Apache NiFi in Flow
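The slide itself is a screenshot, so the exact call is not shown. A deployed model exposed as a REST API is normally invoked with a JSON POST, which in NiFi would typically be issued by an HTTP-calling processor such as InvokeHTTP. The sketch below builds such a request in Python without sending it. The URL, the access key, the `accessKey`/`request` envelope and the input field are all assumptions for illustration and should be checked against your own deployment.

```python
import json
import urllib.request


def build_model_request(url, access_key, payload):
    """Build (but do not send) a JSON POST request for a deployed model.

    The {"accessKey": ..., "request": ...} envelope is an assumed shape.
    """
    body = json.dumps({"accessKey": access_key, "request": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_model_request(
    "https://modelservice.example.com/model",  # placeholder URL
    "REPLACE_WITH_ACCESS_KEY",                 # placeholder key
    {"url": "https://example.com/cat.jpg"},    # hypothetical model input
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually call the service: urllib.request.urlopen(req, timeout=10)
```

Keeping request construction separate from sending makes the payload easy to inspect and compare with what the flow posts.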
  36. CLOUDERA DATA-IN-MOTION (APACHE NIFI): Query Results of Classification in Flow
      {
        "class1": "cat",
        "cpu": 38.3,
        "end": "1549672761.1262221",
        "host": "gluoncv-apache-mxnet-29-50-7fb5cfc5b9-sx6dg",
        "memory": 14.9,
        "pct1": "98.15670800000001",
        "shape": "(1, 3, 566, 512)",
        "systemtime": "02/09/2019 00:39:21",
        "te": "3.380652666091919"
      }
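Several fields in the classification result above arrive as strings (`pct1`, `end`, `te`, `shape`), so a consumer usually needs to convert them before querying or aggregating. The following is a minimal sketch of that conversion. The field meanings are assumptions inferred from the names and values (`pct1` as a confidence percentage, `te` as elapsed seconds, `end` as a Unix timestamp), and the output key names are hypothetical.

```python
import json
from datetime import datetime, timezone

RECORD = """
{ "class1": "cat", "cpu": 38.3, "end": "1549672761.1262221",
  "host": "gluoncv-apache-mxnet-29-50-7fb5cfc5b9-sx6dg", "memory": 14.9,
  "pct1": "98.15670800000001", "shape": "(1, 3, 566, 512)",
  "systemtime": "02/09/2019 00:39:21", "te": "3.380652666091919" }
"""


def parse_classification(raw):
    """Convert the string-typed fields of one result record to native types."""
    rec = json.loads(raw)
    return {
        "label": rec["class1"],
        "confidence_pct": float(rec["pct1"]),  # assumed: percentage, 0-100
        "elapsed_s": float(rec["te"]),         # assumed: elapsed seconds
        "ended_at": datetime.fromtimestamp(float(rec["end"]), tz=timezone.utc),
        # "(1, 3, 566, 512)" -> (1, 3, 566, 512)
        "shape": tuple(int(n) for n in rec["shape"].strip("()").split(",")),
        "cpu": rec["cpu"],
        "memory": rec["memory"],
        "host": rec["host"],
    }


result = parse_classification(RECORD)
print(result["label"], round(result["confidence_pct"], 2))  # cat 98.16
print(result["shape"], result["ended_at"].isoformat())
```

Interpreted as UTC, the `end` timestamp falls on the same second as the record's `systemtime` field, which suggests the host clock was set to UTC.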
  37. CLOUDERA DATA-IN-MOTION (APACHE NIFI): Integrating Calls to CDSW Jobs
  38. CLOUDERA DATA SCIENCE WORKBENCH: PySpark Job for HDFS Storage
  39. CLOUDERA DATA SCIENCE WORKBENCH: PySpark Job Receiving REST API
  40. CLOUDERA DATA SCIENCE WORKBENCH: NiFi Job Integration
  41. CLOUDERA DATA SCIENCE WORKBENCH: Display Data