
Deep learning on HDP 2018 Prague

Future of Data Prague Meetup
12 April 2018 Thursday
TensorFlow, Apache MXNet

  1. Deep Learning on HDP, Prague 2018. Timothy Spann, Solutions Engineer, Hortonworks (@PaaSDev). © Hortonworks Inc. 2011–2018. All rights reserved. Hortonworks confidential and proprietary information.
  2. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. • This document's description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  3. Agenda • Data Engineering With Deep Learning • TensorFlow with Apache NiFi • TensorFlow on YARN • Apache MXNet Pre-Built Models • Apache MXNet Model Server With Apache NiFi • Apache MXNet in Apache Zeppelin Notebooks • Apache MXNet On YARN • Demos • Questions
  4. Deep Learning for Big Data Engineers: Multiple users, frameworks, languages, data sources & clusters. BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of database query languages such as SQL • Knowledge of Hadoop tools such as Hive or Pig. CAT • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding Skills • Interest in Pig and Falcon. AI • Will Drive Your Car • Will Fix Your Code • Will Beat You At Q-Bert • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time. http://gluon.mxnet.io/chapter01_crashcourse/preface.html
  5. Use Cases: So Why Am I Orchestrating These Complex Deep Learning Workflows? Computer Vision • Object Recognition • Image Classification • Object Detection • Motion Estimation • Annotation • Visual Question and Answer • Autonomous Driving. Speech Recognition • Speech to Text • Speech Recognition • Chat Bot • Voice UI. Natural Language Processing • Sentiment Analysis • Text Classification • Named Entity Recognition. Recommender Systems • Content-based Recommendations. https://github.com/zackchase/mxnet-the-straight-dope
  6. Deep Learning Options • TensorFlow (C++, Python, Java) • TensorFlow on Spark (Yahoo) • Caffe on Spark (Yahoo) • Apache MXNet (Baidu, Amazon, NVIDIA, MS, CMU, NYU, Intel) • Deeplearning4j (Skymind, JVM) • PyTorch • H2O Deep Water • Keras on top of TensorFlow and DL4J • Apache SINGA • Caffe2 (Facebook)
  7. Recommendations • Install the CPU version on CPU YARN nodes • Install the GPU version on NVIDIA (CUDA) nodes • Train on GPU YARN nodes where possible • Apply the model on all nodes and trigger it with Apache NiFi • What helps Hadoop and Spark will help TensorFlow: more RAM, more and faster cores, more nodes • Today, run either pure TensorFlow with Keras or TensorFlow on Spark • Try YARN 3.0 containerized TensorFlow later in the year • Consider Alluxio or Apache Ignite for in-memory optimization • Download the model zoos • Evaluate other deep learning frameworks like MXNet and PyTorch
  8. Deep Learning Options
  9. TensorFlow on Hadoop (https://www.tensorflow.org/deploy/hadoop): HDFS files can be used as a distributed input source for training, so one fast cluster can store these massive datasets and share them among its nodes. This requires setting a few environment variables: JAVA_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, CLASSPATH
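A quick pre-flight check can catch a missing variable before a training job starts. The sketch below is illustrative only: the helper name `missing_hdfs_vars` is hypothetical, and it only checks that each variable named on the slide is set, not that its value is correct for a given cluster.

```python
import os

# The four variables the slide lists as required for reading from HDFS.
REQUIRED_VARS = ["JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH"]


def missing_hdfs_vars(env=None):
    """Return the required variables that are unset or empty in env.

    env defaults to the current process environment; passing a dict
    makes the check easy to try out without touching the real shell.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hdfs_vars()
    if missing:
        print("Set these before starting TensorFlow:", ", ".join(missing))
    else:
        print("All HDFS-related variables are set.")
```

Running it on a node where only JAVA_HOME is exported would list the other three names.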
  10. TensorFlow Serving on YARN 3.0: We use NVIDIA Docker containers on top of YARN. https://github.com/NVIDIA/nvidia-docker
  11. Run TensorFlow on YARN 3.0: https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
  12. Apache Deep Learning Flow (diagram): Ingestion; Simple Event Processing; Engine; Stream Processing; Destination; Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights
  13. Streaming Apache Deep Learning (diagram): Data Acquisition; Edge Processing; Deep Learning; Real Time Stream Analytics; Rapid Application Development; IoT; Analytics; Cloud; Acquire; Move; Routing & Filtering; Deliver; Parse; Analysis; Aggregation; Modeling
  14. Apache Deep Learning Components (diagram): Streaming Analytics Manager; Machine Learning; Distributed queue; Buffering; Process decoupling; Streaming and SQL; Orchestration; Queueing; Simple Event Processing; REST API; Secure Spark Execution
  15. Apache Deep Learning Components (diagram): Streaming Analytics Manager; Run everywhere; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Entity Resolution; Natural Language Processing
  16. Apache MXNet (http://mxnet.incubator.apache.org/) • Cloud ready • Experienced team (XGBoost) • AWS, Microsoft, NVIDIA, Baidu, Intel backing • Apache Incubator Project • Run distributed on YARN • In my early tests, faster than TensorFlow • Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices • Great documentation • Gluon • Great Python Interaction • Model Server Available • ONNX Support • Now in Version 1.1! • Great Model Zoo. https://mxnet.incubator.apache.org/how_to/cloud.html https://github.com/apache/incubator-mxnet/tree/1.1.0/example
  17. Deep Learning Architecture (diagram labels): HDP Node X, Node Manager, Datanode, HBase Region; HDP Node Y, Node Manager, Datanode, HBase Region; HDF Node, Apache NiFi, Zookeeper; Apache Spark MLlib; GPU Node, Neural Network, Pipeline; MiNiFi Java Agent; MiNiFi C++ Agent; Apache Livy
  18. What do we want to do? • MiNiFi ingests camera images and sensor data • MiNiFi executes Apache MXNet at the edge • Run Apache MXNet Inception to recognize objects in images • Apache NiFi stores images, metadata and enriched data in Hadoop • Apache NiFi ingests social data and REST feeds • Apache OpenNLP and Apache Tika for textual data
  19. Collect: Bring Together. Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds. Conduct: Mediate the Data Flow. Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and Email. Curate: Gain Insights. Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition, voice recognition with Apache Tika, Apache OpenNLP and Apache MXNet.
  20. Why Apache NiFi? • Guaranteed delivery • Data buffering (backpressure, pressure release) • Prioritized queuing • Flow-specific QoS (latency vs. throughput, loss tolerance) • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over fifty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  21. Apache NiFi Integration with Apache MXNet Options • Apache MXNet via Execute Process (Python) • Apache MXNet Running on Edge Nodes (MiNiFi) S2S • Apache MXNet Model Server Integration (REST API). Not Covered Today • Dockerized Apache MXNet on Hadoop YARN 3 with NVIDIA GPU • Apache MXNet on Spark
  22. Other Options
      https://github.com/apache/incubator-mxnet/tree/master/tools/coreml
      https://github.com/Leliana/WhatsThis
      https://github.com/apache/incubator-mxnet/tree/master/amalgamation/jni
      https://hub.docker.com/r/mxnet/
      https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
  23. Apache MXNet Pre-Built Models • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG19 • ResNet-152 • LSTM. http://mxnet.incubator.apache.org/model_zoo/index.html https://github.com/dmlc/mxnet-model-gallery
  24. Apache MXNet via Python (OSX Local with WebCam)
      python3 -W ignore analyze.py
      {"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
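Downstream of the classifier, a flow usually wants only the confident labels out of that JSON record. The following is a minimal sketch of one way to do that, assuming the record keeps the `topN` / `topNpct` key pattern shown on the slide. The function name `top_predictions` and the 10 percent threshold are illustrative choices, and the sample is shortened to the first two classes.

```python
import json

# Sample record copied from the slide, shortened to the top two classes.
sample = '{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}'


def top_predictions(record_json, min_pct=10.0):
    """Return (synset_id, label, pct) tuples whose pct is at least min_pct."""
    record = json.loads(record_json)
    results = []
    rank = 1
    # Walk top1, top2, ... until the record runs out of ranked entries.
    while f"top{rank}" in record:
        pct = float(record[f"top{rank}pct"])
        if pct >= min_pct:
            # Each value is "<synset id> <comma-separated label names>".
            synset_id, _, label = record[f"top{rank}"].partition(" ")
            results.append((synset_id, label, pct))
        rank += 1
    return results


for synset_id, label, pct in top_predictions(sample):
    print(f"{pct:5.1f}%  {synset_id}  {label}")
```

Because the percentages arrive as strings, the sketch converts them to floats before comparing.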
  25. Apache MXNet Running on an Apache NiFi Node
  26. Apache MXNet Running on Edge Nodes (MiNiFi)
      https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
      https://github.com/tspannhw/mxnet_rpi
      https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
  27. Apache MXNet Model Server with Apache NiFi
      https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
      sudo pip3 install mxnet-model-server --upgrade
  28. Apache MXNet Running in Apache Zeppelin
  29. Apache MXNet on Apache YARN
      https://github.com/tspannhw/nifi-mxnet-yarn
      https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
      dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 --server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
  30. Apache OpenNLP for Entity Resolution Processor (Apache OpenNLP with Apache NiFi): Requires installation of the NAR and the Apache OpenNLP models (http://opennlp.sourceforge.net/models-1.5/). This is an unsupported processor that I wrote and contributed to the community. You can write one too!
      https://github.com/tspannhw/nifi-nlp-processor
      https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
  31. Why TensorFlow? Also Apache MXNet, PyTorch and DL4J. • Google • Multiple platform support • Hadoop integration • Spark integration • Keras • Large Community • Python and Java APIs • GPU Support • Mobile Support • Inception v3 • Clustering • Fully functional demos • Open Source • Apache Licensed • Large Model Library • Buzz • Extensive Documentation • Raspberry Pi Support
  32. Open Source Image Analytical Components (diagram): Streaming Analytics Manager; Part of MiNiFi C++ Agent; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Complex Event Processing; Joining DataSets for Streaming Analytics; Enabling Record Processing; Schema Management
  33. TensorFlow via Python or C++ Binary
      python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
        solar dish, solar collector, solar furnace (score = 0.98316)
        window screen (score = 0.00196)
        manhole cover (score = 0.00070)
        radiator (score = 0.00041)
        doormat, welcome mat (score = 0.00041)
      bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
        tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
        I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
        I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
        I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
        I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
  34. TensorFlow Python Example – Classify Image
      https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
      currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime())
      host = os.uname()[1]
  35. TensorFlow Python Classifier Launcher
      https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      #!/bin/bash
      DATE=$(date +"%Y-%m-%d_%H%M")
      fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
      python2 -W ignore /opt/demo/classify_image.py /opt/demo/images/$DATE.jpg 2>/dev/null
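The launcher does two things: capture a timestamped webcam frame, then classify that same file. For readers who would rather drive this from Python, here is a rough equivalent under the same assumptions as the shell script (fswebcam installed, the same /opt/demo paths). The function names `build_commands` and `capture_and_classify` are hypothetical and not part of the linked repository.

```python
import subprocess
from datetime import datetime


def build_commands(now, image_dir="/opt/demo/images",
                   script="/opt/demo/classify_image.py"):
    """Build the capture and classify commands for one timestamped frame."""
    # Same timestamp pattern as the shell launcher: 2018-04-12_1830
    stamp = now.strftime("%Y-%m-%d_%H%M")
    image_path = f"{image_dir}/{stamp}.jpg"
    capture = ["fswebcam", "-q", "-r", "1280x720", "--no-banner", image_path]
    classify = ["python2", "-W", "ignore", script, image_path]
    return capture, classify


def capture_and_classify():
    """Run both steps; requires fswebcam and the classifier on this host."""
    capture, classify = build_commands(datetime.now())
    subprocess.run(capture, check=True)
    # Discard stderr, mirroring the 2>/dev/null in the shell version.
    result = subprocess.run(classify, stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, check=True)
    return result.stdout.decode("utf-8")
```

Keeping command construction separate from execution makes the paths easy to inspect on a machine without a camera attached.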
  36. TensorFlow Python Example – Classify Image
      https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
      row = []
      for node_id in top_k:
          human_string = node_lookup.id_to_string(node_id)
          score = predictions[node_id]
          row.append({'node_id': node_id, 'image': image, 'host': host, 'ts': currenttime, 'human_string': str(human_string), 'score': str(score)})
      json_string = json.dumps(row)
      print(json_string)
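The fragment above depends on objects created elsewhere in classify_image.py. To show the shape of the JSON it emits, here is a self-contained sketch with stand-in inputs. The `build_rows` helper, the image path, the host name and the timestamp are made up for illustration; the two node ids, labels and scores come from the sample output on slide 33. The `int()` cast is an added precaution in case node ids arrive as NumPy integers, which `json.dumps` may reject.

```python
import json


def build_rows(top_k, predictions, id_to_string, image, host, ts):
    """Build the JSON array of per-class records for one classified image."""
    rows = []
    for node_id in top_k:
        rows.append({
            "node_id": int(node_id),  # plain int so json.dumps accepts it
            "image": image,
            "host": host,
            "ts": ts,
            "human_string": str(id_to_string(node_id)),
            "score": str(predictions[node_id]),
        })
    return json.dumps(rows)


# Stand-in inputs: ids, labels and scores taken from the slide 33 output.
labels = {577: "solar dish", 912: "window screen"}
scores = {577: 0.98316, 912: 0.00196}
payload = build_rows([577, 912], scores, labels.get,
                     "/opt/demo/images/example.jpg", "edge-node-1",
                     "2018-04-12 18:30:00")
print(payload)
```

Each image yields one JSON array with one object per predicted class, which a NiFi flow can then split into individual records.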
  37. Apache NiFi Integration with TensorFlow Options • TensorFlow (C++, Python, Java) via ExecuteStreamCommand • TensorFlow NiFi Java Custom Processor • TensorFlow Running on Edge Nodes (MiNiFi)
  38. TensorFlow Java Processor in NiFi
      https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
      https://github.com/tspannhw/nifi-tensorflow-processor
      https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
  39. TensorFlow Java Processor in NiFi: Installation on a Single Node of Apache NiFi 1.5+
      Download the NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
      Install the NAR file to /usr/hdf/current/nifi/lib/
      Create a model directory (/opt/demo/models)
      wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
      wget -O tensorflow_inception_graph.pb 'https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true'
      Restart Apache NiFi via Ambari
  40. TensorFlow Java Processor in NiFi
  41. TensorFlow Running on Edge Nodes (MiNiFi)
      CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id FLOAT)
      STORED AS ORC LOCATION '/tfimage'
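The table columns line up with the fields of the classifier's JSON records, though in a different order and with node_id declared as FLOAT. The sketch below illustrates that mapping in plain Python. It is illustrative only: in the flow described here the conversion to ORC would be handled by the data pipeline, and the helper name `to_hive_row` and the sample record values are hypothetical.

```python
# Column order taken from the CREATE EXTERNAL TABLE statement on the slide.
HIVE_COLUMNS = ["image", "ts", "host", "score", "human_string", "node_id"]


def to_hive_row(record):
    """Order one classifier record to match the tfimage table columns."""
    missing = [col for col in HIVE_COLUMNS if col not in record]
    if missing:
        raise KeyError(f"record is missing columns: {missing}")
    # Every column is STRING except node_id, which the table declares FLOAT.
    row = [str(record[col]) for col in HIVE_COLUMNS[:-1]]
    row.append(float(record["node_id"]))
    return row


record = {"node_id": 577, "image": "/opt/demo/images/example.jpg",
          "host": "edge-node-1", "ts": "2018-04-12 18:30:00",
          "human_string": "solar dish", "score": "0.98316"}
print(to_hive_row(record))
```

A record that lacks any of the six fields raises a KeyError instead of silently producing a short row.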
  42. TensorFlow Installation (Edge)
      apt-get install curl wget -y
      wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh
      chmod +x bazel-0.11.1-installer-linux-x86_64.sh
      ./bazel-0.11.1-installer-linux-x86_64.sh
      apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y
      pip3 install six numpy wheel
      pip3 install --user numpy scipy matplotlib pandas sympy nose
      pip3 install --upgrade tensorflow
      git clone --recurse-submodules https://github.com/tensorflow/tensorflow
      wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip
      wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
      wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
  43. Questions?
  44. Contact
      https://github.com/tspannhw/ApacheDeepLearning101
      https://community.hortonworks.com/users/9304/tspann.html
      https://dzone.com/users/297029/bunkertor.html
      https://www.meetup.com/futureofdata-princeton/
      https://twitter.com/PaaSDev
      https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
      https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn
      https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions
  45. Hortonworks Community Connection: Read access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
  46. Community Engagement: Participate now at community.hortonworks.com • 4,000+ Registered Users • 10,000+ Answers • 15,000+ Technical Assets • One Website!
