Deep learning on HDP 2018 Prague

Timothy Spann, Developer Advocate
1 © Hortonworks Inc. 2011–2018. All rights reserved.
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Deep Learning on HDP
Prague 2018
Timothy Spann, Solutions Engineer
Hortonworks @PaaSDev
Disclaimer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
Agenda
• Data Engineering With Deep Learning
• TensorFlow with Apache NiFi
• TensorFlow on YARN
• Apache MXNet Pre-Built Models
• Apache MXNet Model Server With Apache NiFi
• Apache MXNet in Apache Zeppelin Notebooks
• Apache MXNet On YARN
• Demos
• Questions
Deep Learning for Big Data Engineers
Multiple users, frameworks, languages, data sources & clusters
BIG DATA ENGINEER
• Experience in ETL
• Coding skills in Scala, Python, Java
• Experience with Apache Hadoop
• Knowledge of database query languages such as SQL
• Knowledge of Hadoop tools such as Hive or Pig

• Expert in ETL (Eating, Ties and Laziness)
• Social Media Maven
• Deep SME in Buzzwords
• No Coding Skills
• Interest in Pig and Falcon
CAT AI
• Will Drive Your Car
• Will Fix Your Code
• Will Beat You At Q-Bert
• Will Not Be Discussed Today
• Will Not Finish This Talk For Me, This Time
http://gluon.mxnet.io/chapter01_crashcourse/preface.html
Use Cases
So Why Am I Orchestrating These Complex Deep Learning Workflows?
Computer Vision
• Object Recognition
• Image Classification
• Object Detection
• Motion Estimation
• Annotation
• Visual Question and Answer
• Autonomous Driving
Speech Recognition
• Speech to Text
• Speech Recognition
• Chat Bot
• Voice UI
Natural Language Processing
• Sentiment Analysis
• Text Classification
• Named Entity Recognition
Recommender Systems
• Content-based Recommendations
https://github.com/zackchase/mxnet-the-straight-dope
Deep Learning Options
• TensorFlow (C++, Python, Java)
• TensorFlow on Spark (Yahoo)
• Caffe on Spark (Yahoo)
• Apache MXNet (Baidu, Amazon, NVIDIA, MS, CMU, NYU, Intel)
• Deeplearning4j (Skymind), JVM
• PyTorch
• H2O Deep Water
• Keras on top of TensorFlow and DL4J
• Apache SINGA
• Caffe2 (Facebook)
Recommendations
• Install CPU Version on CPU YARN Nodes
• Install GPU Version on NVIDIA (CUDA) Nodes
• Do training on GPU YARN Nodes where possible
• Apply Model on All Nodes and Trigger with Apache NiFi
• What helps Hadoop and Spark will help TensorFlow: more RAM, more and faster cores, more nodes.
• Today, run either pure TensorFlow with Keras or TensorFlow on Spark.
• Try YARN 3.0 Containerized TensorFlow later in the year.
• Consider Alluxio or Apache Ignite for in-memory optimization
• Download the model zoos
• Evaluate other Deep Learning Frameworks like MXNet and PyTorch
TensorFlow on Hadoop
https://www.tensorflow.org/deploy/hadoop
HDFS files can be used as a distributed input source for training, so one cluster can store these massive datasets and share them among all of its nodes.
This requires setting a few environment variables:
JAVA_HOME
HADOOP_HDFS_HOME
LD_LIBRARY_PATH
CLASSPATH
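As a minimal sketch of how the four variables above could be assembled before launching a training script: the helper name `build_hdfs_env`, the example paths, and the `jre/lib/amd64/server` location of `libjvm.so` are all assumptions for illustration, not values from this talk. The CLASSPATH string is typically taken from the output of `hadoop classpath --glob` on the node.

```python
import os


def build_hdfs_env(java_home, hadoop_hdfs_home, hadoop_classpath, base_env=None):
    """Return a copy of the environment with the four HDFS-related variables set.

    hadoop_classpath is passed in as a string; on a real node it would usually
    come from running `hadoop classpath --glob`.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["JAVA_HOME"] = java_home
    env["HADOOP_HDFS_HOME"] = hadoop_hdfs_home
    # Assumption: libjvm.so lives here on a typical Linux JDK 8 layout.
    jvm_lib = os.path.join(java_home, "jre", "lib", "amd64", "server")
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = jvm_lib + (os.pathsep + existing if existing else "")
    env["CLASSPATH"] = hadoop_classpath
    return env


if __name__ == "__main__":
    # Hypothetical paths, shown only to illustrate the shape of the result.
    demo = build_hdfs_env(
        "/usr/lib/jvm/java-8-openjdk-amd64",
        "/usr/hdp/current/hadoop-hdfs-client",
        "/etc/hadoop/conf",
        base_env={},
    )
    for name in ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH"):
        print(name + "=" + demo[name])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the training script, which keeps the settings scoped to that one process.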
TensorFlow Serving on YARN 3.0
https://github.com/NVIDIA/nvidia-docker
We use NVIDIA Docker containers on top of YARN
Run TensorFlow on YARN 3.0
https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
Apache Deep Learning Flow
[Diagram labels: Ingestion; Simple Event Processing; Engine; Stream Processing; Destination; Data Bus; Build Predictive Model From Historical Data; Deploy Predictive Model For Real-time Insights; Perishable Insights; Historical Insights]
Streaming Apache Deep Learning
[Diagram labels: Data Acquisition; Edge Processing; Deep Learning; Real Time Stream Analytics; Rapid Application Development; IoT; Analytics; Cloud; Acquire; Move; Routing & Filtering; Deliver; Parse; Analysis; Aggregation; Modeling]
Apache Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Machine Learning; Distributed queue; Buffering; Process decoupling; Streaming and SQL; Orchestration; Queueing; Simple Event Processing; REST API; Secure Spark Execution]
Apache Deep Learning Components
[Diagram labels: Streaming Analytics Manager; Run everywhere; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Entity Resolution; Natural Language Processing]
http://mxnet.incubator.apache.org/
• Cloud ready
• Experienced team (XGBoost)
• AWS, Microsoft, NVIDIA, Baidu, Intel backing
• Apache Incubator Project
• Run distributed on YARN
• In my early tests, faster than TensorFlow.
• Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices
• Great documentation
• Gluon
• Great Python Interaction
• Model Server Available
• ONNX Support
• Now in Version 1.1!
• Great Model Zoo
https://mxnet.incubator.apache.org/how_to/cloud.html
https://github.com/apache/incubator-mxnet/tree/1.1.0/example
Deep Learning Architecture
[Architecture diagram labels: HDP Node X; HDP Node Y; Node Manager; Datanode; HBase Region; HDF Node; Apache NiFi; Zookeeper; Apache Spark MLlib; GPU Node; Neural Network; Pipeline; MiNiFi Java Agent; MiNiFi C++ Agent; Apache Livy]
What do we want to do?
• MiNiFi ingests camera images and sensor data
• MiNiFi executes Apache MXNet at the edge
• Run Apache MXNet Inception to recognize objects in images
• Apache NiFi stores images, metadata and enriched data in Hadoop
• Apache NiFi ingests social data and REST feeds
• Apache OpenNLP and Apache Tika for textual data
Collect: Bring Together
Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds.
Conduct: Mediate the Data Flow
Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and Email.
Curate: Gain Insights
Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition, voice recognition with Apache Tika, Apache OpenNLP and Apache MXNet.
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Hundreds of processors
• Visual command and control
• Over fifty sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
Apache NiFi Integration with Apache MXNet Options
• Apache MXNet via Execute Process (Python)
• Apache MXNet Running on Edge Nodes (MiNiFi) S2S
• Apache MXNet Model Server Integration (REST API)
Not Covered Today
• *Dockerized Apache MXNet on Hadoop YARN 3 with NVIDIA GPU
• *Apache MXNet on Spark
Other Options
• https://github.com/apache/incubator-mxnet/tree/master/tools/coreml
• https://github.com/Leliana/WhatsThis
• https://github.com/apache/incubator-mxnet/tree/master/amalgamation/jni
• https://hub.docker.com/r/mxnet/
• https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
Apache MXNet Pre-Built Models
• CaffeNet
• SqueezeNet v1.1
• Inception v3
• Single Shot Detection (SSD)
• VGG19
• ResidualNet 152
• LSTM
http://mxnet.incubator.apache.org/model_zoo/index.html
https://github.com/dmlc/mxnet-model-gallery
Apache MXNet via Python (OSX Local with WebCam)
python3 -W ignore analyze.py
{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1":
"n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2":
"n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3":
"n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt",
"top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename":
"images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
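A downstream step usually needs the ranked labels rather than the flat `top1`..`top5` keys. The sketch below, which assumes only the field layout visible in the sample record above, turns that JSON into a list of (rank, synset id, label, percent) tuples. The function name `ranked_predictions` is hypothetical and is not part of analyze.py.

```python
import json

# Sample record copied from the analyze.py output shown on this slide.
SAMPLE = """
{"uuid": "mxnet_uuid_img_20180208204131",
 "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall",
 "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store",
 "top3pct": "4.80000004172", "top3": "n03141823 crutch",
 "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt",
 "top5pct": "2.80000008643", "top5": "n02834397 bib",
 "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
"""


def ranked_predictions(record_json, max_rank=5):
    """Return [(rank, synset_id, label, percent), ...] from one output record."""
    record = json.loads(record_json)
    results = []
    for rank in range(1, max_rank + 1):
        label_key = "top%d" % rank
        pct_key = "top%dpct" % rank
        if label_key not in record:
            break  # fewer predictions than max_rank
        # Each label starts with a synset id, then a space, then the names.
        synset_id, _, label = record[label_key].partition(" ")
        results.append((rank, synset_id, label, float(record[pct_key])))
    return results


if __name__ == "__main__":
    for rank, synset_id, label, percent in ranked_predictions(SAMPLE):
        print(rank, synset_id, label, percent)
```

Keeping the percentages as floats makes it straightforward to filter on a confidence threshold before routing the record onward.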
Apache MXNet Running on an Apache NiFi Node
Apache MXNet Running on Edge Nodes (MiNiFi)
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://github.com/tspannhw/mxnet_rpi
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
Apache MXNet Model Server with Apache NiFi
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
sudo pip3 install mxnet-model-server --upgrade
Apache MXNet Running in Apache Zeppelin
Apache MXNet on Apache YARN
https://github.com/tspannhw/nifi-mxnet-yarn
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
dmlc-submit --cluster yarn --num-workers 1 --server-cores 2
--server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
Apache OpenNLP with Apache NiFi
Apache OpenNLP for Entity Resolution Processor
https://github.com/tspannhw/nifi-nlp-processor
Requires installation of NAR and Apache OpenNLP Models (http://opennlp.sourceforge.net/models-1.5/).
This is a non-supported processor that I wrote and put into the community. You can write one too!
https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
Why TensorFlow? Also Apache MXNet, PyTorch and DL4J.
• Google
• Multiple platform support
• Hadoop integration
• Spark integration
• Keras
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Clustering
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Buzz
• Extensive Documentation
• Raspberry Pi Support
[Diagram labels: Streaming Analytics Manager; Part of MiNiFi C++ Agent; Detect metadata and data; Extract metadata and data; Content Analysis; Deep Learning Framework; Complex Event Processing; Joining DataSets for Streaming Analytics; Open Source Image Analytical Components; Enabling Record Processing; Schema Management]
TensorFlow via Python or C++ Binary
python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
TensorFlow Python Example – Classify Image
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
import os
from time import gmtime, strftime
DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz'
currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime())  # UTC timestamp for this run
host = os.uname()[1]  # hostname of the node running the classifier
TensorFlow Python Classifier Launcher
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
#!/bin/bash
DATE=$(date +"%Y-%m-%d_%H%M")
fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg
python2 -W ignore /opt/demo/classify_image.py /opt/demo/images/$DATE.jpg 2>/dev/null
TensorFlow Python Example – Classify Image
https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py
row = []
for node_id in top_k:
    human_string = node_lookup.id_to_string(node_id)
    score = predictions[node_id]
    # One record per prediction, matching the columns of the tfimage table
    row.append({'node_id': node_id, 'image': image, 'host': host, 'ts': currenttime,
                'human_string': str(human_string), 'score': str(score)})
json_string = json.dumps(row)
print(json_string)
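The loop above relies on `top_k`, `predictions` and `node_lookup` being defined earlier in classify_image.py. As a self-contained sketch of the same idea, the code below picks the k highest-scoring indices from a plain list of scores and emits the same record shape. The helper names `top_k_indices` and `build_rows`, the sample scores, and the host and image values are made up for illustration. The original script works on NumPy arrays instead of lists.

```python
import json


def top_k_indices(scores, k):
    """Indices of the k largest scores, best first (stdlib stand-in for argsort)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def build_rows(scores, labels, image, host, ts, k=5):
    """Build one dictionary per top-k prediction, in the slide's record shape."""
    rows = []
    for node_id in top_k_indices(scores, k):
        rows.append({
            "node_id": node_id,
            "image": image,
            "host": host,
            "ts": ts,
            "human_string": str(labels[node_id]),
            "score": str(scores[node_id]),
        })
    return rows


if __name__ == "__main__":
    # Hypothetical scores and labels, not real model output.
    scores = [0.01, 0.90, 0.05, 0.04]
    labels = ["radiator", "solar dish", "window screen", "manhole cover"]
    rows = build_rows(scores, labels, "/opt/demo/images/example.jpg",
                      "edge-node-1", "2018-01-01 00:00:00", k=3)
    print(json.dumps(rows))
```

Emitting a JSON array of flat records keeps the output easy to split into one record per prediction in a later flow step.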
Apache NiFi Integration with TensorFlow Options
• TensorFlow (C++, Python, Java) via ExecuteStreamCommand
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiNiFi)
TensorFlow Java Processor in NiFi
https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html
https://github.com/tspannhw/nifi-tensorflow-processor
https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
TensorFlow Java Processor in NiFi
Installation On A Single Node of Apache NiFi 1.5+
Download NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6
Install NAR file to /usr/hdf/current/nifi/lib/
Create a model directory (/opt/demo/models)
wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt
wget -O tensorflow_inception_graph.pb "https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true"
Restart Apache NiFi via Ambari
TensorFlow Running on Edge Nodes (MiNiFi)
CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image
STRING, ts STRING, host STRING, score STRING,
human_string STRING, node_id FLOAT) STORED AS ORC
LOCATION '/tfimage'
TensorFlow Installation (Edge)
apt-get install curl wget -y
wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh
chmod +x bazel-0.11.1-installer-linux-x86_64.sh
./bazel-0.11.1-installer-linux-x86_64.sh
apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y
pip3 install six numpy wheel
pip3 install --user numpy scipy matplotlib pandas sympy nose
pip3 install --upgrade tensorflow
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip
wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip
wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
Questions?
Contact
https://github.com/tspannhw/ApacheDeepLearning101
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn
https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets
One Website!
1 of 46

Recommended

Apache MXNet for IoT with Apache NiFi by
Apache MXNet for IoT with Apache NiFiApache MXNet for IoT with Apache NiFi
Apache MXNet for IoT with Apache NiFiTimothy Spann
1.2K views23 slides
Apache Deep Learning 101 - DWS Berlin 2018 by
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018Timothy Spann
1.2K views42 slides
MiniFi and Apache NiFi : IoT in Berlin Germany 2018 by
MiniFi and Apache NiFi : IoT in Berlin Germany 2018MiniFi and Apache NiFi : IoT in Berlin Germany 2018
MiniFi and Apache NiFi : IoT in Berlin Germany 2018Timothy Spann
1.3K views34 slides
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018 by
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018
IoT Edge Processing with Apache NiFi and MiniFi and Apache MXNet for IoT NY 2018Timothy Spann
18.7K views32 slides
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton by
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi PrincetonOpen Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi PrincetonTimothy Spann
1.1K views34 slides
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python by
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, PythonOpen Computer Vision with OpenCV, Apache NiFi, TensorFlow, Python
Open Computer Vision with OpenCV, Apache NiFi, TensorFlow, PythonTimothy Spann
3.2K views48 slides

More Related Content

What's hot

Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方 by
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方HortonworksJapan
1.9K views43 slides
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac... by
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...DataWorks Summit
1.6K views26 slides
Introduction to Apache NiFi 1.11.4 by
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Timothy Spann
1K views32 slides
BYOP: Custom Processor Development with Apache NiFi by
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFiDataWorks Summit
3.3K views65 slides
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善 by
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善HortonworksJapan
1.7K views57 slides
Apache Nifi Crash Course by
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash CourseDataWorks Summit
465 views32 slides

What's hot(20)

Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方 by HortonworksJapan
Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方Apache NiFi + Tensorflow + Hadoop:Big Data AI サンドイッチの作り方
Apache NiFi + Tensorflow + Hadoop: Big Data AI サンドイッチの作り方
HortonworksJapan1.9K views
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac... by DataWorks Summit
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...
The Power of Intelligent Flows: Real-Time IoT Botnet Classification with Apac...
DataWorks Summit1.6K views
Introduction to Apache NiFi 1.11.4 by Timothy Spann
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann1K views
BYOP: Custom Processor Development with Apache NiFi by DataWorks Summit
BYOP: Custom Processor Development with Apache NiFiBYOP: Custom Processor Development with Apache NiFi
BYOP: Custom Processor Development with Apache NiFi
DataWorks Summit3.3K views
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善 by HortonworksJapan
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
Hive 3.0 - HDPの最新バージョンで実現する新機能とパフォーマンス改善
HortonworksJapan1.7K views
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi by DataWorks Summit
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFiThe First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
The First Mile - Edge and IoT Data Collection With Apache Nifi and MiniFi
DataWorks Summit596 views
Dataflow Management From Edge to Core with Apache NiFi by DataWorks Summit
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit708 views
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose by Aldrin Piri
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Aldrin Piri2.3K views
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi by DataWorks Summit
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
DataWorks Summit1.4K views
Hadoop Operations - Past, Present, and Future by DataWorks Summit
Hadoop Operations - Past, Present, and FutureHadoop Operations - Past, Present, and Future
Hadoop Operations - Past, Present, and Future
DataWorks Summit201 views
Best practices and lessons learnt from Running Apache NiFi at Renault by DataWorks Summit
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit31.9K views
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi by DataWorks Summit
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
DataWorks Summit2.8K views

Similar to Deep learning on HDP 2018 Prague

Apache deep learning 101 by
Apache deep learning 101Apache deep learning 101
Apache deep learning 101DataWorks Summit
743 views29 slides
IoT with Apache MXNet and Apache NiFi and MiniFi by
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
1.8K views23 slides
Hands-On Deep Dive with MiniFi and Apache MXNet by
Hands-On Deep Dive with MiniFi and Apache MXNetHands-On Deep Dive with MiniFi and Apache MXNet
Hands-On Deep Dive with MiniFi and Apache MXNetTimothy Spann
770 views37 slides
Apache deep learning 202 Washington DC - DWS 2019 by
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019Timothy Spann
552 views15 slides
Apache Deep Learning 201 - Barcelona DWS March 2019 by
Apache Deep Learning 201 - Barcelona DWS March 2019Apache Deep Learning 201 - Barcelona DWS March 2019
Apache Deep Learning 201 - Barcelona DWS March 2019Timothy Spann
645 views35 slides
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31 by
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Timothy Spann
736 views41 slides

Similar to Deep learning on HDP 2018 Prague(20)

IoT with Apache MXNet and Apache NiFi and MiniFi by DataWorks Summit
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
DataWorks Summit1.8K views
Hands-On Deep Dive with MiniFi and Apache MXNet by Timothy Spann
Hands-On Deep Dive with MiniFi and Apache MXNetHands-On Deep Dive with MiniFi and Apache MXNet
Hands-On Deep Dive with MiniFi and Apache MXNet
Timothy Spann770 views
Apache deep learning 202 Washington DC - DWS 2019 by Timothy Spann
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019
Timothy Spann552 views
Apache Deep Learning 201 - Barcelona DWS March 2019 by Timothy Spann
Apache Deep Learning 201 - Barcelona DWS March 2019Apache Deep Learning 201 - Barcelona DWS March 2019
Apache Deep Learning 201 - Barcelona DWS March 2019
Timothy Spann645 views
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31 by Timothy Spann
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31
Timothy Spann736 views
HDF 3.1 : An Introduction to New Features by Timothy Spann
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
Timothy Spann1.5K views
SoCal BigData Day by John Park
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
John Park588 views
Apache Deep Learning 201 - Philly Open Source by Timothy Spann
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
Timothy Spann642 views
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin by Alex Zeltov
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov1.9K views
Introduction to the Hadoop EcoSystem by Shivaji Dutta
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
Shivaji Dutta1.8K views
Hadoop Present - Open Enterprise Hadoop by Yifeng Jiang
Hadoop Present - Open Enterprise HadoopHadoop Present - Open Enterprise Hadoop
Hadoop Present - Open Enterprise Hadoop
Yifeng Jiang4.3K views
Storm Demo Talk - Colorado Springs May 2015 by Mac Moore
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
Mac Moore307 views
Discover.hdp2.2.ambari.final[1] by Hortonworks
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
Hortonworks2.1K views
Classification based security in Hadoop by Madhan Neethiraj
Classification based security in HadoopClassification based security in Hadoop
Classification based security in Hadoop
Madhan Neethiraj543 views
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using... by Hortonworks
Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...Hortonworks Technical Workshop:   HDP everywhere - cloud considerations using...
Hortonworks Technical Workshop: HDP everywhere - cloud considerations using...
Hortonworks9.9K views
Hadoop Everywhere & Cloudbreak by Sean Roberts
Hadoop Everywhere & CloudbreakHadoop Everywhere & Cloudbreak
Hadoop Everywhere & Cloudbreak
Sean Roberts2K views
ApacheCon 2021 Apache Deep Learning 302 by Timothy Spann
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann632 views
Discover.hdp2.2.storm and kafka.final by Hortonworks
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
Hortonworks6.1K views

More from Timothy Spann

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
115 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
106 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
97 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
158 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

Deep learning on HDP 2018 Prague

  • 1. 1 © Hortonworks Inc. 2011–2018. All rights reserved. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information. Deep Learning on HDP Prague 2018 Timothy Spann, Solutions Engineer Hortonworks @PaaSDev
  • 2. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. Agenda • Data Engineering With Deep Learning • TensorFlow with Apache NiFi • TensorFlow on YARN • Apache MXNet Pre-Built Models • Apache MXNet Model Server With Apache NiFi • Apache MXNet in Apache Zeppelin Notebooks • Apache MXNet On YARN • Demos • Questions
  • 4. Deep Learning for Big Data Engineers Multiple users, frameworks, languages, data sources & clusters BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of database query languages such as SQL • Knowledge of Hadoop tools such as Hive or Pig CAT • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding Skills • Interest in Pig and Falcon AI • Will Drive your Car • Will Fix Your Code • Will Beat You At Q-Bert • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time http://gluon.mxnet.io/chapter01_crashcourse/preface.html
  • 5. Use Cases So Why Am I Orchestrating These Complex Deep Learning Workflows? Computer Vision • Object Recognition • Image Classification • Object Detection • Motion Estimation • Annotation • Visual Question and Answer • Autonomous Driving Speech Recognition • Speech to Text • Speech Recognition • Chat Bot • Voice UI Natural Language Processing • Sentiment Analysis • Text Classification • Named Entity Recognition Recommender Systems • Content-based Recommendations https://github.com/zackchase/mxnet-the-straight-dope
  • 6. Deep Learning Options • TensorFlow (C++, Python, Java) • TensorFlow on Spark (Yahoo) • Caffe on Spark (Yahoo) • Apache MXNet (Baidu, Amazon, NVIDIA, MS, CMU, NYU, Intel) • Deeplearning4j (Skymind, JVM) • PyTorch • H2O Deep Water • Keras on top of TensorFlow and DL4J • Apache SINGA • Caffe2 (Facebook)
  • 7. Recommendations • Install CPU Version on CPU YARN Nodes • Install GPU Version on NVIDIA (CUDA) Nodes • Do training on GPU YARN Nodes where possible • Apply Model on All Nodes and Trigger with Apache NiFi • What helps Hadoop and Spark will help TensorFlow: more RAM, more and faster cores, more nodes. • Today, run either pure TensorFlow with Keras or TensorFlow on Spark. • Try YARN 3.0 Containerized TensorFlow later in the year. • Consider Alluxio or Apache Ignite for in-memory optimization • Download the model zoos • Evaluate other Deep Learning Frameworks like MXNet and PyTorch
  • 8. Deep Learning Options
  • 9. TensorFlow on Hadoop https://www.tensorflow.org/deploy/hadoop HDFS files can be used as a distributed source for input producers for training, allowing one fast cluster to store these massive datasets and share them across your cluster. This requires setting a few environment variables: JAVA_HOME, HADOOP_HDFS_HOME, LD_LIBRARY_PATH, CLASSPATH
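A minimal sketch of a pre-flight check for the four environment variables named on this slide. The helper name `missing_hdfs_vars` is hypothetical, and the check only confirms that each variable is set to a non-empty value; it does not validate the paths themselves.

```python
import os

# The four variables listed on the slide for reading HDFS from TensorFlow.
REQUIRED_HDFS_VARS = ("JAVA_HOME", "HADOOP_HDFS_HOME", "LD_LIBRARY_PATH", "CLASSPATH")


def missing_hdfs_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_HDFS_VARS if not env.get(name)]
```

Calling `missing_hdfs_vars()` before starting a training job gives an early, readable failure instead of an HDFS error partway through the run.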
  • 10. TensorFlow Serving on YARN 3.0 https://github.com/NVIDIA/nvidia-docker We use NVIDIA Docker containers on top of YARN
  • 11. Run TensorFlow on YARN 3.0 https://community.hortonworks.com/articles/83872/data-lake-30-containerization-erasure-coding-gpu-p.html
  • 12. Apache Deep Learning Flow Ingestion Simple Event Processing Engine Stream Processing Destination Data Bus Build Predictive Model From Historical Data Deploy Predictive Model For Real-time Insights Perishable Insights Historical Insights
  • 13. Streaming Apache Deep Learning Data Acquisition Edge Processing Deep Learning Real Time Stream Analytics Rapid Application Development IoT ANALYTICS CLOUD Acquire Move Routing & Filtering Deliver Parse Analysis Aggregation Modeling
  • 14. Apache Deep Learning Components Streaming Analytics Manager Machine Learning Distributed queue Buffering Process decoupling Streaming and SQL Orchestration Queueing Simple Event Processing REST API Secure Spark Execution
  • 15. Apache Deep Learning Components Streaming Analytics Manager Run everywhere Detect metadata and data Extract metadata and data Content Analysis Deep Learning Framework Entity Resolution Natural Language Processing
  • 16. http://mxnet.incubator.apache.org/ • Cloud ready • Experienced team (XGBoost) • AWS, Microsoft, NVIDIA, Baidu, Intel backing • Apache Incubator Project • Run distributed on YARN • In my early tests, faster than TensorFlow. • Runs on Raspberry Pi, NVIDIA Jetson TX1 and other constrained devices • Great documentation • Gluon • Great Python Interaction • Model Server Available • ONNX Support • Now in Version 1.1! • Great Model Zoo https://mxnet.incubator.apache.org/how_to/cloud.html https://github.com/apache/incubator-mxnet/tree/1.1.0/example
  • 17. Deep Learning Architecture HDP Node X Node Manager Datanode HBase Region HDP Node Y Node Manager Datanode HBase Region HDF Node Apache NiFi Zookeeper Apache Spark MLlib Apache Spark MLlib GPU Node Neural Network Apache Spark MLlib Apache Spark MLlib Pipeline GPU Node Neural Network Pipeline MiNiFi Java Agent MiNiFi C++ Agent HDF Node Apache NiFi Zookeeper Apache Livy
  • 18. What do we want to do? • MiNiFi ingests camera images and sensor data • MiNiFi executes Apache MXNet at the edge • Run Apache MXNet Inception to recognize objects in image • Apache NiFi stores images, metadata and enriched data in Hadoop • Apache NiFi ingests social data and REST feeds • Apache OpenNLP and Apache Tika for textual data
  • 19. Collect: Bring Together Aggregate all data from sensors, drones, logs, geo-location devices, machines and social feeds. Conduct: Mediate the Data Flow Mediate point-to-point and bi-directional data flows, delivering data reliably to Apache HBase, Apache Hive, HDFS, Slack and Email. Curate: Gain Insights Parse, filter, join, transform, fork, query, sort, dissect; enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition, voice recognition with Apache Tika, Apache OpenNLP and Apache MXNet.
  • 20. Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over fifty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  • 21. Apache NiFi Integration with Apache MXNet Options • Apache MXNet via Execute Process (Python) • Apache MXNet Running on Edge Nodes (MiNiFi) S2S • Apache MXNet Model Server Integration (REST API) Not Covered Today • *Dockerized Apache MXNet on Hadoop YARN 3 with NVIDIA GPU • *Apache MXNet on Spark
  • 22. Other Options • https://github.com/apache/incubator-mxnet/tree/master/tools/coreml • https://github.com/Leliana/WhatsThis • https://github.com/apache/incubator-mxnet/tree/master/amalgamation/jni • https://hub.docker.com/r/mxnet/ • https://github.com/apache/incubator-mxnet/tree/master/scala-package/spark
  • 23. Apache MXNet Pre-Built Models • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG19 • ResidualNet 152 • LSTM http://mxnet.incubator.apache.org/model_zoo/index.html https://github.com/dmlc/mxnet-model-gallery
  • 24. Apache MXNet via Python (OSX Local with WebCam) python3 -W ignore analyze.py {"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
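The JSON record above stores each prediction as a pair of string fields (`top1`/`top1pct` through `top5`/`top5pct`). A downstream consumer would likely want them as an ordered list of labels with numeric percentages. The following is a sketch only; `top_predictions` is a hypothetical helper name, and the sample is trimmed from the slide's output.

```python
import json

# Sample trimmed from the analyze.py output shown on the slide.
SAMPLE = (
    '{"uuid": "mxnet_uuid_img_20180208204131", '
    '"top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", '
    '"top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", '
    '"top3pct": "4.80000004172", "top3": "n03141823 crutch", '
    '"top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", '
    '"top5pct": "2.80000008643", "top5": "n02834397 bib", '
    '"imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}'
)


def top_predictions(record_json, k=5):
    """Return [(label, percent), ...] in rank order from one JSON record."""
    record = json.loads(record_json)
    results = []
    for rank in range(1, k + 1):
        label = record.get("top%d" % rank)
        pct = record.get("top%dpct" % rank)
        if label is None or pct is None:
            break  # stop at the first missing rank
        results.append((label, float(pct)))
    return results
```

Converting the percentages to floats at this point makes it straightforward to filter low-confidence results before they are stored.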
  • 25. Apache MXNet Running on an Apache NiFi Node
  • 26. Apache MXNet Running on Edge Nodes (MiNiFi) https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html https://github.com/tspannhw/mxnet_rpi https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
  • 27. Apache MXNet Model Server with Apache NiFi https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html sudo pip3 install mxnet-model-server --upgrade
  • 28. Apache MXNet Running in Apache Zeppelin
  • 29. Apache MXNet on Apache YARN https://github.com/tspannhw/nifi-mxnet-yarn https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 --server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
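When the `dmlc-submit` call above is launched from a script rather than typed by hand, building the argument list in one place keeps the worker and memory settings easy to change. This is a sketch that uses only the flags shown on the slide; the function name `dmlc_submit_command` and its defaults are illustrative, and actually running the command assumes `dmlc-submit` is on the PATH of a configured YARN client.

```python
import shlex


def dmlc_submit_command(script, num_workers=1, server_cores=2,
                        server_memory="1G", log_level="DEBUG",
                        log_file="mxnet.log"):
    """Build the dmlc-submit argument list with the flags shown on the slide."""
    return [
        "dmlc-submit",
        "--cluster", "yarn",
        "--num-workers", str(num_workers),
        "--server-cores", str(server_cores),
        "--server-memory", server_memory,
        "--log-level", log_level,
        "--log-file", log_file,
        script,
    ]


def as_shell_line(argv):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(arg) for arg in argv)

# To launch (not executed here): subprocess.run(dmlc_submit_command("analyzeyarn.py"), check=True)
```

Passing the list form to `subprocess.run` avoids shell quoting problems; `as_shell_line` is only for logging or copy-paste.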
  • 30. Apache OpenNLP with Apache NiFi Apache OpenNLP for Entity Resolution Processor https://github.com/tspannhw/nifi-nlp-processor Requires installation of NAR and Apache OpenNLP Models (http://opennlp.sourceforge.net/models-1.5/). This is a non-supported processor that I wrote and put into the community. You can write one too! https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
  • 31. Why TensorFlow? Also Apache MXNet, PyTorch and DL4J. • Google • Multiple platform support • Hadoop integration • Spark integration • Keras • Large Community • Python and Java APIs • GPU Support • Mobile Support • Inception v3 • Clustering • Fully functional demos • Open Source • Apache Licensed • Large Model Library • Buzz • Extensive Documentation • Raspberry Pi Support
  • 32. Streaming Analytics Manager Part of MiNiFi C++ Agent Detect metadata and data Extract metadata and data Content Analysis Deep Learning Framework Complex Event Processing Joining DataSets for Streaming Analytics Open Source Image Analytical Components Enabling Record Processing Schema Management
  • 33. TensorFlow via Python or C++ Binary python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg solar dish, solar collector, solar furnace (score = 0.98316) window screen (score = 0.00196) manhole cover (score = 0.00070) radiator (score = 0.00041) doormat, welcome mat (score = 0.00041) bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg I tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162 I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204 I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005 I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321 I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
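The Python classifier prints one `label (score = value)` line per prediction. If that text is captured from standard output, for example by a flow that runs the script as an external command, it can be turned into structured pairs with a small parser. This is a sketch; `parse_scores` is a hypothetical helper, and the pattern assumes the exact line shape shown on the slide.

```python
import re

# Matches lines such as: "window screen (score = 0.00196)"
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(output):
    """Return [(label, score), ...] for every score line in the captured text."""
    results = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:  # silently skip warnings and other non-score lines
            results.append((match.group("label"), float(match.group("score"))))
    return results
```

Lines that do not match the pattern are ignored, so stray log output mixed into the stream does not break parsing.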
  • 34. TensorFlow Python Example – Classify Image https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py DATA_URL = 'http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz' currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime()) host = os.uname()[1]
  • 35. TensorFlow Python Classifier Launcher https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py #!/bin/bash DATE=$(date +"%Y-%m-%d_%H%M") fswebcam -q -r 1280x720 --no-banner /opt/demo/images/$DATE.jpg python2 -W ignore /opt/demo/classify_image.py /opt/demo/images/$DATE.jpg 2>/dev/null
  • 36. TensorFlow Python Example – Classify Image https://github.com/tspannhw/OpenSourceComputerVision/blob/master/classify_image.py row = [] for node_id in top_k: human_string = node_lookup.id_to_string(node_id) score = predictions[node_id] row.append( { 'node_id': node_id, 'image': image, 'host': host, 'ts': currenttime, 'human_string': str(human_string), 'score': str(score)} ) json_string = json.dumps(row) print( json_string )
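The loop on this slide is flattened onto one line, so here is a self-contained sketch of the same idea: build one dictionary per top prediction and serialize the list as JSON. The function name `build_rows` and its parameters are illustrative rather than taken from the linked script. Two deliberate differences are assumptions on my part: `platform.node()` replaces `os.uname()[1]` for portability, and `node_id` is cast to `int` so that NumPy integer types serialize cleanly.

```python
import json
import platform
from time import gmtime, strftime


def build_rows(top_k, predictions, id_to_string, image):
    """Serialize the top predictions for one image as a JSON array.

    top_k        -- iterable of class ids, best first
    predictions  -- mapping or array indexable by class id, giving the score
    id_to_string -- callable that maps a class id to a readable label
    image        -- path or name of the classified image
    """
    currenttime = strftime("%Y-%m-%d %H:%M:%S", gmtime())
    host = platform.node()
    rows = []
    for node_id in top_k:
        rows.append({
            "node_id": int(node_id),
            "image": image,
            "host": host,
            "ts": currenttime,
            "human_string": str(id_to_string(node_id)),
            "score": str(predictions[node_id]),
        })
    return json.dumps(rows)
```

Emitting a single JSON array on standard output gives the calling flow one well-formed document to split into records.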
  • 37. Apache NiFi Integration with TensorFlow Options • TensorFlow (C++, Python, Java) via ExecuteStreamCommand • TensorFlow NiFi Java Custom Processor • TensorFlow Running on Edge Nodes (MiNiFi)
  • 38. TensorFlow Java Processor in NiFi https://community.hortonworks.com/content/kbentry/116803/building-a-custom-processor-in-apache-nifi-12-for.html https://github.com/tspannhw/nifi-tensorflow-processor https://community.hortonworks.com/articles/178498/integrating-tensorflow-16-image-labelling-with-hdf.html
  • 39. TensorFlow Java Processor in NiFi Installation On A Single Node of Apache NiFi 1.5+ Download NAR here: https://github.com/tspannhw/nifi-tensorflow-processor/releases/tag/1.6 Install NAR file to /usr/hdf/current/nifi/lib/ Create a model directory (/opt/demo/models) wget https://raw.githubusercontent.com/tspannhw/nifi-tensorflow-processor/master/nifi-tensorflow-processors/src/test/resources/models/imagenet_comp_graph_label_strings.txt wget https://github.com/tspannhw/nifi-tensorflow-processor/blob/master/nifi-tensorflow-processors/src/test/resources/models/tensorflow_inception_graph.pb?raw=true Restart Apache NiFi via Ambari
  • 40. TensorFlow Java Processor in NiFi
  • 41. TensorFlow Running on Edge Nodes (MiNiFi) CREATE EXTERNAL TABLE IF NOT EXISTS tfimage (image STRING, ts STRING, host STRING, score STRING, human_string STRING, node_id FLOAT) STORED AS ORC LOCATION '/tfimage'
  • 42. TensorFlow Installation (Edge) apt-get install curl wget -y wget https://github.com/bazelbuild/bazel/releases/download/0.11.1/bazel-0.11.1-installer-linux-x86_64.sh ./bazel-0.11.1-installer-linux-x86_64.sh apt-get install libblas-dev liblapack-dev python-dev libatlas-base-dev gfortran python-setuptools python-h5py -y pip3 install six numpy wheel pip3 install --user numpy scipy matplotlib pandas sympy nose pip3 install --upgrade tensorflow git clone --recurse-submodules https://github.com/tensorflow/tensorflow wget http://mirror.jax.hugeserver.com/apache/nifi/minifi/0.4.0/minifi-0.4.0-bin.zip wget https://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip wget http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz
  • 43. Questions?
  • 44. Contact https://github.com/tspannhw/ApacheDeepLearning101 https://community.hortonworks.com/users/9304/tspann.html https://dzone.com/users/297029/bunkertor.html https://www.meetup.com/futureofdata-princeton/ https://twitter.com/PaaSDev https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn https://news.developer.nvidia.com/nvidias-2017-open-source-deep-learning-frameworks-contributions
  • 45. Hortonworks Community Connection Read access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
  • 46. Community Engagement Participate now at: community.hortonworks.com 4,000+ Registered Users 10,000+ Answers 15,000+ Technical Assets One Website!