Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31

Timothy Spann
Timothy SpannDeveloper Advocate
1 @PaaSDev
Apache Deep Learning 101 v0.31
(For Data Engineers)
Timothy Spann
@PaaSDev
https://github.com/tspannhw/ApacheDeepLearning101/releases/tag/3.1
2 @PaaSDev
Disclaimer
• This is my personal integration and use of Apache software, no companies vision.
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
developed. This is Tim’s ideas only.
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all effect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
3 @PaaSDev
Agenda - Data Engineering With Apache Deep Learning
• Introduction – This is my personal workflow
• Architecture Overview
• Apache NiFi 1.7
• Apache MXNet 1.3
• Apache OpenNLP and Apache Tika
• Demos
• Questions
4 @PaaSDev
There are some who call him..
DZone Big Data MVB
Princeton Future of Data Meetup
Ex-Pivotal Senior Field Engineer
Current Hortonworks Senior Solutions Engineer
https://github.com/tspannhw
5 @PaaSDev
Deep Learning for Big Data Engineers
Multiple users, frameworks, languages, devices, data sources & clusters
BIG DATA ENGINEER
• Experience in ETL
• Coding skills in Scala,
Python, Java
• Experience with Apache
Hadoop
• Knowledge of database
query languages such as
SQL
• Knowledge of Hadoop tools
such as Hive, or Pig
• Expert in ETL (Eating, Ties
and Laziness)
• Social Media Maven
• Deep SME in Buzzwords
• No Coding Skills
• Interest in Pig and Falcon
CAT AI
• Will Drive your Car
• Will Fix Your Code
• Will Beat You At Q-Bert
• Will Not Be Discussed
Today
• Will Not Finish This Talk For
Me, This Time
http://gluon.mxnet.io/chapter01_crashcourse/preface.html
6 @PaaSDev
7 @PaaSDev
Apache Deep Learning Flow
8 @PaaSDev
Apache Deep Learning Server Architecture
Hadoop 3.1 Node
Node
Manager
Datanode
HBase
Region
Hadoop 3.1 Node
Node
Manager
Datanode
HBase
Region
Node
Apache NiFi
Zookeeper
Apache Spark 2.3
MLib
Apache Spark 2.3
MLib
Apache Hadoop 3.1 CPU Node
Neural Network
Apache Spark 2.3
MLib
Apache Spark 2.3
MLib
Pipeline
Apache Hadoop 3.1 GPU Node
Neural Network
Pipeline
MiNiFi Java
Agent
MiNiFi C++
Agent
HDF Node
Apache NiFi
Zookeeper
Apache Livy
9 @PaaSDev
Aggregate all the Data!
sensors, drones, logs,
geo-location devices
images from cameras,
results from running predictions on
pre-trained models.
Collect: Bring Together
10 @PaaSDev
Mediate point-to-point and
bi-directional data flows
delivering data reliably to and from
Apache HBase, Apache Hive, HDFS, Slack
and Email.
Conduct: Mediate the Data Flow
11 @PaaSDev
Orchestrate, parse, merge, aggregate,
filter, join, transform, fork, query, sort,
dissect, store, enrich with weather,
location, sentiment analysis, image
analysis, object detection, image
recognition
Curate: Gain Insights
12 @PaaSDev
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over a sixty sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
13 @PaaSDev
Edge Intelligence with Apache NiFi Subproject - MiNiFi
à Guaranteed delivery
à Data buffering
‒ Backpressure
‒ Pressure release
à Prioritized queuing
à Flow specific QoS
‒ Latency vs. throughput
‒ Loss tolerance
à Data provenance
à Recovery / recording a rolling log
of fine-grained history
à Designed for extension
Different from Apache NiFi
à Design and Deploy
à Warm re-deploys
Key Features
14 @PaaSDev
• Cloud ready
• Python, C++, Scala, R, Julia, Matlab, MXNet.js and Perl Support
• Experienced team (XGBoost)
• AWS, Microsoft, NVIDIA, Baidu, Intel
• Apache Incubator Project
• Run distributed on YARN and Spark
• In my early tests, faster than TensorFlow. (Try this your self)
• Runs on Raspberry PI, NVidia Jetson TX1 and other constrained devices
https://mxnet.incubator.apache.org/how_to/cloud.html
https://github.com/apache/incubator-mxnet/tree/1.3.0/example
https://gluon-cv.mxnet.io/api/model_zoo.html
15 @PaaSDev
• Great documentation
• Crash Course
• Gluon (Open API), GluonCV, GluonNLP
• Keras (One API Many Runtime Options)
• Great Python Interaction
• Model Server Available
• ONNX (Open Neural Network Exchange Format) Support for AI Models
• Now in Version 1.3
• Rich Model Zoo!
• TensorBoard compatible
http://mxnet.incubator.apache.org/ http://gluon.mxnet.io/https://onnx.ai/
pip3.6 install -U keras-mxnet
https://gluon-nlp.mxnet.io/
pip3.6 install --pre --upgrade mxnet pip3.6 install gluonnlp
16 @PaaSDev
• Apache MXNet via Execute Process (Python)
• Apache MXNet Running on Edge Nodes (MiniFi) S2S
• Apache MXNet Model Server Integration (REST API)
Apache NiFi Integration with Apache MXNet Options
https://community.hortonworks.com/articles/177232/apache-deep-learning-101-processing-apache-mxnet-m.html
https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html
https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html
17 @PaaSDev
• Apache MXNet Running in Apache Zeppelin Notebooks
• Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized Containers
• Apache MXNet Running on YARN
Apache NiFi Integration with Apache Hadoop Options
https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html
https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html
https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
18 @PaaSDev
Apache MXNet Pre-Built Models - Model Zoo
• CaffeNet
• SqueezeNet v1.1
• Inception v3
• Single Shot Detection (SSD)
• VGG16
• VGG19
• ResidualNet 152
• LSTM
http://mxnet.incubator.apache.org/model_zoo/index.html
http://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html
19 @PaaSDev
Installing Apache MXNet on a Raspberry Pi
See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html
I highly recommend using Python 3.6 and full custom build of OpenCV 3.x.
https://mxnet.incubator.apache.org/install/index.html?platform=Devices&language=Python&processor=CPU
20 @PaaSDev
python3 -W ignore analyze.py
{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1":
"n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2":
"n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3":
"n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt",
"top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename":
"images/tx1_image_img_20180208204131.jpg", "runtime": "2"}
Apache MXNet via Python (OSX Local with WebCam)
https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-
nifi-15-instance-w.html
21 @PaaSDev
Apache MXNet with Gluon and MQTT
https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html
https://community.hortonworks.com/articles/198912/ingesting-apache-mxnet-gluon-deep-learning-results.html
22 @PaaSDev
Apache MXNet 1.3 with GluonCV YOLO v3 and Apache NiFi
https://community.hortonworks.com/articles/222367/using-apache-nifi-with-apache-mxnet-gluoncv-for-yo.html
23 @PaaSDev
Apache MXNet Installation on OSX
git clone https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
mkdir images
curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0)
Gecko/20100101 Firefox/45.0' --header 'Accept:
text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' -
-header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive'
'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
tar -xvzf inception-bn.tar.gz
cp Inception-BN-0126.params Inception-BN-0000.params
brew install graphviz
pip install --upgrade pip
pip install --upgrade setuptools
pip install graphviz
pip install mxnet
http://mxnet.incubator.apache.org/install/index.html
24 @PaaSDev
Apache MXNet Running on an Apache NiFi Node
25 @PaaSDev
Apache MXNet Installation on an Centos 7
git clone https://github.com/apache/incubator-mxnet.git
sudo yum groupinstall 'Development Tools' -y
sudo yum install cmake git pkgconfig -y
sudo yum install libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel
libtiff-devel libwebp-devel -y
sudo yum install libdc1394-devel libv4l-devel gstreamer-plugins-base-devel -y
sudo yum install gtk2-devel -y
sudo yum install tbb-devel eigen3-devel -y
pip install numpy
You will need a full Python development environment, C++ and I recommend
building OpenCV3.
https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html
26 @PaaSDev
Apache MXNet Running on Edge Nodes (MiniFi)
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://github.com/tspannhw/OpenSourceComputerVision
https://github.com/tspannhw/ApacheDeepLearning101
https://github.com/tspannhw/mxnet-for-iot
27 @PaaSDev
Multiple IoT Devices with Apache NiFi and Apache MXNet
https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
28 @PaaSDev
Using Apache MXNet on The Edge with Sensors and Intel Movidius
(MiniFi)
https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
29 @PaaSDev
Apache MXNet Running in Apache Zeppelin
30 @PaaSDev
Apache MXNet Setup in Apache Zeppelin
Deep Learning Models
You will need to download the pre-built Inception models and
reference them on your server.
synset.txt
Inception-BN-0000.params
Inception-BN-symbol.json
See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_d
etector.html
curl http://data.mxnet.io/models/imagenet/inception-bn.tar.gz >
inception-bn.tar.gz
curl http://data.mxnet.io/models/imagenet/synset.txt > synset.txt
31 @PaaSDev
Apache MXNet on Apache YARN 2.x Installation
git clone https://github.com/apache/incubator-mxnet.git
yum install java-1.8.0-openjdk
yum install java-1.8.0-openjdk-devel
pip2.7 install kubernetes
pip2.7 install opencv-python
git clone https://github.com/dmlc/dmlc-core.git
cd dmlc-core
make
cd tracker/yarn
./build.sh
export HADOOP_HOME=/usr/hdp/3.0.0.0-1634/hadoop
export HADOOP_HDFS_HOME=/usr/hdp/3.0.0.0-1634/hadoop-hdfs
export hdfs_home=/usr/hdp/3.0.0.0-1634/hadoop-hdfs
export hadoop_hdfs_home=/usr/hdp/3.0.0.0-1634/hadoop-hdfs
git clone https://github.com/tspannhw/ApacheDeepLearning101.git
git clone https://github.com/tspannhw/nifi-mxnet-yarn.git
32 @PaaSDev
Apache MXNet on Apache YARN 2.x with DMLC Script
https://github.com/tspannhw/nifi-mxnet-yarn
dmlc-submit --cluster yarn --num-workers 1 --server-cores 2
--server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py
Uses: Python 2.7
33 @PaaSDev
Apache MXNet on Apache YARN 3.1 Native No Spark
yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-
distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop-
yarn-applications-distributedshell.jar -shell_command python3.6 -
shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" -
container_resources memory-mb=512,vcores=1
Uses: Python Any
34 @PaaSDev
Apache MXNet on Apache YARN 3.1 Native No Spark
https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31-
hdp.html
https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
35 @PaaSDev
Apache MXNet on YARN 3.2 in Docker Using “Submarine”
https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine
yarn jar hadoop-yarn-applications-submarine-<version>.jar job run 
--name xyz-job-001 --docker_image <your docker image> 
--input_path hdfs://default/dataset/cifar-10-data 
--checkpoint_path hdfs://default/tmp/cifar-10-jobdir 
--num_workers 1 
--worker_resources memory=8G,vcores=2,gpu=2 
--worker_launch_cmd “cmd for MXNet/PyTorch"
Wangda Tan (wangda@apache.org)
Hadoop {Submarine} Project: Running deep learning workloads on YARN
https://issues.apache.org/jira/browse/YARN-8135
36 @PaaSDev
Apache MXNet Model Server with Apache NiFi
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
sudo pip3 install mxnet-model-server --upgrade
37 @PaaSDev
Apache MXNet Model Server with Apache NiFi
mxnet-model-server --models squeezenet=https://s3.amazonaws.com/model-
server/models/squeezenet_v1.1/squeezenet_v1.1.model --service
mms/model_service/mxnet_vision_service.py --port 9999
mxnet-model-server --models SSD=resnet50_ssd_model.model --service ssd_service.py --port 9998
https://community.hortonworks.com/articles/177232/apache-deep-learning-101-processing-apache-mxnet-m.html
38 @PaaSDev
Apache OpenNLP for Entity Resolution
Processor
https://github.com/tspannhw/nifi-nlp-
processor
Requires installation of NAR and Apache
OpenNLP Models
(http://opennlp.sourceforge.net/models-1.5/).
This is a non-supported processor that I wrote
and put into the community. You can write
one too!
Apache OpenNLP with Apache NiFi
https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html
https://opennlp.apache.org/news/release-190.html
39 @PaaSDev
Apache Tika with Apache NiFi
https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html
https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html
https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html
https://github.com/tspannhw/nifi-extracttext-processor
https://community.hortonworks.com/content/kbentry/177370/extracting-html-from-pdf-excel-and-word-
documents.html
40 @PaaSDev
Another Reason Apache and Apache Tika are Awesome!
https://community.hortonworks.com/articles/163776/parsing-
any-document-with-apache-nifi-15-with-apac.html
https://github.com/tspannhw/nifi-extracttext-processor
41 @PaaSDev
Contact
https://github.com/tspannhw/ApacheDeepLearning101
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn
https://unsplash.com/ https://pixabay.com/
http://gluon-crash-course.mxnet.io/
1 of 41

Recommended

Capital onehadoopintro by
Capital onehadoopintroCapital onehadoopintro
Capital onehadoopintroDoug Chang
2K views51 slides
Building hadoop based big data environment by
Building hadoop based big data environmentBuilding hadoop based big data environment
Building hadoop based big data environmentEvans Ye
2.8K views39 slides
開放原始碼 Ch1.2 intro - oss - apahce foundry (ver 2.0) by
開放原始碼 Ch1.2   intro - oss - apahce foundry (ver 2.0)開放原始碼 Ch1.2   intro - oss - apahce foundry (ver 2.0)
開放原始碼 Ch1.2 intro - oss - apahce foundry (ver 2.0)My own sweet home!
528 views38 slides
12-Step Program for Scaling Web Applications on PostgreSQL by
12-Step Program for Scaling Web Applications on PostgreSQL12-Step Program for Scaling Web Applications on PostgreSQL
12-Step Program for Scaling Web Applications on PostgreSQLKonstantin Gredeskoul
69.1K views101 slides
Leveraging docker for hadoop build automation and big data stack provisioning by
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningEvans Ye
863 views63 slides
Pitfalls of migrating projects to JDK 9 by
Pitfalls of migrating projects to JDK 9Pitfalls of migrating projects to JDK 9
Pitfalls of migrating projects to JDK 9Pavel Bucek
3K views44 slides

More Related Content

What's hot

Reverse proxy magic by
Reverse proxy magicReverse proxy magic
Reverse proxy magicJim Jagielski
837 views48 slides
2017 DevSecCon ZAP Scripting Workshop by
2017 DevSecCon ZAP Scripting Workshop2017 DevSecCon ZAP Scripting Workshop
2017 DevSecCon ZAP Scripting WorkshopSimon Bennetts
3.6K views40 slides
Alfresco tuning part1 by
Alfresco tuning part1Alfresco tuning part1
Alfresco tuning part1Luis Cabaceira
10.7K views21 slides
Hacked? Pray that the Attacker used PowerShell by
Hacked? Pray that the Attacker used PowerShellHacked? Pray that the Attacker used PowerShell
Hacked? Pray that the Attacker used PowerShellNikhil Mittal
8.7K views43 slides
Scaling Instagram by
Scaling InstagramScaling Instagram
Scaling Instagramiammutex
189.4K views185 slides
Springone2gx 2014 Reactive Streams and Reactor by
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorStéphane Maldini
5.7K views75 slides

What's hot(20)

2017 DevSecCon ZAP Scripting Workshop by Simon Bennetts
2017 DevSecCon ZAP Scripting Workshop2017 DevSecCon ZAP Scripting Workshop
2017 DevSecCon ZAP Scripting Workshop
Simon Bennetts3.6K views
Hacked? Pray that the Attacker used PowerShell by Nikhil Mittal
Hacked? Pray that the Attacker used PowerShellHacked? Pray that the Attacker used PowerShell
Hacked? Pray that the Attacker used PowerShell
Nikhil Mittal8.7K views
Scaling Instagram by iammutex
Scaling InstagramScaling Instagram
Scaling Instagram
iammutex189.4K views
Springone2gx 2014 Reactive Streams and Reactor by Stéphane Maldini
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
Stéphane Maldini5.7K views
Workshop: PowerShell for Penetration Testers by Nikhil Mittal
Workshop: PowerShell for Penetration TestersWorkshop: PowerShell for Penetration Testers
Workshop: PowerShell for Penetration Testers
Nikhil Mittal1.2K views
LJC: Fault tolerance with Apache Cassandra by Christopher Batey
LJC: Fault tolerance with Apache CassandraLJC: Fault tolerance with Apache Cassandra
LJC: Fault tolerance with Apache Cassandra
Christopher Batey2.4K views
Tool Academy: Web Archiving by nullhandle
Tool Academy: Web ArchivingTool Academy: Web Archiving
Tool Academy: Web Archiving
nullhandle2.3K views
Maven: from Scratch to Production (.pdf) by Johan Mynhardt
Maven: from Scratch to Production (.pdf)Maven: from Scratch to Production (.pdf)
Maven: from Scratch to Production (.pdf)
Johan Mynhardt1.2K views
PowerShell for Penetration Testers by Nikhil Mittal
PowerShell for Penetration TestersPowerShell for Penetration Testers
PowerShell for Penetration Testers
Nikhil Mittal36.2K views
Java 9 and the impact on Maven Projects (JavaOne 2016) by Robert Scholte
Java 9 and the impact on Maven Projects (JavaOne 2016)Java 9 and the impact on Maven Projects (JavaOne 2016)
Java 9 and the impact on Maven Projects (JavaOne 2016)
Robert Scholte7.4K views
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required" by Daniel Bryant
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
Daniel Bryant21.1K views
MongoDB training for java software engineers by Moshe Kaplan
MongoDB training for java software engineersMongoDB training for java software engineers
MongoDB training for java software engineers
Moshe Kaplan660 views
Emerging technologies /frameworks in Big Data by Rahul Jain
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain2.7K views
Web protocols for java developers by Pavel Bucek
Web protocols for java developersWeb protocols for java developers
Web protocols for java developers
Pavel Bucek582 views
Past, Present, and Future of Apache Storm by P. Taylor Goetz
Past, Present, and Future of Apache StormPast, Present, and Future of Apache Storm
Past, Present, and Future of Apache Storm
P. Taylor Goetz625 views

Similar to Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31

Apache Deep Learning 201 by
Apache Deep Learning 201Apache Deep Learning 201
Apache Deep Learning 201DataWorks Summit
773 views34 slides
Apache Deep Learning 201 - Philly Open Source by
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceTimothy Spann
642 views29 slides
Apache Deep Learning 201 - Barcelona DWS March 2019 by
Apache Deep Learning 201 - Barcelona DWS March 2019Apache Deep Learning 201 - Barcelona DWS March 2019
Apache Deep Learning 201 - Barcelona DWS March 2019Timothy Spann
645 views35 slides
Apache deep learning 202 Washington DC - DWS 2019 by
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019Timothy Spann
552 views15 slides
Deep learning on HDP 2018 Prague by
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 PragueTimothy Spann
1.4K views46 slides
ApacheCon 2021 Apache Deep Learning 302 by
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302Timothy Spann
632 views23 slides

Similar to Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31(20)

Apache Deep Learning 201 - Philly Open Source by Timothy Spann
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
Timothy Spann642 views
Apache Deep Learning 201 - Barcelona DWS March 2019 by Timothy Spann
Apache Deep Learning 201 - Barcelona DWS March 2019Apache Deep Learning 201 - Barcelona DWS March 2019
Apache Deep Learning 201 - Barcelona DWS March 2019
Timothy Spann645 views
Apache deep learning 202 Washington DC - DWS 2019 by Timothy Spann
Apache deep learning 202   Washington DC - DWS 2019Apache deep learning 202   Washington DC - DWS 2019
Apache deep learning 202 Washington DC - DWS 2019
Timothy Spann552 views
Deep learning on HDP 2018 Prague by Timothy Spann
Deep learning on HDP 2018 PragueDeep learning on HDP 2018 Prague
Deep learning on HDP 2018 Prague
Timothy Spann1.4K views
ApacheCon 2021 Apache Deep Learning 302 by Timothy Spann
ApacheCon 2021   Apache Deep Learning 302ApacheCon 2021   Apache Deep Learning 302
ApacheCon 2021 Apache Deep Learning 302
Timothy Spann632 views
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl... by OpenShift Origin
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
Build Your Own PaaS, Just like Red Hat's OpenShift from LinuxCon 2013 New Orl...
OpenShift Origin11.8K views
Openstack - An introduction/Installation - Presented at Dr Dobb's conference... by Rahul Krishna Upadhyaya
 Openstack - An introduction/Installation - Presented at Dr Dobb's conference... Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Openstack - An introduction/Installation - Presented at Dr Dobb's conference...
Music city data Hail Hydrate! from stream to lake by Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 views
ApacheCon 2021 - Apache NiFi Deep Dive 300 by Timothy Spann
ApacheCon 2021 - Apache NiFi Deep Dive 300ApacheCon 2021 - Apache NiFi Deep Dive 300
ApacheCon 2021 - Apache NiFi Deep Dive 300
Timothy Spann690 views
Apache Deep Learning 101 - DWS Berlin 2018 by Timothy Spann
Apache Deep Learning 101 - DWS Berlin 2018Apache Deep Learning 101 - DWS Berlin 2018
Apache Deep Learning 101 - DWS Berlin 2018
Timothy Spann1.2K views
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack by Alluxio, Inc.
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stackAccelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Accelerating analytics in the cloud with the Starburst Presto + Alluxio stack
Alluxio, Inc.260 views
Real time cloud native open source streaming of any data to apache solr by Timothy Spann
Real time cloud native open source streaming of any data to apache solrReal time cloud native open source streaming of any data to apache solr
Real time cloud native open source streaming of any data to apache solr
Timothy Spann758 views
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ... by Accumulo Summit
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit1.4K views
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo by Joe Stein
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Joe Stein3.1K views
Parquet and AVRO by airisData
Parquet and AVROParquet and AVRO
Parquet and AVRO
airisData8.9K views
MiniFi and Apache NiFi : IoT in Berlin Germany 2018 by Timothy Spann
MiniFi and Apache NiFi : IoT in Berlin Germany 2018MiniFi and Apache NiFi : IoT in Berlin Germany 2018
MiniFi and Apache NiFi : IoT in Berlin Germany 2018
Timothy Spann1.3K views
Trend Micro Big Data Platform and Apache Bigtop by Evans Ye
Trend Micro Big Data Platform and Apache BigtopTrend Micro Big Data Platform and Apache Bigtop
Trend Micro Big Data Platform and Apache Bigtop
Evans Ye8.8K views

More from Timothy Spann

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
114 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
106 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
96 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
158 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

More from Timothy Spann(20)

Building Real-Time Travel Alerts by Timothy Spann
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Timothy Spann114 views
JConWorld_ Continuous SQL with Kafka and Flink by Timothy Spann
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann106 views
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann96 views
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann158 views
CoC23_ Looking at the New Features of Apache NiFi by Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 views
CoC23_ Let’s Monitor The Conditions at the Conference by Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 views
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf by Timothy Spann
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
Timothy Spann23 views
CoC23_Utilizing Real-Time Transit Data for Travel Optimization by Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann30 views
The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Meetup - Brasil - Data In Motion - 2023 September 19 by Timothy Spann
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Timothy Spann314 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by Timothy Spann
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann193 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
OSSNA Building Modern Data Streaming Apps by Timothy Spann
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann155 views
GSJUG: Mastering Data Streaming Pipelines 09May2023 by Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann255 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views

Recently uploaded

[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...DataScienceConferenc1
5 views11 slides
How Leaders See Data? (Level 1) by
How Leaders See Data? (Level 1)How Leaders See Data? (Level 1)
How Leaders See Data? (Level 1)Narendra Narendra
13 views76 slides
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptxDataScienceConferenc1
5 views12 slides
SUPER STORE SQL PROJECT.pptx by
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptxkhan888620
12 views16 slides
Binder1.pdf by
Binder1.pdfBinder1.pdf
Binder1.pdfEstherSita2
10 views21 slides
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented GenerationDataScienceConferenc1
11 views29 slides

Recently uploaded(20)

[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M... by DataScienceConferenc1
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Milos Grubjesic Empowering Business with Pepsico s Advanced M...
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx by DataScienceConferenc1
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
[DSC Europe 23] Zsolt Feleki - Machine Translation should we trust it.pptx
SUPER STORE SQL PROJECT.pptx by khan888620
SUPER STORE SQL PROJECT.pptxSUPER STORE SQL PROJECT.pptx
SUPER STORE SQL PROJECT.pptx
khan88862012 views
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation by DataScienceConferenc1
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
[DSC Europe 23] Spela Poklukar & Tea Brasanac - Retrieval Augmented Generation
Advanced_Recommendation_Systems_Presentation.pptx by neeharikasingh29
Advanced_Recommendation_Systems_Presentation.pptxAdvanced_Recommendation_Systems_Presentation.pptx
Advanced_Recommendation_Systems_Presentation.pptx
Ukraine Infographic_22NOV2023_v2.pdf by AnastosiyaGurin
Ukraine Infographic_22NOV2023_v2.pdfUkraine Infographic_22NOV2023_v2.pdf
Ukraine Infographic_22NOV2023_v2.pdf
AnastosiyaGurin1.4K views
3196 The Case of The East River by ErickANDRADE90
3196 The Case of The East River3196 The Case of The East River
3196 The Case of The East River
ErickANDRADE9012 views
Chapter 3b- Process Communication (1) (1)(1) (1).pptx by ayeshabaig2004
Chapter 3b- Process Communication (1) (1)(1) (1).pptxChapter 3b- Process Communication (1) (1)(1) (1).pptx
Chapter 3b- Process Communication (1) (1)(1) (1).pptx
ayeshabaig20045 views
Short Story Assignment by Kelly Nguyen by kellynguyen01
Short Story Assignment by Kelly NguyenShort Story Assignment by Kelly Nguyen
Short Story Assignment by Kelly Nguyen
kellynguyen0119 views
Cross-network in Google Analytics 4.pdf by GA4 Tutorials
Cross-network in Google Analytics 4.pdfCross-network in Google Analytics 4.pdf
Cross-network in Google Analytics 4.pdf
GA4 Tutorials6 views
UNEP FI CRS Climate Risk Results.pptx by pekka28
UNEP FI CRS Climate Risk Results.pptxUNEP FI CRS Climate Risk Results.pptx
UNEP FI CRS Climate Risk Results.pptx
pekka2811 views
Organic Shopping in Google Analytics 4.pdf by GA4 Tutorials
Organic Shopping in Google Analytics 4.pdfOrganic Shopping in Google Analytics 4.pdf
Organic Shopping in Google Analytics 4.pdf
GA4 Tutorials12 views
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf by vikas12611618
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdfVikas 500 BIG DATA TECHNOLOGIES LAB.pdf
Vikas 500 BIG DATA TECHNOLOGIES LAB.pdf
vikas126116188 views

Apache Deep Learning 101 - ApacheCon Montreal 2018 v0.31

  • 1. 1 @PaaSDev Apache Deep Learning 101 v0.31 (For Data Engineers) Timothy Spann @PaaSDev https://github.com/tspannhw/ApacheDeepLearning101/releases/tag/3.1
  • 2. 2 @PaaSDev Disclaimer • This is my personal integration and use of Apache software, no companies vision. • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. This is Tim’s ideas only. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all effect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  • 3. 3 @PaaSDev Agenda - Data Engineering With Apache Deep Learning • Introduction – This is my personal workflow • Architecture Overview • Apache NiFi 1.7 • Apache MXNet 1.3 • Apache OpenNLP and Apache Tika • Demos • Questions
  • 4. 4 @PaaSDev There are some who call him.. DZone Big Data MVB Princeton Future of Data Meetup Ex-Pivotal Senior Field Engineer Current Hortonworks Senior Solutions Engineer https://github.com/tspannhw
  • 5. 5 @PaaSDev Deep Learning for Big Data Engineers Multiple users, frameworks, languages, devices, data sources & clusters BIG DATA ENGINEER • Experience in ETL • Coding skills in Scala, Python, Java • Experience with Apache Hadoop • Knowledge of database query languages such as SQL • Knowledge of Hadoop tools such as Hive, or Pig • Expert in ETL (Eating, Ties and Laziness) • Social Media Maven • Deep SME in Buzzwords • No Coding Skills • Interest in Pig and Falcon CAT AI • Will Drive your Car • Will Fix Your Code • Will Beat You At Q-Bert • Will Not Be Discussed Today • Will Not Finish This Talk For Me, This Time http://gluon.mxnet.io/chapter01_crashcourse/preface.html
  • 7. 7 @PaaSDev Apache Deep Learning Flow
  • 8. 8 @PaaSDev Apache Deep Learning Server Architecture Hadoop 3.1 Node Node Manager Datanode HBase Region Hadoop 3.1 Node Node Manager Datanode HBase Region Node Apache NiFi Zookeeper Apache Spark 2.3 MLib Apache Spark 2.3 MLib Apache Hadoop 3.1 CPU Node Neural Network Apache Spark 2.3 MLib Apache Spark 2.3 MLib Pipeline Apache Hadoop 3.1 GPU Node Neural Network Pipeline MiNiFi Java Agent MiNiFi C++ Agent HDF Node Apache NiFi Zookeeper Apache Livy
  • 9. 9 @PaaSDev Aggregate all the Data! sensors, drones, logs, geo-location devices images from cameras, results from running predictions on pre-trained models. Collect: Bring Together
  • 10. 10 @PaaSDev Mediate point-to-point and bi-directional data flows delivering data reliably to and from Apache HBase, Apache Hive, HDFS, Slack and Email. Conduct: Mediate the Data Flow
  • 11. 11 @PaaSDev Orchestrate, parse, merge, aggregate, filter, join, transform, fork, query, sort, dissect, store, enrich with weather, location, sentiment analysis, image analysis, object detection, image recognition Curate: Gain Insights
  • 12. 12 @PaaSDev Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over a sixty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  • 13. 13 @PaaSDev Edge Intelligence with Apache NiFi Subproject - MiNiFi à Guaranteed delivery à Data buffering ‒ Backpressure ‒ Pressure release à Prioritized queuing à Flow specific QoS ‒ Latency vs. throughput ‒ Loss tolerance à Data provenance à Recovery / recording a rolling log of fine-grained history à Designed for extension Different from Apache NiFi à Design and Deploy à Warm re-deploys Key Features
  • 14. 14 @PaaSDev • Cloud ready • Python, C++, Scala, R, Julia, Matlab, MXNet.js and Perl Support • Experienced team (XGBoost) • AWS, Microsoft, NVIDIA, Baidu, Intel • Apache Incubator Project • Run distributed on YARN and Spark • In my early tests, faster than TensorFlow. (Try this your self) • Runs on Raspberry PI, NVidia Jetson TX1 and other constrained devices https://mxnet.incubator.apache.org/how_to/cloud.html https://github.com/apache/incubator-mxnet/tree/1.3.0/example https://gluon-cv.mxnet.io/api/model_zoo.html
  • 15. 15 @PaaSDev • Great documentation • Crash Course • Gluon (Open API), GluonCV, GluonNLP • Keras (One API Many Runtime Options) • Great Python Interaction • Model Server Available • ONNX (Open Neural Network Exchange Format) Support for AI Models • Now in Version 1.3 • Rich Model Zoo! • TensorBoard compatible http://mxnet.incubator.apache.org/ http://gluon.mxnet.io/https://onnx.ai/ pip3.6 install -U keras-mxnet https://gluon-nlp.mxnet.io/ pip3.6 install --pre --upgrade mxnet pip3.6 install gluonnlp
  • 16. 16 @PaaSDev • Apache MXNet via Execute Process (Python) • Apache MXNet Running on Edge Nodes (MiniFi) S2S • Apache MXNet Model Server Integration (REST API) Apache NiFi Integration with Apache MXNet Options https://community.hortonworks.com/articles/177232/apache-deep-learning-101-processing-apache-mxnet-m.html https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html
  • 17. 17 @PaaSDev • Apache MXNet Running in Apache Zeppelin Notebooks • Apache MXNet Running on YARN 3.1 In Hadoop 3.1 In Dockerized Containers • Apache MXNet Running on YARN Apache NiFi Integration with Apache Hadoop Options https://community.hortonworks.com/articles/176789/apache-deep-learning-101-using-apache-mxnet-in-apa.html https://community.hortonworks.com/articles/174399/apache-deep-learning-101-using-apache-mxnet-on-apa.html https://www.slideshare.net/Hadoop_Summit/deep-learning-on-yarn-running-distributed-tensorflow-etc-on-hadoop-cluster-v3
  • 18. 18 @PaaSDev Apache MXNet Pre-Built Models - Model Zoo • CaffeNet • SqueezeNet v1.1 • Inception v3 • Single Shot Detection (SSD) • VGG16 • VGG19 • ResidualNet 152 • LSTM http://mxnet.incubator.apache.org/model_zoo/index.html http://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html
  • 19. 19 @PaaSDev Installing Apache MXNet on a Raspberry Pi See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html I highly recommend using Python 3.6 and full custom build of OpenCV 3.x. https://mxnet.incubator.apache.org/install/index.html?platform=Devices&language=Python&processor=CPU
  • 20. 20 @PaaSDev python3 -W ignore analyze.py {"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"} Apache MXNet via Python (OSX Local with WebCam) https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache- nifi-15-instance-w.html
  • 21. 21 @PaaSDev Apache MXNet with Gluon and MQTT https://community.hortonworks.com/articles/198939/using-apache-mxnet-gluoncv-with-apache-nifi-for-de.html https://community.hortonworks.com/articles/198912/ingesting-apache-mxnet-gluon-deep-learning-results.html
  • 22. 22 @PaaSDev Apache MXNet 1.3 with GluonCV YOLO v3 and Apache NiFi https://community.hortonworks.com/articles/222367/using-apache-nifi-with-apache-mxnet-gluoncv-for-yo.html
  • 23. 23 @PaaSDev Apache MXNet Installation on OSX git clone https://github.com/apache/incubator-mxnet.git cd incubator-mxnet mkdir images curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' - -header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L tar -xvzf inception-bn.tar.gz cp Inception-BN-0126.params Inception-BN-0000.params brew install graphviz pip install --upgrade pip pip install --upgrade setuptools pip install graphviz pip install mxnet http://mxnet.incubator.apache.org/install/index.html
  • 24. 24 @PaaSDev Apache MXNet Running on an Apache NiFi Node
  • 25. 25 @PaaSDev Apache MXNet Installation on an Centos 7 git clone https://github.com/apache/incubator-mxnet.git sudo yum groupinstall 'Development Tools' -y sudo yum install cmake git pkgconfig -y sudo yum install libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel libtiff-devel libwebp-devel -y sudo yum install libdc1394-devel libv4l-devel gstreamer-plugins-base-devel -y sudo yum install gtk2-devel -y sudo yum install tbb-devel eigen3-devel -y pip install numpy You will need a full Python development environment, C++ and I recommend building OpenCV3. https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html
  • 26. 26 @PaaSDev Apache MXNet Running on Edge Nodes (MiniFi) https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html https://github.com/tspannhw/OpenSourceComputerVision https://github.com/tspannhw/ApacheDeepLearning101 https://github.com/tspannhw/mxnet-for-iot
  • 27. 27 @PaaSDev Multiple IoT Devices with Apache NiFi and Apache MXNet https://community.hortonworks.com/articles/203638/ingesting-multiple-iot-devices-with-apache-nifi-17.html
  • 28. 28 @PaaSDev Using Apache MXNet on The Edge with Sensors and Intel Movidius (MiniFi) https://community.hortonworks.com/articles/176932/apache-deep-learning-101-using-apache-mxnet-on-the.html https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
  • 29. 29 @PaaSDev Apache MXNet Running in Apache Zeppelin
  • 30. 30 @PaaSDev Apache MXNet Setup in Apache Zeppelin Deep Learning Models You will need to download the pre-built Inception models and reference them on your server. synset.txt Inception-BN-0000.params Inception-BN-symbol.json See: https://mxnet.incubator.apache.org/tutorials/embedded/wine_d etector.html curl http://data.mxnet.io/models/imagenet/inception-bn.tar.gz > inception-bn.tar.gz curl http://data.mxnet.io/models/imagenet/synset.txt > synset.txt
  • 31. 31 @PaaSDev Apache MXNet on Apache YARN 2.x Installation git clone https://github.com/apache/incubator-mxnet.git yum install java-1.8.0-openjdk yum install java-1.8.0-openjdk-devel pip2.7 install kubernetes pip2.7 install opencv-python git clone https://github.com/dmlc/dmlc-core.git cd dmlc-core make cd tracker/yarn ./build.sh export HADOOP_HOME=/usr/hdp/3.0.0.0-1634/hadoop export HADOOP_HDFS_HOME=/usr/hdp/3.0.0.0-1634/hadoop-hdfs export hdfs_home=/usr/hdp/3.0.0.0-1634/hadoop-hdfs export hadoop_hdfs_home=/usr/hdp/3.0.0.0-1634/hadoop-hdfs git clone https://github.com/tspannhw/ApacheDeepLearning101.git git clone https://github.com/tspannhw/nifi-mxnet-yarn.git
  • 32. 32 @PaaSDev Apache MXNet on Apache YARN 2.x with DMLC Script https://github.com/tspannhw/nifi-mxnet-yarn dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 --server-memory 1G --log-level DEBUG --log-file mxnet.log analyzeyarn.py Uses: Python 2.7
  • 33. 33 @PaaSDev Apache MXNet on Apache YARN 3.1 Native No Spark yarn jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications- distributedshell.jar -jar /usr/hdp/current/hadoop-yarn-client/hadoop- yarn-applications-distributedshell.jar -shell_command python3.6 - shell_args "/opt/demo/analyzex.py /opt/images/cat.jpg" - container_resources memory-mb=512,vcores=1 Uses: Python Any
  • 34. 34 @PaaSDev Apache MXNet on Apache YARN 3.1 Native No Spark https://community.hortonworks.com/content/kbentry/222242/running-apache-mxnet-deep-learning-on-yarn-31- hdp.html https://github.com/tspannhw/ApacheDeepLearning101/blob/master/analyzehdfs.py
  • 35. 35 @PaaSDev Apache MXNet on YARN 3.2 in Docker Using “Submarine” https://github.com/apache/hadoop/tree/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-submarine yarn jar hadoop-yarn-applications-submarine-<version>.jar job run --name xyz-job-001 --docker_image <your docker image> --input_path hdfs://default/dataset/cifar-10-data --checkpoint_path hdfs://default/tmp/cifar-10-jobdir --num_workers 1 --worker_resources memory=8G,vcores=2,gpu=2 --worker_launch_cmd “cmd for MXNet/PyTorch" Wangda Tan (wangda@apache.org) Hadoop {Submarine} Project: Running deep learning workloads on YARN https://issues.apache.org/jira/browse/YARN-8135
  • 36. 36 @PaaSDev Apache MXNet Model Server with Apache NiFi https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html sudo pip3 install mxnet-model-server --upgrade
  • 37. 37 @PaaSDev Apache MXNet Model Server with Apache NiFi mxnet-model-server --models squeezenet=https://s3.amazonaws.com/model- server/models/squeezenet_v1.1/squeezenet_v1.1.model --service mms/model_service/mxnet_vision_service.py --port 9999 mxnet-model-server --models SSD=resnet50_ssd_model.model --service ssd_service.py --port 9998 https://community.hortonworks.com/articles/177232/apache-deep-learning-101-processing-apache-mxnet-m.html
  • 38. 38 @PaaSDev Apache OpenNLP for Entity Resolution Processor https://github.com/tspannhw/nifi-nlp- processor Requires installation of NAR and Apache OpenNLP Models (http://opennlp.sourceforge.net/models-1.5/). This is a non-supported processor that I wrote and put into the community. You can write one too! Apache OpenNLP with Apache NiFi https://community.hortonworks.com/articles/80418/open-nlp-example-apache-nifi-processor.html https://opennlp.apache.org/news/release-190.html
  • 39. 39 @PaaSDev Apache Tika with Apache NiFi https://community.hortonworks.com/articles/163776/parsing-any-document-with-apache-nifi-15-with-apac.html https://community.hortonworks.com/articles/81694/extracttext-nifi-custom-processor-powered-by-apach.html https://community.hortonworks.com/articles/76924/data-processing-pipeline-parsing-pdfs-and-identify.html https://github.com/tspannhw/nifi-extracttext-processor https://community.hortonworks.com/content/kbentry/177370/extracting-html-from-pdf-excel-and-word- documents.html
  • 40. 40 @PaaSDev Another Reason Apache and Apache Tika are Awesome! https://community.hortonworks.com/articles/163776/parsing- any-document-with-apache-nifi-15-with-apac.html https://github.com/tspannhw/nifi-extracttext-processor