Jun 29, 2018
We walk through the design of an open source predictive analytics pipeline with MiniFi on a Raspberry Pi running Python, which ingests SenseHat sensor readings, captures a webcam image, and runs Inception classification on that image. MiniFi sends the resulting JSON and image, along with data provenance, to an Apache NiFi server for preprocessing. The data is then routed, converted, queried, and stored as an Apache Hive table in Apache ORC format.
Apache NiFi, MiniFi, Apache Spark, HDP 3.0, HDF 3.1.2, Hortonworks Schema Registry, Raspberry Pi with Intel Movidius running TensorFlow, Python, JSON, Apache Avro, Apache Hive, HDFS
We also touched on integration with blockchain and Ethereum, and on accessing the REST API and WebSocket interfaces offered by online blockchain explorers such as Etherdelta and Etherscan.
This talk was given on June 28, 2018 in Hamilton, NJ as part of the Future of Data Princeton and NJ Blockchain/Big Data joint meetup.
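As a rough illustration of the edge step in the abstract, the sketch below assembles one JSON record from sensor readings and a classification result, the kind of payload that could be forwarded to NiFi. The `build_record` helper, the field names, the host name, and the sample values are all assumptions made for illustration and are not taken from the talk; on a real device the readings would come from the SenseHat rather than a hard-coded dictionary.

```python
import json
import time
import uuid


def build_record(readings, top_label, top_score, host="rpi-demo"):
    """Combine sensor readings and a classifier result into one JSON string.

    All field names here are hypothetical; adapt them to your own schema.
    """
    record = {
        "uuid": str(uuid.uuid4()),         # unique id for this record
        "host": host,                      # hypothetical device name
        "ts": int(time.time() * 1000),     # epoch milliseconds
        "temperature": round(readings["temperature"], 2),
        "humidity": round(readings["humidity"], 2),
        "pressure": round(readings["pressure"], 2),
        "top_label": top_label,
        "top_score": top_score,
    }
    return json.dumps(record, sort_keys=True)


# Stand-in values; on a Raspberry Pi these would be read from the sensors.
sample_readings = {"temperature": 31.4567, "humidity": 42.01, "pressure": 1012.3371}
sample_json = build_record(sample_readings, "solar dish", 0.98316)
```

Keeping the record flat, with one key per reading, is a design choice that should make it simpler to map onto a schema and a table later in the flow.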
1 ©HortonworksInc. 2011–2018. All rightsreserved.
© Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
Building a Predictive Analytics Pipeline using MiniFi and Apache NiFi for IoT
Timothy Spann, Senior Solutions Engineer
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be
• Technical feasibility, market demand, user feedback, and the Apache Software
Foundation community development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not
represent a contractual commitment, promise or obligation from Hortonworks to deliver
these features in any generally available product.
• Product features and technology directions are subject to change, and must not be
included in contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans,
customers should not rely upon it when making a purchase decision.
HDP 3.0 Hybrid Architecture
Storage Platform: HDFS in Apache Hadoop 3.1
Compute & GPU Platform: YARN in Apache
Security & Governance: Atlas 1.0, Ranger 1.0, Knox 1.0
Hive 3.0, Spark 2.3, Phoenix 0.8
Operations: Ambari 2.7
HDP 3.0 Our At-Rest Platform for Global Data Management
HDF Data-In-Motion Platform – with HDF 3.1.2
HORTONWORKS DATA FLOW
Ongoing Innovation in Apache
Ongoing Innovation in Open Source
[Figure: component versions as of June 2018, grouped under Security, Streaming & Integration, and Operations]
Hortonworks Data Flow 3.1.2
Data Science and In-Memory
Securely & seamlessly integrate with
other services including Ranger & Atlas
TensorFlow Tech Preview will complement GPU
pooling to support deep learning use cases
Spark testing with S3Guard to support cloud
Spark/Hive integration to connect easily
to the cloud
TensorFlow training metrics &
TensorFlow on YARN
Open Source Predictive Analytics Pipeline
Simple Event Processing
From Historical Data
For Real-time Insights
Open Source Components
Routing and Pre-Processing
Simple Event Processing
Part of MiniFi C++ Agent
Deep Learning Framework
Spark ML Machine Learning
Multiple devices, protocols, frameworks, languages, data types, sensors and networks
Protocols
• HTTPS / SSL (REST/JSON)
• OPC UA
Data Types
• Raw Text
• Images (JPEG, PNG)
• Raw Data Streams
Sensors
• Motion Sensors
Devices
• NVIDIA Jetson TX1
• Raspberry Pi
• TS-7800 V2
• DragonBoard 410c
• BeagleBone Black
A blockchain is a continuously growing list of blocks, which are linked and secured using cryptography. Each block
typically contains a cryptographic hash of the previous block, a timestamp, and transaction data. By design, a
blockchain is resistant to modification of the data. It is "an open, distributed ledger that can record transactions
between two parties efficiently and in a verifiable and permanent way". For use as a distributed ledger, a
blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for inter-node
communication and validating new blocks. Once recorded, the data in any given block cannot be altered
retroactively without alteration of all subsequent blocks, which requires consensus of the network majority.
Blockchains are secure by design and exemplify a distributed computing system with high Byzantine fault
tolerance. Decentralized consensus has therefore been achieved with a blockchain. This makes blockchains
potentially suitable for the recording of events, medical records, and other records management activities, such
as identity management, transaction processing, documenting provenance, food traceability, and voting.
Blockchain was invented by Satoshi Nakamoto in 2008 to serve as the public transaction ledger of
the cryptocurrency bitcoin. The invention of the blockchain for bitcoin made it the first digital currency to solve
the double-spending problem without the need of a trusted authority or central server. The bitcoin design has
inspired other applications.
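The linking described above, where each block stores a hash of the previous block so that changing an old block breaks every later link, can be sketched in a few lines of Python. This is a toy model only: the block fields and the function names `block_hash`, `append_block`, and `is_valid` are assumptions made for illustration, and it omits networking, consensus, and mining entirely.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's contents, including the previous block's hash."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, data, timestamp):
    """Append a block that stores the hash of the block before it."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev_hash, "timestamp": timestamp, "data": data})


def is_valid(chain):
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, "alice pays bob 5", 1)
append_block(chain, "bob pays carol 2", 2)
append_block(chain, "carol pays dave 1", 3)

valid_before = is_valid(chain)            # links are intact
chain[0]["data"] = "alice pays bob 500"   # tamper with an early block
valid_after = is_valid(chain)             # the link from block 1 no longer matches
```

To hide the tampering, an attacker would have to recompute the stored hash in every later block, which is the step a real network makes expensive by requiring agreement from the majority.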
Blockchain ensures data objectivity—a single source of truth. Blockchain also represents a security layer
that ensures that data is encrypted in such a way that only the people you want to can read your data. It
makes it next to impossible for people to corrupt or manipulate the data—or even gain wrongful access to
it—because the system raises an instant red flag when a problem occurs, and it uses a new, advanced
encryption method to secure the data.
“Ethereum is a decentralized platform that runs smart contracts: applications that run
exactly as programmed without any possibility of downtime, censorship, fraud or third-party
interference.”
Smart Contracts in Ethereum
Allow parties to enter into agreements with no preexisting trust
Guarantee that the transactions will run as specified in the contract
The status of the contract and transactions can be verified at any time
No Middle Man
Machine to Machine IIoT
Apache NiFi Integration with TensorFlow Options
• TensorFlow (C++, Python, Java)
• TensorFlow NiFi Java Custom Processor
• TensorFlow Running on Edge Nodes (MiniFi)
python classify_image.py --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
radiator (score = 0.00041)
doormat, welcome mat (score = 0.00041)
I tensorflow/examples/label_image/main.cc:204] solar dish (577): 0.983162
I tensorflow/examples/label_image/main.cc:204] window screen (912): 0.00196204
I tensorflow/examples/label_image/main.cc:204] manhole cover (763): 0.000704005
I tensorflow/examples/label_image/main.cc:204] radiator (571): 0.000408321
I tensorflow/examples/label_image/main.cc:204] doormat (972): 0.000406186
TensorFlow via Python or C++ Binary (Java Library Is New!)
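To pass results like the Python output above downstream as JSON, the text lines first need to be parsed. The sketch below shows one possible way to do that; the `parse_scores` function and the regular expression are assumptions written against the sample output on this slide, not code from the talk.

```python
import json
import re

# Matches lines such as "window screen (score = 0.00196)".
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(output):
    """Turn classify_image.py-style text output into a list of dicts."""
    results = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            results.append(
                {"label": match.group("label"), "score": float(match.group("score"))}
            )
    return results


# Sample lines copied from the slide output.
sample_output = """\
solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)
"""

predictions = parse_scores(sample_output)
top_json = json.dumps(predictions[0])   # JSON for the highest-scoring label
```

Lines that do not match the pattern, such as log messages, are skipped rather than raising an error, which seems a reasonable default when the classifier prints extra diagnostics.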
TensorFlow Python ExecuteStreamCommand NiFi
Run TensorFlow on YARN 3.1
Why TensorFlow? Also Apache MXNet, PyTorch and DL4J.
• Multiple platforms
• Hadoop integration
• Spark integration
• Large Community
• Python and Java APIs
• GPU Support
• Mobile Support
• Inception v3
• Fully functional demos
• Open Source
• Apache Licensed
• Large Model Library
• Extensive Documentation
• Raspberry Pi Support
TensorFlow Java Processor in NiFi
TensorFlow Running on Edge Nodes (MiniFi)
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
• Hundreds of processors
• Visual command and
• Over 200 sources
• Flow templates
• Designed for extension
• Version Control
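Two of the ideas in this list, data buffering with pressure release and prioritized queuing, can be illustrated with a small stand-alone sketch. This is a conceptual toy and not how NiFi is implemented; in NiFi these behaviors are configured on connections rather than coded. The `BoundedPriorityBuffer` class and its method names are hypothetical.

```python
import heapq
import itertools


class BoundedPriorityBuffer:
    """Toy buffer: highest priority leaves first; refuses input when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._heap = []
        self._order = itertools.count()   # tie-breaker keeps FIFO order

    def offer(self, item, priority):
        """Return False instead of accepting when the buffer is full."""
        if len(self._heap) >= self.max_items:
            return False                  # back pressure: upstream must wait
        # heapq is a min-heap, so negate priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), item))
        return True

    def poll(self):
        """Remove and return the highest-priority item, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


buf = BoundedPriorityBuffer(max_items=2)
accepted = [
    buf.offer("routine reading", priority=1),
    buf.offer("alert", priority=9),
    buf.offer("another reading", priority=1),   # buffer already full
]
first_out = buf.poll()   # the alert leaves before the routine reading
```

Refusing new items when full pushes the decision back to the producer, which can wait, retry, or drop data depending on how much loss the flow tolerates.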
• NiFi lives in the data center. Give it an
enterprise server or a cluster of them.
• MiNiFi lives as close as possible to where data is born
and is a guest on that device or system
“Let me get the key parts of NiFi close to where data begins and provide
bidirectional data transfer.”
Edge Intelligence with Apache MiNiFi
• Guaranteed delivery
• Data buffering
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery / recording a rolling log
of fine-grained history
• Designed for extension
Different from Apache NiFi
• Design and Deploy
• Warm re-deploys
Custom Apache NiFi Processors for Open Source Computer Vision
TensorFlow with MiniFi
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Participate now at: community.hortonworks.com