
Open Source Predictive Analytics Pipeline with Apache NiFi and MiniFi Princeton



We walk through a design of an open source Predictive Analytics Pipeline with MiniFi on a Raspberry Pi running Python and ingesting SenseHat sensor readings, a webcam image and running Inception classification on that image. MiniFi sends the resulting JSON and Image along with data provenance to an Apache NiFi server for preprocessing. It is then routed, converted, queried and stored as an Apache Hive table in Apache ORC format.
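As a rough illustration of the JSON step described above, the following Python sketch parses classifier output in the `label (score = N)` form shown on slide 19 of the transcript and merges it with sensor readings into one flat record. The function names, the field names (`temperature_c`, `humidity_pct`, `label_1`, and so on), and the sample readings are assumptions made for this example; the script and schema actually used in the talk are not shown here.

```python
import json
import re

# Matches lines such as "window screen (score = 0.00196)".
SCORE_LINE = re.compile(r"^(?P<label>.+?) \(score = (?P<score>[0-9.]+)\)$")


def parse_scores(output):
    """Turn classifier stdout into a list of (label, score) pairs."""
    results = []
    for line in output.splitlines():
        match = SCORE_LINE.match(line.strip())
        if match:
            results.append((match.group("label"), float(match.group("score"))))
    return results


def build_record(classifier_output, readings, image_path):
    """Merge predictions and sensor readings into one flat dict."""
    record = dict(readings)  # hypothetical sensor fields, copied as-is
    record["image"] = image_path
    for rank, (label, score) in enumerate(parse_scores(classifier_output), start=1):
        record[f"label_{rank}"] = label
        record[f"score_{rank}"] = score
    return record


# Sample lines copied from the classifier output on slide 19.
sample_output = """solar dish, solar collector, solar furnace (score = 0.98316)
window screen (score = 0.00196)
manhole cover (score = 0.00070)"""

record = build_record(
    sample_output,
    {"temperature_c": 24.5, "humidity_pct": 41.0},  # made-up readings
    "/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg",
)
print(json.dumps(record))
```

A flat record like this is easy to describe with a single schema, which suits the later conversion and storage steps; nested structures would work too, at the cost of a more involved schema.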

Apache NiFi, MiniFi, Apache Spark, HDP 3.0, HDF 3.1.2, Hortonworks Schema Registry, Raspberry Pi with Intel Movidius running TensorFlow, Python, JSON, Apache Avro, Apache Hive, HDFS

We also touched on integration with blockchain and Ethereum, and on accessing the REST API and WebSocket interfaces offered by online blockchain explorers like EtherDelta and Etherscan.

This talk was given on June 28, 2018, in Hamilton, NJ, as part of the Future of Data Princeton and NJ Blockchain/Big Data joint meetup.

Published in: Data & Analytics


  1. Building a Predictive Analytics Pipeline using MiniFi and Apache NiFi for IoT. Timothy Spann, Senior Solutions Engineer, Hortonworks, @PaaSDev. © Hortonworks, Inc. 2011-2018. All rights reserved. | Hortonworks confidential and proprietary information.
  2. Disclaimer • This document may contain product features and technology directions that are under development, may be under development in the future or may ultimately not be developed. • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery. • This document’s description of these features and technology directions does not represent a contractual commitment, promise or obligation from Hortonworks to deliver these features in any generally available product. • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind. • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  3. HDP 3.0 Hybrid Architecture
  4. HDP 3.0: Our At-Rest Platform for Global Data Management • Storage Platform: HDFS in Apache Hadoop 3.1 • Compute & GPU Platform: YARN in Apache Hadoop 3.1 • HBase 2.0, Hive 3.0, Spark 2.3, Phoenix 0.8 • Security & Governance: Atlas 1.0, Ranger 1.0, Knox 1.0 • Operations: Ambari 2.7
  5. HDF Data-In-Motion Platform – with HDF 3.1.2
  6. Hortonworks Data Flow 3.1.2: Ongoing Innovation in Open Source [Release timeline figure: versions of NiFi, MiNiFi C++ and Java, NiFi Registry, Ranger, Ambari, Kafka, Zookeeper, Storm, SAM and Schema Registry across HDF 1.0 (Dec 2014), HDF 1.2 (Oct 2015), HDF 2.0 (Mar 2016), HDF 2.1 (Aug 2016), HDF 3.0 (Jul 2017) and HDF 3.1.2 (June 2018), grouped under Security, Streaming & Integration, and Operations]
  7. Data Science and In-Memory Performance Improvements • Securely & seamlessly integrate with other services including Ranger & Atlas • TensorFlow Tech Preview will complement GPU pooling to support deep learning use cases • Spark testing with S3Guard to support cloud • Spark/Hive integration to connect easily to the cloud
  8. Enable TensorFlow: TensorFlow training metrics & TensorFlow on YARN
  9. Open Source Predictive Analytics Pipeline [Diagram labels: Ingestion, Simple Event Processing Engine, Stream Processing, Destination, Data Bus, Build Predictive Model From Historical Data, Deploy Predictive Model For Real-time Insights, Perishable Insights, Historical Insights, Blockchain]
  11. Open Source Components [Diagram labels: Streaming Analytics Manager, Image Ingest, Routing and Pre-Processing, Orchestration, Queueing, Simple Event Processing, Part of MiniFi C++ Agent, Deep Learning Framework, Spark ML, Machine Learning, Streaming]
  12. IIoT: Multiple devices, protocols, frameworks, languages, data types, sensors and networks. Protocols • MQTT • HTTPS / SSL (REST/JSON) • OPC UA • CoAP • AMQP. Data Types • JSON • XML • CSV • Raw Text • Images (JPEG, PNG) • Raw Data Streams. Sensors • Cameras • Temperature/Humidity • IR • Proximity • Motion Sensors • GPS. Devices • NVidia Jetson TX1 • Raspberry Pi • Arduino • TS-7800 V2 • ESP8266 • DragonBoard 410c • BeagleBone Black
  13. Blockchain A blockchain is a continuously growing list of blocks, which are linked and secured using cryptography. Each block typically contains a cryptographic hash of the previous block, a timestamp, and transaction data.[7] By design, a blockchain is resistant to modification of the data. It is "an open, distributed ledger that can record transactions between two parties efficiently and in a verifiable and permanent way".[8] For use as a distributed ledger, a blockchain is typically managed by a peer-to-peer network collectively adhering to a protocol for inter-node communication and validating new blocks. Once recorded, the data in any given block cannot be altered retroactively without alteration of all subsequent blocks, which requires consensus of the network majority. Blockchains are secure by design and exemplify a distributed computing system with high Byzantine fault tolerance. Decentralized consensus has therefore been achieved with a blockchain.[9] This makes blockchains potentially suitable for the recording of events, medical records, and other records management activities, such as identity management, transaction processing, documenting provenance, food traceability, and voting. Blockchain was invented by Satoshi Nakamoto in 2008 to serve as the public transaction ledger of the cryptocurrency bitcoin. The invention of the blockchain for bitcoin made it the first digital currency to solve the double-spending problem without the need of a trusted authority or central server. The bitcoin design has inspired other applications.
  14. Blockchain Blockchain ensures data objectivity: a single source of truth. Blockchain also represents a security layer that ensures that data is encrypted in such a way that only the people you want to can read your data. It makes it next to impossible for people to corrupt or manipulate the data, or even gain wrongful access to it, because the system raises an instant red flag when a problem occurs, and it uses a new, advanced encryption method to secure the data.
  15. “Ethereum is a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference.”
  16. Smart Contracts in Ethereum • Allow parties to enter into agreements with no preexisting trust • Guarantee that the transactions will run as specified in the contract • The status of the contract and transactions can be verified at any time • Openness • No Middle Man • Machine to Machine • IIoT
  17. TensorFlow
  18. Apache NiFi Integration with TensorFlow Options • TensorFlow (C++, Python, Java) via ExecuteStreamCommand • TensorFlow NiFi Java Custom Processor • TensorFlow Running on Edge Nodes (MiniFi)
  19. TensorFlow via Python or C++ Binary (Java Library Is New!)
      python --image_file /opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      solar dish, solar collector, solar furnace (score = 0.98316)
      window screen (score = 0.00196)
      manhole cover (score = 0.00070)
      radiator (score = 0.00041)
      doormat, welcome mat (score = 0.00041)
      bazel-bin/tensorflow/examples/label_image/label_image --image=/opt/demo/dronedata/Bebop2_20160920083655-0400.jpg
      tensorflow/examples/label_image/] solar dish (577): 0.983162
      I tensorflow/examples/label_image/] window screen (912): 0.00196204
      I tensorflow/examples/label_image/] manhole cover (763): 0.000704005
      I tensorflow/examples/label_image/] radiator (571): 0.000408321
      I tensorflow/examples/label_image/] doormat (972): 0.000406186
  20. TensorFlow Python ExecuteStreamCommand NiFi
  21. Run TensorFlow on YARN 3.1
  22. Why TensorFlow? Also Apache MXNet, PyTorch and DL4J. • Google • Multiple platform support • Hadoop integration • Spark integration • Keras • Large Community • Python and Java APIs • GPU Support • Mobile Support • Inception v3 • Clustering • Fully functional demos • Open Source • Apache Licensed • Large Model Library • Buzz • Extensive Documentation • Raspberry Pi Support
  23. TensorFlow Java Processor in NiFi
  24. TensorFlow Running on Edge Nodes (MiniFi)
  25. Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over 200 sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
  26. Apache MiNiFi • NiFi lives in the data center. Give it an enterprise server or a cluster of them. • MiNiFi lives as close as possible to where data is born and is a guest on that device or system. “Let me get the key parts of NiFi close to where data begins and provide bidirectional data transfer”
  27. Edge Intelligence with Apache MiNiFi. Key Features • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Recovery / recording a rolling log of fine-grained history • Designed for extension. Different from Apache NiFi • Design and Deploy • Warm re-deploys
  28. Custom Apache NiFi Processors for Open Source Computer Vision
  29. TensorFlow with MiniFi
  30. Image Analytics
  31. Thank you
  32. Contact apache-ni feeds-from-etherscan-on-volume.html
  33. Hortonworks Community Connection Read access for everyone, join to participate and be recognized • Full Q&A Platform (like StackOverflow) • Knowledge Base Articles • Code Samples and Repositories
  34. Community Engagement Participate now at: • 4,000+ Registered Users • 10,000+ Answers • 15,000+ Technical Assets • One Website!
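
Slide 13 notes that each block carries a cryptographic hash of the previous block, so altering one block breaks the links that follow it. A minimal Python sketch of that idea is given here. The block layout and the helper names (`block_hash`, `append_block`, `first_broken_link`) are hypothetical, chosen only for illustration; this is not how Bitcoin or Ethereum serialize blocks, and it leaves out consensus entirely.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 over the block's JSON form with sorted keys."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, timestamp, data):
    """Append a block that records the hash of the current last block."""
    previous = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"previous_hash": previous, "timestamp": timestamp, "data": data})


def first_broken_link(chain):
    """Return the index of the first block whose stored previous_hash
    no longer matches its predecessor, or None if every link holds."""
    for i in range(1, len(chain)):
        if chain[i]["previous_hash"] != block_hash(chain[i - 1]):
            return i
    return None


chain = []
append_block(chain, 1, "sensor reading A")
append_block(chain, 2, "sensor reading B")
append_block(chain, 3, "sensor reading C")
print(first_broken_link(chain))  # None: all links are intact

chain[0]["data"] = "tampered"    # modify an early block in place
print(first_broken_link(chain))  # 1: block 1 no longer matches block 0
```

To hide the change, every later block would have to be rewritten with new hashes, which is the property the slide describes as requiring the consensus of the network majority.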