© Cloudera, Inc. All rights reserved.
Timothy Spann
Senior Solutions Engineer
tspann@cloudera.com
@PaasDev
Introduction
Tim Spann has been running meetups in Princeton on Big Data technologies since 2015.
Tim has spoken at several international conferences on Apache NiFi, Deep Learning and
Streaming.
https://community.hortonworks.com/users/9304/tspann.html
https://dzone.com/users/297029/bunkertor.html
https://www.meetup.com/futureofdata-princeton/
https://twitter.com/PaaSDev
https://dzone.com/articles/integrating-keras-tensorflow-yolov3-into-apache-ni
https://conferences.oreilly.com/strata/strata-ny/public/schedule/speaker/185963
https://www.cloudera.com/about/events.html
DATAFLOW USE CASES
• Data Movement: Optimize resource utilization by moving data between data centers or between on-premises infrastructure and cloud infrastructure.
• Optimize Log Collection & Analysis: Optimize log analytics solutions such as Splunk by using DataFlow as a single platform to collect and deliver multiple data sources and using HDP for lower cost storage options.
• Gain key insights with Streaming Analytics: Accelerate big data ROI by analyzing streaming data for patterns, comparing with ML models and delivering actionable intelligence.
• Single view / 360° view of customer: Ingest, transform and combine customer data from multiple sources into a single data view / lake.
• Stream Processing: Combine multiple streams of data in real time, enrich the data and route it to different endpoints based on rules.
• Capture IoT Data: Ingest sensor data from IoT devices and stream it for further processing and comprehensive analysis.
MODERN DATA ARCHITECTURE
[Diagram. Zones: DATA CENTER, CLOUD, IoT. Labels: Machine Learning/Artificial Intelligence; Telemetry – Connected Devices; Time Series Databases; Stream Analytics; Deep Historical Analysis; Exception Monitoring; Legacy/Operational Data; Sensors, Control Systems; Cyber Security; Edge Analytics; Social; Mobile; Geo Location]
Blockchain
https://hortonworks.com/article/the-advantages-of-blockchain-technology/
Blockchain ensures data objectivity—a single source of truth. Blockchain also
represents a security layer that ensures that data is encrypted in such a way that only
the people you want to can read your data. It makes it next to impossible for people to
corrupt or manipulate the data—or even gain wrongful access to it—because the
system raises an instant red flag when a problem occurs, and it uses a new, advanced
encryption method to secure the data.
https://hortonworks.com/blog/blockchain-driven-data-marketplaces-reference-architecture/
https://en.wikipedia.org/wiki/Blockchain
http://vision.cloudera.com/data-driven-innovation-in-healthcare-where-weve-been-and-where-were-headed/
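The claim on the Blockchain slide that tampering raises a red flag can be illustrated with a small hash chain. This is a minimal sketch, not a real blockchain: the function names (`block_hash`, `build_chain`, `first_invalid_block`) and the sample records are hypothetical, and it shows tamper evidence through hashing only, without encryption, consensus or networking.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Return the SHA-256 hex digest of a block's contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to the hash of the block before it."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block (assumption)
    for index, data in enumerate(records):
        digest = block_hash(index, data, prev_hash)
        chain.append({"index": index, "data": data,
                      "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["sensor reading A", "sensor reading B", "sensor reading C"])
print(first_invalid_block(chain))   # None: the untouched chain verifies

chain[1]["data"] = "tampered reading"
print(first_invalid_block(chain))   # 1: the edited block no longer matches its hash
```

Because every block stores the hash of its predecessor, changing one record invalidates that block's stored hash, and recomputing it would break the link held by the next block.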
 “Ethereum is a decentralized platform that runs smart contracts: applications
that run exactly as programmed without any possibility of downtime,
censorship, fraud or third party interference.”
https://community.hortonworks.com/articles/191255/ethereum-accessing-feeds-from-etherscan-on-volume.html
https://github.com/ethereum/wiki/wiki/White-Paper
https://community.hortonworks.com/content/kbentry/191146/accessing-feeds-from-etherdelta-on-trades-funds-bu.html
Blockchain Is Everywhere
Distributed DataStore: https://bitdb.network/
A peer-to-peer hypermedia protocol: https://ipfs.io/
https://github.com/ipfs/java-ipfs-api
Blockchain Data Access
• https://community.hortonworks.com/articles/199570/ingest-btccom-and-blockchaincom-data-via-apache-ni.html
Ethereum Data Access
• https://community.hortonworks.com/articles/199566/ingesting-infura-rest-apis-to-access-the-ethereum.html
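As a rough illustration of what reading Ethereum data over HTTP involves, the sketch below builds an Ethereum JSON-RPC request body and decodes a reply. It makes no network call. The helper names (`build_request`, `parse_quantity`) and the sample reply are hypothetical, and the endpoint URL and any credentials are left to the provider you use. `eth_blockNumber` is a standard Ethereum JSON-RPC method that returns a hex-encoded quantity.

```python
import json


def build_request(method, params=None, request_id=1):
    """Serialize a JSON-RPC 2.0 request body for an Ethereum node."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    })


def parse_quantity(response_text):
    """Decode a hex-encoded quantity (e.g. a block number) from a reply."""
    reply = json.loads(response_text)
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return int(reply["result"], 16)


# Body that would be POSTed to the provider's HTTPS endpoint.
body = build_request("eth_blockNumber")
print(body)

# Hypothetical reply, made up for illustration.
sample_reply = '{"jsonrpc": "2.0", "id": 1, "result": "0x4b7"}'
print(parse_quantity(sample_reply))  # 1207
```

In a NiFi flow, a body like this could presumably be sent with an HTTP-capable processor such as InvokeHTTP, with the JSON reply parsed downstream; the linked article describes the author's actual flow.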
Ethereum Data Access
• https://community.hortonworks.com/articles/191255/ethereum-accessing-feeds-from-etherscan-on-volume.html
• https://community.hortonworks.com/articles/191146/accessing-feeds-from-etherdelta-on-trades-funds-bu.html

Blockchain and Apache NiFi
