1 © Hortonworks Inc. 2011 – 2017. All Rights Reserved
Hortonworks Data Flow
Wrangling the Internet of Things
Pat Alwell – Solutions Engineer
Big Data Day Los Angeles, CA
August 2018
whoami
Pat Alwell
Solutions Engineer Hortonworks
AWS | Spark | Hadoop Admin
Career Started at Algebraix Data
Connect with me:
GitHub: https://github.com/patalwell
Email: palwell@hortonworks.com
Goals for the Session
• Demonstrate how organizations can leverage
Hortonworks Dataflow (HDF) to wrangle the Internet
of Things
Apache NiFi Managed Dataflow
SOURCES
REGIONAL
INFRASTRUCTURE
CORE
INFRASTRUCTURE
Nothing in HDF Makes Sense Except in Light
of Flow Based Programming
Flow
Management
Administration
HDF
Streaming
Flow-based programming is an
abstraction of information packets,
algorithmic transformations, and a
common set of connections. The flow of
data is essentially equivalent to a
production line: raw material is pushed
or pulled into a process and transformed
to meet an end goal.
NiFi Terminology
• Connection = Route between processors
  – Queues that can be dynamically prioritized
• Process Group = Logical group of processors and their connections
  – Receive data via input ports, send data via output ports
• FlowFile = Unit of data moving through the system
  – Content + Attributes (Metadata)
• Processor = Processes data
  – Transforms/Writes FlowFiles
  – Creates Provenance
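The terminology on this slide can be pictured with a toy model. The sketch below is plain Python and not the NiFi API: `FlowFile`, `uppercase_processor`, and the `deque` standing in for a connection are hypothetical names chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content plus attributes (metadata)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def uppercase_processor(flowfile: FlowFile) -> FlowFile:
    """A processor transforms a FlowFile and returns a new one."""
    attributes = dict(flowfile.attributes, processed_by="uppercase_processor")
    return FlowFile(flowfile.content.upper(), attributes)


# A connection is modeled here as a FIFO queue between two processors.
connection = deque()
connection.append(FlowFile(b"sensor reading", {"filename": "reading.txt"}))

# The downstream processor pulls from the connection and emits a result.
result = uppercase_processor(connection.popleft())
```

The content changes while the original attributes travel with it, which is the separation of payload and metadata the FlowFile definition describes.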
What is HDF?
Flow
Management
HDF
How Can we Manage Flows with
HDF?
Apache NiFi supports powerful and scalable directed
graphs of data routing, transformation, and system
mediation logic.
Apache MiNiFi is a complementary data collection
approach that supplements the core tenets of NiFi in
dataflow management, focusing on the collection of data
at the source of its creation.
Visual Command and Control
A convenient graphical user interface that supports Flow Based Programming
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
Visual Command and Control
More than 200 processors designed to help you
capture and deliver data to and from common
sources
Examples Include:
-Capturing Logs from Mobile Devices and
Sensors; formatting said logs with Regex,
pushing said logs into HDFS or S3
-Collecting sensor readings from GPIO headers
and delivering the information to an application
via Kafka
-Customer sentiment analysis by joining social
media data to customer information within Hive
or Phoenix
220+ Processors for Deeper Ecosystem Integration
Hash
Extract
Merge
Duplicate
Scan
GeoEnrich
Replace
Convert
Split
Translate
Route Content
Route Context
Route Text
Control Rate
Distribute Load
Generate Table Fetch
Jolt Transform JSON
Prioritized Delivery
Encrypt
Tail
Evaluate
Execute
All Apache project logos are trademarks of the ASF and the respective projects.
Fetch
HTTP
Syslog
Email
HTML
Image
HL7
FTP
UDP
XML
SFTP
AMQP
WebSocket
Provenance and Lineage
Payload Prioritization
• Configure a prioritizer per
connection
• Determine what is important for
your data – time based, arrival
order, importance of a data set
• Funnel many connections down to
a single connection to prioritize
across data sets
• Develop your own prioritizer if
needed
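As a rough illustration of ordering a connection's queue by importance and then by arrival order, here is a minimal Python sketch. It is not NiFi's prioritizer implementation; `PrioritizedConnection` and the numeric `priority` argument are assumptions made for this example.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Toy queue: lower priority number first, then arrival order."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()

    def put(self, item, priority):
        # The arrival counter breaks ties so equal priorities stay first-in, first-out.
        heapq.heappush(self._heap, (priority, next(self._arrival), item))

    def get(self):
        _priority, _arrival, item = heapq.heappop(self._heap)
        return item


queue = PrioritizedConnection()
queue.put("routine telemetry", priority=5)
queue.put("alarm", priority=1)
queue.put("more telemetry", priority=5)

drained = [queue.get() for _ in range(3)]
```

The alarm leaves the queue first even though it arrived second, while the two telemetry items keep their arrival order.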
Back-Pressure
Latency vs. Throughput
• Java
  – < 40MB binary distribution
  – Requires JRE 1.8
  – More feature complete
  – Targeted for any system that can run a JVM
  – Supported Processors: https://github.com/apache/nifi-minifi/blob/6ddf8bb0ee3614320a53ce7f2e0b3950ee4d9c5f/minifi-docs/src/main/markdown/minifi-java-agent-quick-start.md
• C++
  – Dynamic heap of ~1MB based on use-case
  – Targeted for resource constrained environments
  – Supported Processors: https://github.com/apache/nifi-minifi-cpp/blob/master/PROCESSORS.md
MiNiFi’s Key Features
An Embedded Extension that supports Flow
Based Programming on the Edge
Agents Provide:
• Small and lightweight footprint
• Central management of agents
• Generation of data provenance
• Integration with NiFi for follow-on
dataflow management and full chain of
custody of information
How can we Administer and Secure flow
activity?
Administration
HDF
The Apache Ambari project is aimed at making Hadoop
management simpler by developing software for
provisioning, managing, and monitoring Apache Hadoop
clusters. Ambari provides an intuitive, easy-to-use
Hadoop management web UI backed by its RESTful
APIs. - https://ambari.apache.org/
Cluster Administration and Role Based Security
Version Control for Flows
• NiFi Registry - sub-project of Apache NiFi
  – https://github.com/apache/nifi-registry
  – https://issues.apache.org/jira/projects/NIFIREG
• Complementary application, central location for
storage/management of “versioned” resources
• Initial capability to store and retrieve “versioned
flows”
• Integration on NiFi side
  – Start/Stop version control of a process group
  – Change version (upgrade/downgrade)
  – Import new process group from a version
Variable Registry Flow Ubiquity
• Parameterize configuration like connection
strings, file paths, etc.
• Referenced via Expression Language
  – Kafka Brokers = ${kafka.brokers}
• Variables associated with a process group
  – Right-click on canvas to view variables for
current process group
• Hierarchical order of precedence, resolve
closest reference to component
• Editing variables automatically restarts
any components referencing the variables!
[Diagram: process groups Level 1 and Level 2, each with Vars]
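The hierarchical order of precedence on this slide can be sketched as a lookup that walks from a process group toward the root and stops at the closest definition. This is an illustrative Python model, not NiFi code; the `ProcessGroup` class, the broker addresses, and the `out.dir` variable are all hypothetical.

```python
class ProcessGroup:
    """Toy process group holding variables and a link to its parent group."""

    def __init__(self, name, variables=None, parent=None):
        self.name = name
        self.variables = variables or {}
        self.parent = parent

    def resolve(self, key):
        # Walk from the current group up toward the root; the closest definition wins.
        group = self
        while group is not None:
            if key in group.variables:
                return group.variables[key]
            group = group.parent
        raise KeyError(key)


level_1 = ProcessGroup("Level 1", {"kafka.brokers": "broker-a:9092", "out.dir": "/data"})
level_2 = ProcessGroup("Level 2", {"kafka.brokers": "broker-b:9092"}, parent=level_1)

brokers = level_2.resolve("kafka.brokers")  # defined on Level 2, so Level 2 wins
out_dir = level_2.resolve("out.dir")        # not on Level 2, so falls back to Level 1
```

A component inside Level 2 that references `${kafka.brokers}` would see the Level 2 value under this model, while a component placed directly in Level 1 would see the Level 1 value.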
Schema Registry
⬢ Data Governance
– Centralized registry to provide reusable
schema
– Version management to define
relationship between schemas
– Validation to enable generic format
conversion and generic routing
⬢ Operational Efficiency
– Centralized registry to avoid attaching
schema to every piece of data
– Version management so that
consumers and producers can evolve at
different rates
– Validation to ensure data quality
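To make the three ideas on this slide concrete (central storage, versioning, validation), here is a minimal in-memory sketch in Python. It does not use the Hortonworks Schema Registry API; the `SchemaRegistry` class, the `weather` schema, and its fields are assumptions invented for the example.

```python
class SchemaRegistry:
    """Toy in-memory registry: each schema name maps to a list of versions."""

    def __init__(self):
        self._schemas = {}

    def register(self, name, fields):
        # Append a new version and return its 1-based version number.
        versions = self._schemas.setdefault(name, [])
        versions.append(dict(fields))
        return len(versions)

    def get(self, name, version):
        return self._schemas[name][version - 1]

    def validate(self, name, version, record):
        # A record is valid if it has exactly the schema's fields with matching types.
        schema = self.get(name, version)
        return record.keys() == schema.keys() and all(
            isinstance(record[key], expected) for key, expected in schema.items()
        )


registry = SchemaRegistry()
v1 = registry.register("weather", {"station": str, "temp_c": float})
v2 = registry.register("weather", {"station": str, "temp_c": float, "humidity": float})

# A record only needs a (name, version) reference, not an attached schema.
old_record = {"station": "LA-01", "temp_c": 21.5}
ok_v1 = registry.validate("weather", v1, old_record)
ok_v2 = registry.validate("weather", v2, old_record)
```

Because both versions stay available, a producer can move to version 2 while a consumer keeps reading version 1 records.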
How Can we take advantage of Streaming
computations?
HDF
Streaming
What is Kafka?
A distributed streaming platform that supports
pub-sub systems
• Publish and subscribe to streams of records,
similar to a messaging queue
• Store streams of records in a fault-tolerant way
• Process streams of records as they occur
• A topic is a partitioned log of events
Generally used to…
• Build real-time streaming data pipelines to
reliably transfer data between systems
• Build real-time streaming applications that
transform or react to the streams of data
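The idea of a topic as a partitioned log can be illustrated with a small in-memory model. This Python sketch is not a Kafka client and does not reproduce Kafka's partitioner; the `Topic` class and the CRC32-based partition choice are stand-ins assumed for the example.

```python
import zlib


class Topic:
    """Toy topic: a fixed number of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        offset = len(self.partitions[index]) - 1
        return index, offset

    def read(self, partition, offset):
        # Consumers track their own offset; reading does not remove records.
        return self.partitions[partition][offset:]


topic = Topic(num_partitions=3)
p1, o1 = topic.publish("sensor-1", "21.5")
p2, o2 = topic.publish("sensor-1", "21.7")
records = topic.read(p1, 0)
```

Since reads leave the log untouched, several subscribers can consume the same partition independently, each from its own offset.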
Streams Messaging Manager
(*NEW)
“Kafka Blindness” – Customers who use Kafka today struggle with
monitoring and managing Kafka clusters.
What is Storm?
A distributed, fault-tolerant service that
processes streams of data
• Spout: Capture data from external systems (Kafka, HBase, Hive, HDFS, and AWS
Kinesis)
• Bolt: Transform and aggregate said data using filter, map, flatMap, aggregate,
reduce, count, etc.
• Bolt: Write data back to an external system for storage or visualization (HBase,
Hive, Druid)
• The chain is known as a topology. The topology runs under a master-slave
type architecture.
Generally used for… *
• Processing streams
  – No need for intermediate queues.
• Continuous computation
  – Send data to clients continuously so they can update and show results in
real time, such as site metrics.
• Distributed remote procedure call
  – Easily parallelize CPU-intensive operations.
* Leibiusky, Jonathan. Getting Started with Storm:
Continuous Streaming Computation with Twitter's
Cluster Technology (p. 1). O'Reilly Media. Kindle
Edition.
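The spout-to-bolt chain described on this slide can be mimicked in a single process with Python generators. This is only a conceptual sketch and not the Storm API: `sentence_spout`, `split_bolt`, and `count_bolt` are hypothetical names, and the hard-coded sentences stand in for an external source such as Kafka.

```python
from collections import Counter


def sentence_spout():
    """Spout: emits a stream of tuples from a (hard-coded) source."""
    yield from ["the cow jumped", "the moon"]


def split_bolt(stream):
    """Bolt: transforms each sentence into individual words (flatMap)."""
    for sentence in stream:
        yield from sentence.split()


def count_bolt(stream):
    """Bolt: aggregates a count per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts


# Wiring spout -> bolt -> bolt forms the topology.
word_counts = count_bolt(split_bolt(sentence_spout()))
```

In a real topology each stage would run as parallel tasks across a cluster; the generator chain only shows how tuples flow from one stage to the next.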
What does our flow look
like?
Join Us: Hortonworks Community Connection
⬢ Full Q&A Platform (like StackOverflow)
⬢ Knowledge Base Articles
⬢ Code Gallery and Samples
⬢ https://community.hortonworks.com
Care to Learn More?
⬢ Download our HDF Sandbox for Docker, VMWare, or
VirtualBox:
https://hortonworks.com/downloads/#sandbox
⬢ Follow our HDF tutorials:
https://hortonworks.com/tutorial/analyze-iot-weather-station-data-via-connected-data-architecture/
⬢ Reach out to an Enterprise Account Manager
Questions?
Apache NiFi / ETL Tools
NiFi
NOT schema dependent
• Dataflow management for both structured
and unstructured data, powered by
separation of metadata and payload
• Schema is not required, but you can have
schema
• Minimum modeling effort, just enough to
manage dataflows
• Do the plumbing job, maximize developers’
brainpower for creative work
⚠ Not designed to do heavy lifting transformation
work for DB tables (JOIN datasets, etc.). You
can create custom processors to do that, but they have a
long way to go to catch up with existing ETL
tools from a user experience perspective (GUI for
data wrangling, cleansing, etc.)
ETL (Informatica, etc.)
Schema dependent
• Tailored for Databases/WH
• ETL operations based on schema/data
modeling
• Highly efficient, optimized performance
⚠ Must pre-prepare your data; building data models and
maintaining schemas is time consuming
⚠ Not geared towards handling unstructured data,
PDF, Audio, Video, etc.
⚠ Not designed to solve dataflow problems
Apache NiFi / Integration, or ingestion, Frameworks
NiFi
End user facing dataflow management
tool
• Out of the box solution for dataflow
management
• Interactive command and control in the core,
design and deploy on the edge
• Flexible failure handling at each point of the
flow
• Visual representation of global dataflow and
connectivities
• Native cross data center communication
• Data provenance for traceability
⚠ Not a library to be embedded in other
applications
Integration framework (Spring
Integration, Camel, etc), ingestion
framework (Flume, etc)
Developer facing integration tool with a
focus on data ingestion
• A set of tools to orchestrate workflow
• A fixed design and deploy pattern
• Leverage messaging bus across
disconnected networks
⚠ Developer facing, custom coding needed to
optimize
⚠ Pre-built failure handling, lack of flexibility
⚠ No holistic view of global dataflow
⚠ No built-in data traceability
Apache NiFi / Messaging Bus Services
NiFi
Provide dataflow solution
• Centralized management, from edge to
core
• Great traceability, event level data
provenance starting when data is born
• Interactive command and control – real
time operational visibility
• Dataflow management, including
prioritization, back pressure, and edge
intelligence
• Visual representation of global dataflow
⚠ Not a messaging bus, flow maintenance
needed when you have frequent consumer
side updates
Messaging Bus (Kafka, JMS, etc.)
Provide messaging bus service
• Low latency
• Great data durability
• Decentralized management (producers &
consumers)
• Low broker maintenance for dynamic
consumer side updates
⚠ Not designed to solve dataflow problems
(prioritization, edge intelligence, etc.)
⚠ Traceability limited to in/out of topics, no lineage
⚠ Lack of global view of
components/connectivities
Apache NiFi / Processing Frameworks
NiFi
Simple event processing
• Primarily feed data into processing
frameworks, can process data, with a
focus on simple event processing
• Operate on a single piece of data, or in
correlation with an enrichment dataset
(enrichment, parsing, splitting, and
transformations)
• Can scale out, but scales up better to
take full advantage of hardware
resources, run concurrent processing
tasks/threads (processing terabytes of
data per day on a single node)
⚠ Not another distributed processing
framework; meant to feed data into those
frameworks
Processing Frameworks (Storm, Spark,
etc.)
Complex and distributed processing
• Complex processing from multiple streams
(JOIN operations)
• Analyzing data across time windows (rolling
window aggregation, standard deviation,
etc.)
• Scale out to thousands of nodes if needed
⚠ Not designed to collect data or manage data
flow

More Related Content

What's hot

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseDataWorks Summit
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics OptimizationIsheeta Sanghi
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupSaptak Sen
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiDataWorks Summit
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at ScaleDataWorks Summit
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and ApexBryan Bende
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and FlinkBryan Bende
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataWorks Summit
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiDataWorks Summit
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?DataWorks Summit
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerDataWorks Summit
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale Timothy Spann
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?DataWorks Summit
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopDataWorks Summit
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0DataWorks Summit
 
Joe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiJoe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiMark Kerzner
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupJoseph Witt
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiDataWorks Summit
 

What's hot (20)

Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the EnterpriseUsing Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
Using Spark Streaming and NiFi for the Next Generation of ETL in the Enterprise
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Log Analytics Optimization
Log Analytics OptimizationLog Analytics Optimization
Log Analytics Optimization
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFiIntelligently Collecting Data at the Edge - Intro to Apache MiNiFi
Intelligently Collecting Data at the Edge - Intro to Apache MiNiFi
 
Solving Cybersecurity at Scale
Solving Cybersecurity at ScaleSolving Cybersecurity at Scale
Solving Cybersecurity at Scale
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFiThe First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
The First Mile – Edge and IoT Data Collection with Apache NiFi and MiNiFi
 
Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?Fast SQL on Hadoop, really?
Fast SQL on Hadoop, really?
 
Curing the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging ManagerCuring the Kafka Blindness – Streams Messaging Manager
Curing the Kafka Blindness – Streams Messaging Manager
 
Apache NiFi: Ingesting Enterprise Data At Scale
Apache NiFi:   Ingesting Enterprise Data At Scale Apache NiFi:   Ingesting Enterprise Data At Scale
Apache NiFi: Ingesting Enterprise Data At Scale
 
What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?What's New in Apache Hive 3.0?
What's New in Apache Hive 3.0?
 
Manage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in HadoopManage democratization of the data - Data Replication in Hadoop
Manage democratization of the data - Data Replication in Hadoop
 
Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0Meet HBase 2.0 and Phoenix 5.0
Meet HBase 2.0 and Phoenix 5.0
 
Joe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFiJoe Witt presentation on Apache NiFi
Joe Witt presentation on Apache NiFi
 
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
Apache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming MeetupApache NiFi - Flow Based Programming Meetup
Apache NiFi - Flow Based Programming Meetup
 
IoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFiIoT with Apache MXNet and Apache NiFi and MiniFi
IoT with Apache MXNet and Apache NiFi and MiniFi
 

Similar to Data Con LA 2018 - Streaming and IoT by Pat Alwell

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Data Con LA
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveAldrin Piri
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesTimothy Spann
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseDataWorks Summit
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerDataWorks Summit
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData DayJohn Park
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveBryan Bende
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHaimo Liu
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiAldrin Piri
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Hortonworks
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Ashish Narasimham
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...Hortonworks
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsHortonworks
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopHortonworks
 

Similar to Data Con LA 2018 - Streaming and IoT by Pat Alwell (20)

Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
HDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New FeaturesHDF 3.1 : An Introduction to New Features
HDF 3.1 : An Introduction to New Features
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Curing the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging ManagerCuring the Kafka blindness—Streams Messaging Manager
Curing the Kafka blindness—Streams Messaging Manager
 
SoCal BigData Day
SoCal BigData DaySoCal BigData Day
SoCal BigData Day
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2Hortonworks and Red Hat Webinar - Part 2
Hortonworks and Red Hat Webinar - Part 2
 
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 
Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30Big data processing engines, Atlanta Meetup 4/30
Big data processing engines, Atlanta Meetup 4/30
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
A Comprehensive Approach to Building your Big Data - with Cisco, Hortonworks ...
 
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior GraphsPredicting Customer Experience through Hadoop and Customer Behavior Graphs
Predicting Customer Experience through Hadoop and Customer Behavior Graphs
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache HadoopRescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
Rescue your Big Data from Downtime with HP Operations Bridge and Apache Hadoop
 

Data Con LA 2018 - Streaming and IoT by Pat Alwell

  • 1. Hortonworks DataFlow: Wrangling the Internet of Things. Pat Alwell – Solutions Engineer. Big Data Day Los Angeles, CA, August 2018. © Hortonworks Inc. 2011 – 2017. All Rights Reserved
  • 2. whoami: Pat Alwell, Solutions Engineer, Hortonworks. AWS | Spark | Hadoop Admin. Career started at Algebraix Data. Connect with me: GitHub: https://github.com/patalwell, Email: palwell@hortonworks.com. Goals for the session: demonstrate how organizations can leverage Hortonworks DataFlow (HDF) to wrangle the Internet of Things.
  • 3. Apache NiFi Managed Dataflow (diagram labels: Sources, Regional Infrastructure, Core Infrastructure)
  • 4. Nothing in HDF Makes Sense Except in Light of Flow-Based Programming (HDF components: Flow Management, Administration, Streaming). Flow-based programming is an abstraction built from information packets, algorithmic transformations, and a common set of connections. The flow of data is essentially equivalent to a production line: raw material is pushed or pulled into a process and transformed to meet an end goal.
  • 5. NiFi Terminology
      • Connection = Route between processors; queues that can be dynamically prioritized
      • Process Group = Logical group of processors and their connections; receives data via input ports, sends data via output ports
      • FlowFile = Unit of data moving through the system; Content + Attributes (Metadata)
      • Processor = Processes data; transforms/writes FlowFiles and creates provenance
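The terminology above can be pictured with a small toy model. This is only a sketch in plain Python, not the NiFi API: the class names mirror the slide's terms, while `uppercase_processor`, the `CONTENT_MODIFIED` event label, and the sample attribute values are hypothetical and chosen purely for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: a content payload plus attribute metadata."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Route between processors, modelled here as a simple FIFO queue."""

    def __init__(self):
        self.queue = deque()

    def put(self, flowfile):
        self.queue.append(flowfile)

    def get(self):
        # Returns None when the queue is empty.
        return self.queue.popleft() if self.queue else None


def uppercase_processor(inbound, outbound, provenance):
    """Toy processor: transforms content and records a provenance event."""
    flowfile = inbound.get()
    if flowfile is None:
        return
    result = FlowFile(flowfile.content.upper(), dict(flowfile.attributes))
    provenance.append(("CONTENT_MODIFIED", result.attributes.get("filename")))
    outbound.put(result)


source, sink, provenance = Connection(), Connection(), []
source.put(FlowFile(b"temp=21.5", {"filename": "sensor-1.log"}))
uppercase_processor(source, sink, provenance)
out = sink.get()
print(out.content, out.attributes, provenance)
```

The point of the model is the separation the slide describes: the processor changes the content, the attributes travel alongside it untouched, and every transformation leaves a provenance record behind.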
  • 6. What is HDF?
  • 7. How Can We Manage Flows with HDF? (HDF: Flow Management) Apache NiFi supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. Apache MiNiFi is a complementary data collection approach that supplements the core tenets of NiFi in dataflow management, focusing on the collection of data at the source of its creation.
  • 8. Visual Command and Control: a convenient graphical user interface that supports flow-based programming
      • Drag and drop processors to build a flow
      • Start, stop, and configure components in real time
      • View errors and corresponding error messages
      • View statistics and health of the data flow
      • Create templates of common processors and connections
  • 9. Visual Command and Control: over 200 processors designed to help you capture and deliver data to and from common sources. Examples include:
      • Capturing logs from mobile devices and sensors, formatting the logs with regex, and pushing them into HDFS or S3
      • Collecting sensor readings from GPIO headers and delivering the information to an application via Kafka
      • Customer sentiment analysis by joining social media data to customer information within Hive or Phoenix
  • 10. 220+ Processors for Deeper Ecosystem Integration
      • Operations: Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute
      • Sources and formats: Fetch, HTTP, Syslog, Email, HTML, Image, HL7, FTP, UDP, XML, SFTP, AMQP, WebSocket
      • All Apache project logos are trademarks of the ASF and the respective projects.
  • 11. Provenance and Lineage
  • 12. Payload Prioritization
      • Configure a prioritizer per connection
      • Determine what is important for your data: time based, arrival order, importance of a data set
      • Funnel many connections down to a single connection to prioritize across data sets
      • Develop your own prioritizer if needed
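As an illustration of per-connection prioritization, the sketch below funnels FlowFiles from several sources into one queue and releases them by priority, falling back to arrival order on ties. It is a plain-Python model rather than NiFi's prioritizer interface; `PrioritizedConnection`, `by_priority_attribute`, and the default priority of 9 are assumptions made for the example.

```python
import heapq
import itertools


class PrioritizedConnection:
    """Single queue that releases items by priority, then by arrival order."""

    def __init__(self, priority_of):
        self._priority_of = priority_of      # pluggable "prioritizer" function
        self._heap = []
        self._arrival = itertools.count()    # tie-breaker: arrival order

    def put(self, item):
        entry = (self._priority_of(item), next(self._arrival), item)
        heapq.heappush(self._heap, entry)

    def get(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def by_priority_attribute(flowfile):
    # Lower number means more important; items without the attribute go last.
    return int(flowfile.get("priority", 9))


funnel = PrioritizedConnection(by_priority_attribute)
for flowfile in [
    {"source": "telemetry", "priority": "5"},
    {"source": "alarms", "priority": "1"},
    {"source": "telemetry", "priority": "5"},
    {"source": "audit"},
]:
    funnel.put(flowfile)

order = [funnel.get()["source"] for _ in range(len(funnel))]
print(order)
```

Swapping `by_priority_attribute` for a function that returns a timestamp or a data-set weight gives the time-based and importance-based orderings the slide mentions without touching the queue itself.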
  • 13. Back-Pressure
  • 14. Latency vs. Throughput
  • 15. MiNiFi's Key Features: an embedded extension that supports flow-based programming on the edge
      • Agents provide:
          • Small and lightweight footprint
          • Central management of agents
          • Generation of data provenance
          • Integration with NiFi for follow-on dataflow management and full chain of custody of information
      • Java agent:
          • < 40 MB binary distribution
          • Requires JRE 1.8
          • More feature complete
          • Targeted at any system that can run a JVM
          • Supported processors: https://github.com/apache/nifi-minifi/blob/6ddf8bb0ee3614320a53ce7f2e0b3950ee4d9c5f/minifi-docs/src/main/markdown/minifi-java-agent-quick-start.md
      • C++ agent:
          • Dynamic heap of ~1 MB, depending on use case
          • Targeted at resource-constrained environments
          • Supported processors: https://github.com/apache/nifi-minifi-cpp/blob/master/PROCESSORS.md
  • 16. How Can We Administer and Secure Flow Activity? (HDF: Administration) "The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs." (https://ambari.apache.org/)
  • 17. Cluster Administration and Role Based Security
  • 18. Version Control for Flows
      • NiFi Registry: sub-project of Apache NiFi
          • https://github.com/apache/nifi-registry
          • https://issues.apache.org/jira/projects/NIFIREG
      • Complementary application that provides a central location for storage/management of "versioned" resources
      • Initial capability to store and retrieve "versioned flows"
      • Integration on the NiFi side:
          • Start/stop version control of a process group
          • Change version (upgrade/downgrade)
          • Import a new process group from a version
  • 19. Variable Registry: Flow Ubiquity
      • Parameterize configuration such as connection strings, file paths, etc.
      • Referenced via Expression Language, e.g. Kafka Brokers = ${kafka.brokers}
      • Variables are associated with a process group
      • Right-click on the canvas to view variables for the current process group
      • Hierarchical order of precedence: a reference resolves to the closest definition above the component
      • Editing variables automatically restarts any components referencing them!
      • Diagram labels: Level 1, Level 2, Vars, Vars
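The "closest definition wins" rule can be sketched with a chained lookup. This is a simplified model in plain Python, not NiFi's Expression Language engine: it handles only bare `${name}` references, the variable names other than `kafka.brokers` and all values are hypothetical, and leaving unknown references untouched is a choice made for this sketch.

```python
import re
from collections import ChainMap

# Variables defined on the outer (Level 1) and inner (Level 2) process groups.
level1_vars = {"kafka.brokers": "broker-a:6667", "landing.dir": "/data/in"}
level2_vars = {"kafka.brokers": "broker-b:6667"}

# ChainMap searches its mappings left to right, so the closest scope wins.
scope = ChainMap(level2_vars, level1_vars)

REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value, variables):
    """Replace each ${name} with its value; unknown names are left as-is."""
    def lookup(match):
        return variables.get(match.group(1), match.group(0))
    return REFERENCE.sub(lookup, value)


resolved_brokers = resolve("${kafka.brokers}", scope)
resolved_path = resolve("${landing.dir}/weather", scope)
unresolved = resolve("${not.defined}", scope)
print(resolved_brokers, resolved_path, unresolved)
```

A component inside the Level 2 group sees the Level 2 broker list but still inherits `landing.dir` from Level 1, which is what lets the same flow definition move between environments with only the variables changed.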
  • 20. Schema Registry
      ⬢ Data Governance
          – Centralized registry to provide reusable schemas
          – Version management to define relationships between schemas
          – Validation to enable generic format conversion and generic routing
      ⬢ Operational Efficiency
          – Centralized registry to avoid attaching a schema to every piece of data
          – Version management so that consumers and producers can evolve at different rates
          – Validation to ensure data quality
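The three ideas on this slide (a central registry, version management, and validation) can be shown with a minimal in-memory model. It is a sketch only and does not reflect the Hortonworks Schema Registry API: the `SchemaRegistry` class, the `validate` helper, the `weather` schema, and its fields are all hypothetical.

```python
class SchemaRegistry:
    """Toy central registry: each schema name maps to a list of versions."""

    def __init__(self):
        self._versions = {}

    def register(self, name, fields):
        versions = self._versions.setdefault(name, [])
        versions.append(dict(fields))
        return len(versions)  # 1-based version number

    def get(self, name, version=None):
        versions = self._versions[name]
        return versions[-1] if version is None else versions[version - 1]


def validate(record, schema):
    """A record is valid if it has exactly the schema's fields and types."""
    return set(record) == set(schema) and all(
        isinstance(record[name], kind) for name, kind in schema.items()
    )


registry = SchemaRegistry()
v1 = registry.register("weather", {"station": str, "temp_c": float})
v2 = registry.register(
    "weather", {"station": str, "temp_c": float, "humidity": float}
)

# Records carry only a (name, version) reference, never the schema itself.
old_record = {"station": "LAX", "temp_c": 21.5}
new_record = {"station": "LAX", "temp_c": 21.5, "humidity": 0.4}

print(validate(old_record, registry.get("weather", v1)))   # old producer
print(validate(new_record, registry.get("weather", v2)))   # new producer
print(validate(old_record, registry.get("weather")))       # latest version
```

Because both versions stay retrievable, a producer still writing version 1 and a consumer already reading version 2 can coexist, which is the "evolve at different rates" point above.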
  • 21. How Can We Take Advantage of Streaming Computations? (HDF: Streaming)
  • 22. What is Kafka? A distributed streaming platform that supports pub-sub systems
      • Publish and subscribe to streams of records, similar to a messaging queue
      • Store streams of records in a fault-tolerant way
      • Process streams of records as they occur
      • A topic is a partitioned log of events
      • Generally used to:
          • Build real-time streaming data pipelines that reliably transfer data between systems
          • Build real-time streaming applications that transform or react to streams of data
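The phrase "a topic is a partitioned log of events" is easier to see in a toy model. The sketch below is an in-memory illustration, not a Kafka client: `Topic` and `Consumer` are hypothetical classes, and CRC32 stands in for Kafka's real key-hashing scheme purely so the example is deterministic.

```python
import zlib


class Topic:
    """Toy topic: a fixed set of append-only partition logs."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Records with the same key always land in the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1  # (partition, offset)


class Consumer:
    """Each subscriber tracks its own offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = [0] * len(topic.partitions)

    def poll(self):
        records = []
        for index, log in enumerate(self.topic.partitions):
            records.extend(log[self.offsets[index]:])
            self.offsets[index] = len(log)
        return records


topic = Topic("sensor-readings", num_partitions=3)
first = topic.publish("station-1", "21.5")
second = topic.publish("station-1", "21.7")
topic.publish("station-2", "19.0")

dashboard, archiver = Consumer(topic), Consumer(topic)
dashboard_records = dashboard.poll()
archiver_records = archiver.poll()
print(first, second, dashboard_records)
```

Records are never removed when read; each consumer simply advances its own offsets. That is why two subscribers can read the same stream independently, and why a consumer that falls behind can catch up later.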
  • 23. Streams Messaging Manager (*NEW). "Kafka Blindness": customers who use Kafka today struggle with monitoring and managing Kafka clusters.
  • 24. What is Storm? A distributed, fault-tolerant service that processes streams of data
      • Spout: captures data from external systems (Kafka, HBase, Hive, HDFS, and AWS Kinesis)
      • Bolt: transforms and aggregates the data using filter, map, flatMap, aggregate, reduce, count, etc.
      • Bolt: writes data back to an external system for storage or visualization (HBase, Hive, Druid)
      • The chain is known as a topology. The topology runs under a master-slave type architecture.
      • Generally used for: *
          • Processing streams: no need for intermediate queues
          • Continuous computation: send data to clients continuously so they can update and show results in real time, such as site metrics
          • Distributed remote procedure call: easily parallelize CPU-intensive operations
      * Leibiusky, Jonathan. Getting Started with Storm: Continuous Streaming Computation with Twitter's Cluster Technology (p. 1). O'Reilly Media. Kindle Edition.
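The spout-to-bolt chain can be sketched with Python generators. This is a single-process analogy and not the Storm API (which is Java and distributes each stage across workers); the function names, the comma-separated record format, and the sample readings are assumptions for illustration.

```python
from collections import Counter


def sensor_spout(lines):
    """Spout: emits raw tuples from an external source (here, a list)."""
    for line in lines:
        yield line


def parse_bolt(stream):
    """Bolt: filters out malformed lines and maps the rest to typed tuples."""
    for line in stream:
        station, _, reading = line.partition(",")
        if reading:
            yield station, float(reading)


def count_bolt(stream, sink):
    """Bolt: aggregates per station and writes to an external sink."""
    for station, _reading in stream:
        sink[station] += 1


# Wiring the stages together forms the "topology".
sink = Counter()
raw = ["lax,21.5", "sfo,17.0", "malformed", "lax,22.1"]
topology = parse_bolt(sensor_spout(raw))
count_bolt(topology, sink)
print(dict(sink))
```

Each tuple flows through the whole chain as soon as it is emitted, with no intermediate queue to drain between stages, which mirrors the "processing streams" use case listed above.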
  • 25. What does our flow look like?
  • 26. Join Us: Hortonworks Community Connection
      ⬢ Full Q&A platform (like StackOverflow)
      ⬢ Knowledge base articles
      ⬢ Code gallery and samples
      ⬢ https://community.hortonworks.com
  • 27. Care to Learn More?
      ⬢ Download our HDF Sandbox for Docker, VMware, or VirtualBox: https://hortonworks.com/downloads/#sandbox
      ⬢ Follow our HDF tutorials: https://hortonworks.com/tutorial/analyze-iot-weather-station-data-via-connected-data-architecture/
      ⬢ Reach out to an Enterprise Account Manager
  • 28. Questions?
  • 29. Apache NiFi / ETL Tools
      NiFi: NOT schema dependent
          • Dataflow management for both structured and unstructured data, powered by separation of metadata and payload
          • A schema is not required, but you can have one
          • Minimum modeling effort, just enough to manage dataflows
          • Does the plumbing job, freeing developers' brainpower for creative work
          ⚠ Not designed for heavy-lifting transformation work on DB tables (JOIN datasets, etc.). You can create custom processors to do that, but there is a long way to go to catch up with existing ETL tools from a user experience perspective (GUI for data wrangling, cleansing, etc.)
      ETL (Informatica, etc.): Schema dependent
          • Tailored for databases/warehouses
          • ETL operations based on schema/data modeling
          • Highly efficient, optimized performance
          ⚠ Must pre-prepare your data; data modeling is time consuming to build, and schemas must be maintained
          ⚠ Not geared towards handling unstructured data: PDF, audio, video, etc.
          ⚠ Not designed to solve dataflow problems
  • 30. Apache NiFi / Integration or Ingestion Frameworks
      NiFi: End-user-facing dataflow management tool
          • Out-of-the-box solution for dataflow management
          • Interactive command and control in the core, design and deploy on the edge
          • Flexible failure handling at each point of the flow
          • Visual representation of global dataflow and connectivity
          • Native cross-data-center communication
          • Data provenance for traceability
          ⚠ Not a library to be embedded in other applications
      Integration frameworks (Spring Integration, Camel, etc.) and ingestion frameworks (Flume, etc.): Developer-facing integration tools with a focus on data ingestion
          • A set of tools to orchestrate workflow
          • A fixed design and deploy pattern
          • Leverage a messaging bus across disconnected networks
          ⚠ Developer facing, custom coding needed to optimize
          ⚠ Pre-built failure handling, lack of flexibility
          ⚠ No holistic view of global dataflow
          ⚠ No built-in data traceability
  • 31. Apache NiFi / Messaging Bus Services
      NiFi: Provides a dataflow solution
          • Centralized management, from edge to core
          • Great traceability: event-level data provenance starting when data is born
          • Interactive command and control with real-time operational visibility
          • Dataflow management, including prioritization, back pressure, and edge intelligence
          • Visual representation of global dataflow
          ⚠ Not a messaging bus; flow maintenance is needed when you have frequent consumer-side updates
      Messaging Bus (Kafka, JMS, etc.): Provides a messaging bus service
          • Low latency
          • Great data durability
          • Decentralized management (producers & consumers)
          • Low broker maintenance for dynamic consumer-side updates
          ⚠ Not designed to solve dataflow problems (prioritization, edge intelligence, etc.)
          ⚠ Traceability limited to in/out of topics, no lineage
          ⚠ Lack of a global view of components/connectivity
  • 32. Apache NiFi / Processing Frameworks
      NiFi: Simple event processing
          • Primarily feeds data into processing frameworks; can process data, with a focus on simple event processing
          • Operates on a single piece of data, or in correlation with an enrichment dataset (enrichment, parsing, splitting, and transformations)
          • Can scale out, but scales up better to take full advantage of hardware resources by running concurrent processing tasks/threads (processing terabytes of data per day on a single node)
          ⚠ Not another distributed processing framework; its role is to feed data into those
      Processing Frameworks (Storm, Spark, etc.): Complex and distributed processing
          • Complex processing from multiple streams (JOIN operations)
          • Analyzing data across time windows (rolling window aggregation, standard deviation, etc.)
          • Scale out to thousands of nodes if needed
          ⚠ Not designed to collect data or manage data flow
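To make the contrast concrete, the rolling window aggregation mentioned for processing frameworks needs state across several events, unlike the single-event operations NiFi focuses on. The sketch below shows the idea in plain Python on one machine; it is not Storm or Spark code, it uses a count-based window rather than a time-based one for simplicity, and `rolling_stats` and the sample readings are hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(readings, window_size):
    """Yield (mean, population std dev) for each full sliding window."""
    window = deque(maxlen=window_size)  # oldest reading drops out automatically
    for value in readings:
        window.append(value)
        if len(window) == window_size:
            yield mean(window), pstdev(window)


stats = list(rolling_stats([1.0, 2.0, 3.0, 4.0], window_size=3))
for window_mean, window_std in stats:
    print(round(window_mean, 3), round(window_std, 3))
```

A distributed framework does the same bookkeeping per key and per partition across many nodes, which is the part that falls outside NiFi's simple-event-processing role.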