HDF 3.0 Deep Dive
Aldrin Piri
© Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
Quick Overview of HDF & Apache NiFi
What’s new in HDF 3.0?
Latest Efforts in the Apache NiFi Community
HDF Data-In-Motion Platform
HDF provides the flow management, stream processing, and enterprise services
needed to collect, curate, analyze and act on data-in-motion across the data center and
cloud.
Flow Management
Data acquisition and delivery
Simple transformation and data routing
Simple event processing
End-to-end provenance
Edge intelligence & bi-directional communication
Stream Processing
Scalable data broker for streaming apps
Scale out complex transformation
Stream Analytics
Pattern Matching
Prescriptive & Predictive Stream Analytics
Complex Event Processing
Continuous Insight
Enterprise Services
Provisioning, Management, Monitoring, Security, Audit, Compliance, Governance, Multi-tenancy
Java Agent, C++ Agent
Empower users to manage the collection and flow of data
The Problem at Hand
Producers (a.k.a. Things): Anything AND Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things
Moving data effectively is hard
Standards: http://xkcd.com/927/
Apache NiFi: A Primer
Key Features and Principles
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
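The buffering and prioritized-queuing bullets can be pictured as a bounded, prioritized queue sitting between two processors. The sketch below is a minimal Python illustration under stated assumptions, not NiFi's implementation: `Connection` and `FlowFile` are hypothetical stand-ins, backpressure is a plain object-count threshold, and the `priority` attribute name with its default of 10 is an illustrative choice (lower values leave the queue first; ties stay FIFO).

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: attributes plus content bytes."""
    attributes: Dict[str, str]
    content: bytes = b""


class Connection:
    """Hypothetical bounded, prioritized queue between two processors."""

    def __init__(self, backpressure_threshold: int):
        self.backpressure_threshold = backpressure_threshold
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker keeps equal priorities FIFO

    def is_backpressured(self) -> bool:
        # Once the queue is "full", upstream must stop offering data.
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, flowfile: FlowFile) -> bool:
        """Enqueue unless backpressure is engaged; returns whether it was accepted."""
        if self.is_backpressured():
            return False  # the upstream side keeps the FlowFile and retries later
        priority = int(flowfile.attributes.get("priority", "10"))  # assumed attribute/default
        heapq.heappush(self._heap, (priority, next(self._arrival), flowfile))
        return True

    def poll(self) -> Optional[FlowFile]:
        """Dequeue the FlowFile with the lowest priority value, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(backpressure_threshold=2)
conn.offer(FlowFile({"priority": "5", "name": "routine"}))
conn.offer(FlowFile({"priority": "1", "name": "urgent"}))
print(conn.offer(FlowFile({"name": "extra"})))  # False: threshold reached
print(conn.poll().attributes["name"])           # urgent
```

Refusing the offer, rather than dropping data, is what lets backpressure propagate upstream without loss; "pressure release" (for example, expiring old data) would be a separate policy layered on top.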
NiFi is based on Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of entirely new components simply by composition of its components.
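To make the FBP-to-NiFi mapping concrete, here is a toy Python model of the table's terms. Everything in it is hypothetical (`FlowController`, the processor names, the relationship names); it is not NiFi's API, connections are unbounded queues for brevity, and relationships left unwired are simply dropped.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class FlowFile:
    # FBP "information packet": key/value attributes plus opaque content.
    attributes: Dict[str, str]
    content: bytes


@dataclass
class Processor:
    # FBP "black box": maps one FlowFile to zero or more (relationship, FlowFile) pairs.
    name: str
    on_trigger: Callable[[FlowFile], List[Tuple[str, FlowFile]]]


class FlowController:
    """FBP "scheduler": knows how processors are wired together and runs them."""

    def __init__(self):
        self.processors: Dict[str, Processor] = {}
        self.queues: Dict[str, Deque[FlowFile]] = {}   # one input queue per processor
        self.wiring: Dict[Tuple[str, str], str] = {}   # (source, relationship) -> destination

    def add(self, processor: Processor) -> None:
        self.processors[processor.name] = processor
        self.queues[processor.name] = deque()

    def connect(self, source: str, relationship: str, destination: str) -> None:
        self.wiring[(source, relationship)] = destination

    def run(self, entry: str, flowfile: FlowFile) -> None:
        """Feed one FlowFile in and keep triggering processors until all queues drain."""
        self.queues[entry].append(flowfile)
        while any(self.queues.values()):
            for name, queue in self.queues.items():
                if not queue:
                    continue
                for relationship, out in self.processors[name].on_trigger(queue.popleft()):
                    destination = self.wiring.get((name, relationship))
                    if destination is not None:  # unwired relationships are dropped here
                        self.queues[destination].append(out)


results: List[bytes] = []


def to_upper(ff: FlowFile):
    return [("success", FlowFile(ff.attributes, ff.content.upper()))]


def route_on_size(ff: FlowFile):
    return [("large" if len(ff.content) > 5 else "small", ff)]


def collect(ff: FlowFile):
    results.append(ff.content)
    return []


flow = FlowController()
for proc in (Processor("ToUpper", to_upper), Processor("RouteOnSize", route_on_size),
             Processor("Collect", collect)):
    flow.add(proc)
flow.connect("ToUpper", "success", "RouteOnSize")
flow.connect("RouteOnSize", "large", "Collect")
flow.run("ToUpper", FlowFile({}, b"hello world"))
flow.run("ToUpper", FlowFile({}, b"hi"))
print(results)  # [b'HELLO WORLD']
```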
NiFi & Data Agnosticism
• NiFi is data agnostic!
• But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
ISO 8601 - http://xkcd.com/1179/
Robustness principle
“Be conservative in what you do, be liberal in what you accept from others.”
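Applied to the ISO 8601 joke, the robustness principle means accepting several timestamp layouts on the way in while emitting exactly one on the way out. The sketch below is a minimal Python illustration under assumptions: the accepted formats are an arbitrary sample, and treating timestamps without an offset as UTC is a policy choice, not a rule.

```python
from datetime import datetime, timezone

# Input layouts we are willing to accept (illustrative, not exhaustive).
ACCEPTED_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # 2017-06-13T14:30:00+0200 or ...Z
    "%Y-%m-%d %H:%M:%S",    # 2017-06-13 14:30:00
    "%m/%d/%Y %H:%M",       # 06/13/2017 14:30
    "%d %b %Y %H:%M:%S",    # 13 Jun 2017 14:30:00
]


def normalize_timestamp(raw: str) -> str:
    """Be liberal in what we accept; always emit strict ISO 8601 in UTC."""
    text = raw.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: no offset means UTC
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError(f"unrecognized timestamp: {raw!r}")


print(normalize_timestamp("06/13/2017 14:30"))          # 2017-06-13T14:30:00+00:00
print(normalize_timestamp("2017-06-13T16:30:00+0200"))  # 2017-06-13T14:30:00+00:00
```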
Apache NiFi - MiNiFi
• Let me get the key parts of NiFi close to where data begins
• Bidirectional data transfer
• Better illuminate the data's journey with provenance
• NiFi lives in the data center. Give it an enterprise server or a cluster of them.
• MiNiFi lives as close as possible to where data is born and is a guest on that device or system
Connecting the Drops
[Diagram tiers: Sources, Regional Infrastructure, Core Infrastructure]
Managing data flow for a courier service
[Diagram: data from On Delivery Routes (trucks, deliverers) and a Physical Store (registers, mobile devices, gateway server) flows through a Distribution Center (server cluster with NiFi, Kafka, Storm / Spark / Flink / Apex) to the Core Data Center at HQ (server cluster with NiFi, Kafka, Storm / Spark / Flink / Apex, and others). Edge endpoints use MiNiFi and client libraries.]
Icon credits: Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/; Deliverer: Rigo Peter, https://thenounproject.com/rigo/; Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/; Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
What’s new in HDF 3.0?
Record Based Processing Mechanism
• Why?
– Improve operational efficiency
– Intuitive and flexible filtering/routing strategies powered by ‘QueryRecord’
– Simpler dataflow design
• What?
– Introduce a ‘record’-based operation model
– ‘RecordReader’ and ‘RecordWriter’ controller services
– A series of processors supporting the reader/writer processing mechanism
• Pluggable record reader to deserialize bytes into record objects
• Pluggable record writer to serialize record objects into bytes
• Enable operations against in-memory record objects
• How?
Record Processing
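The reader/operate/writer split can be sketched in a few lines. This is a minimal Python illustration under assumptions: `CsvRecordReader`, `JsonRecordWriter` and `convert_records` are hypothetical names that only mirror the RecordReader/RecordWriter roles, not NiFi's Java interfaces. The point of the design is that any reader can be paired with any writer, so the operation in the middle never touches raw bytes.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, str]


class CsvRecordReader:
    """Plays the RecordReader role: deserializes bytes into record objects."""

    def read(self, content: bytes) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(content.decode("utf-8")))


class JsonRecordWriter:
    """Plays the RecordWriter role: serializes record objects back into bytes."""

    def write(self, records: List[Record]) -> bytes:
        return json.dumps(records, separators=(",", ":")).encode("utf-8")


def convert_records(content: bytes, reader, writer,
                    operation: Callable[[Record], Optional[Record]]) -> bytes:
    # Deserialize once, operate on in-memory records, serialize once.
    kept = []
    for record in reader.read(content):
        result = operation(record)
        if result is not None:  # returning None filters the record out
            kept.append(result)
    return writer.write(kept)


csv_bytes = b"sensor,reading\npump,0\npressure,97\n"
drop_pump = lambda rec: None if rec["sensor"] == "pump" else rec
print(convert_records(csv_bytes, CsvRecordReader(), JsonRecordWriter(), drop_pump))
# b'[{"sensor":"pressure","reading":"97"}]'
```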
Record Reader Controller Service
Record Reader/Writer Schema Access Strategy
NiFi Record Based Processing
[Diagram: records arrive from a Data Source; NiFi performs a Lookup Schema request against a Centralized Schema Repository]
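A schema access strategy decides where a reader or writer gets its schema: for example, by name from a centralized repository, or from schema text carried alongside the data. The sketch below is a minimal Python illustration under assumptions: the strategy labels, the `schema.name`/`schema.text` attribute keys and the in-memory `SCHEMA_REPOSITORY` are hypothetical stand-ins, not NiFi's property values.

```python
import json
from typing import Dict

# Hypothetical centralized repository: schema name -> Avro-style schema text.
SCHEMA_REPOSITORY: Dict[str, str] = {
    "sensor_reading": json.dumps({
        "type": "record",
        "name": "SensorReading",
        "fields": [
            {"name": "sensor", "type": "string"},
            {"name": "reading", "type": "double"},
        ],
    }),
}


def resolve_schema(attributes: Dict[str, str], strategy: str) -> dict:
    """Return the parsed schema for a FlowFile according to an access strategy.

    "schema-name": look up attributes["schema.name"] in the repository.
    "schema-text": parse schema text carried inline in attributes["schema.text"].
    """
    if strategy == "schema-name":
        name = attributes["schema.name"]
        if name not in SCHEMA_REPOSITORY:
            raise LookupError(f"no schema named {name!r}")
        return json.loads(SCHEMA_REPOSITORY[name])
    if strategy == "schema-text":
        return json.loads(attributes["schema.text"])
    raise ValueError(f"unknown schema access strategy: {strategy!r}")


def field_names(schema: dict) -> list:
    return [f["name"] for f in schema["fields"]]


print(field_names(resolve_schema({"schema.name": "sensor_reading"}, "schema-name")))
# ['sensor', 'reading']
```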
What is Schema Registry? What Value Does it Provide?
• What is Schema Registry?
• A shared repository of schemas that allows applications to flexibly interact with each other by saving or retrieving schemas for the data they need to access
• What value does Schema Registry provide?
1. Data Governance
• Provide reusable schemas (centralized registry)
• Define relationships between schemas (version management)
• Enable generic format conversion and generic routing (schema validation)
2. Operational Efficiency
• Avoid attaching the schema to every piece of data (centralized registry)
• Consumers and producers can evolve at different rates (version management)
• Data quality (schema validation)
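"Avoid attaching the schema to every piece of data" usually means each message carries only a small schema identifier that consumers resolve against the registry. The sketch below follows the layout of Confluent's documented wire format (a zero marker byte plus a 4-byte big-endian schema ID); Hortonworks Schema Registry serializers use their own header layout, which this does not reproduce. The schema ID and payload are placeholders.

```python
import json
import struct
from typing import Tuple

MAGIC_BYTE = 0
HEADER = struct.Struct(">BI")  # 1-byte marker + 4-byte big-endian schema ID = 5 bytes


def encode(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized record with a small schema reference."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def decode(message: bytes) -> Tuple[int, bytes]:
    """Split a message into (schema ID, payload); the ID is then looked up in the registry."""
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected marker byte: {magic}")
    return schema_id, message[HEADER.size:]


schema_text = json.dumps({"type": "record", "name": "Reading",
                          "fields": [{"name": "sensor", "type": "string"},
                                     {"name": "value", "type": "double"}]})
payload = b"<serialized record body>"  # placeholder bytes
message = encode(42, payload)
print(len(message) - len(payload), "header bytes instead of", len(schema_text), "bytes of schema")
```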
Schema Registry Concepts
Schema Group
• Group Name
• Schema Group: a logical grouping/container for similar types of schemas, or for schemas grouped by any criteria the customer has for managing them
Schema Metadata 1
• Schema Name
• Schema Type
• Description
• Compatibility Policy
• Serializers
• Deserializers
• Schema Meta: metadata associated with a named schema
Schema Version 1
• Schema Version
• Schema Text
• Schema Version: the actual versioned schema associated with a schema meta definition
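The group, metadata and version concepts form a simple hierarchy. The sketch below is a hypothetical Python model of it, not Schema Registry's Java API: class and field names follow the slide, serializers/deserializers are omitted, and the compatibility check is a deliberately simplified Avro-style BACKWARD rule (added fields need a default; kept fields keep their type; real Avro also allows some type promotions).

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


def is_backward_compatible(old_text: str, new_text: str) -> bool:
    """Simplified check: can a reader using the new schema read data written with the old one?"""
    old_fields = {f["name"]: f for f in json.loads(old_text)["fields"]}
    for new_field in json.loads(new_text)["fields"]:
        prior = old_fields.get(new_field["name"])
        if prior is None and "default" not in new_field:
            return False  # old data has no value to fill this field
        if prior is not None and prior["type"] != new_field["type"]:
            return False  # simplification: no type promotion
    return True


@dataclass
class SchemaVersion:
    version: int
    schema_text: str


@dataclass
class SchemaMetadata:
    name: str
    schema_type: str                  # e.g. "avro"
    description: str = ""
    compatibility: str = "BACKWARD"   # illustrative policy name; anything else skips checks
    versions: List[SchemaVersion] = field(default_factory=list)

    def latest(self) -> SchemaVersion:
        return self.versions[-1]

    def add_version(self, schema_text: str) -> SchemaVersion:
        for existing in self.versions:
            if existing.schema_text == schema_text:
                return existing  # re-registering identical text is a no-op
        if (self.compatibility == "BACKWARD" and self.versions
                and not is_backward_compatible(self.latest().schema_text, schema_text)):
            raise ValueError(f"{self.name}: new schema breaks BACKWARD compatibility")
        created = SchemaVersion(len(self.versions) + 1, schema_text)
        self.versions.append(created)
        return created


@dataclass
class SchemaGroup:
    group_name: str
    schemas: Dict[str, SchemaMetadata] = field(default_factory=dict)

    def register(self, metadata: SchemaMetadata) -> SchemaMetadata:
        # Returns the existing metadata if the name is already registered.
        return self.schemas.setdefault(metadata.name, metadata)


def record(*fields):
    return json.dumps({"type": "record", "name": "TruckEvent", "fields": list(fields)})


group = SchemaGroup("kafka")
events = group.register(SchemaMetadata("truck_events", "avro"))
v1 = events.add_version(record({"name": "id", "type": "string"}))
v2 = events.add_version(record({"name": "id", "type": "string"},
                               {"name": "speed", "type": "int", "default": 0}))
print(v1.version, v2.version)  # 1 2
```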
Schema Registry Component Architecture
• SR Web Server: Schema Registry Web App, REST API
• Pluggable Storage
– Schema Metadata Storage; supported meta stores: MySQL, In-Memory
– Serializer/Deserializer Jar Storage; supported jar storage: Local File System, HDFS
• Schema Registry Client: Java Client
• Integrations: NiFi Processors, Kafka Serializers/Deserializers, SAM Processors
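Clients other than the Java client can talk to the REST API directly. The sketch below is a minimal Python illustration under assumptions that should be checked against the REST API reference for your Schema Registry release: the endpoint path and the `schemaText` response field are recalled from the Hortonworks Schema Registry API rather than taken from this deck, and the host name is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint layout; verify against your release's REST API documentation.
LATEST_VERSION_PATH = "/api/v1/schemaregistry/schemas/{name}/versions/latest"


def latest_version_url(base_url: str, schema_name: str) -> str:
    path = LATEST_VERSION_PATH.format(name=urllib.parse.quote(schema_name, safe=""))
    return base_url.rstrip("/") + path


def parse_schema_text(response_body: str) -> str:
    # Assumes the response JSON carries the schema in a "schemaText" field.
    return json.loads(response_body)["schemaText"]


def fetch_latest_schema(base_url: str, schema_name: str, timeout: float = 10.0) -> str:
    """Issue the GET and return the schema text (requires a reachable registry)."""
    with urllib.request.urlopen(latest_version_url(base_url, schema_name), timeout=timeout) as resp:
        return parse_schema_text(resp.read().decode("utf-8"))


print(latest_version_url("http://registry.example.com:7788/", "truck_events"))
# http://registry.example.com:7788/api/v1/schemaregistry/schemas/truck_events/versions/latest
```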
Integration of Schema Registry with HDF
• NiFi Processors for Schema Registry (HDF 2.2+)
– Fetch Schema
– Serialize/Deserialize with Schema
• SAM processors for Schema Registry (HDF 2.2, available in Early Access Bundle)
– Look up the schema of a Kafka topic
• Atlas integration with Schema Registry (upcoming)
– Just as Atlas pulls schema info from the Hive Metastore, Atlas will be able to capture schema, format and semantic metadata from events in HDF via the registry. Coming in HDF 2.3
‘QueryRecord’ Processor
‘QueryRecord’ Use Case
• Use Case Background
– Oil & Gas: log data collection from a remote drilling station
– Expensive BGAN satellite network connectivity (64KB)
• Problem Statement
– 1000 sensors; costly to send ALL data ALL THE TIME
– Collect all sensor data only if the upstream pump is off BUT pipe pressure is elevated
• Solution
– Define the routing strategy using ‘QueryRecord’
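In QueryRecord, each user-added property names an outbound relationship and holds a SQL statement evaluated against the FlowFile's records, exposed as a table called FLOWFILE. The sketch below emulates that routing in Python, with `sqlite3` standing in for the SQL engine NiFi actually uses (Apache Calcite). It assumes each record is one snapshot with a column per sensor (only two shown); the column names, the `pump_off_high_pressure` relationship and the 100 threshold are hypothetical.

```python
import sqlite3
from typing import Any, Dict, List

# Hypothetical route: relationship name -> SQL over the table FLOWFILE.
ROUTES = {
    "pump_off_high_pressure":
        "SELECT * FROM FLOWFILE WHERE pump_on = 0 AND pipe_pressure > 100",
}


def query_records(records: List[Dict[str, Any]], sql: str) -> List[Dict[str, Any]]:
    """Expose records as an in-memory table named FLOWFILE and run `sql` against it."""
    if not records:
        return []
    columns = list(records[0].keys())  # assumes trusted, simple column names
    conn = sqlite3.connect(":memory:")
    try:
        conn.row_factory = sqlite3.Row
        conn.execute(f"CREATE TABLE FLOWFILE ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f"INSERT INTO FLOWFILE VALUES ({placeholders})",
            [tuple(record[c] for c in columns) for record in records],
        )
        return [dict(row) for row in conn.execute(sql)]
    finally:
        conn.close()


def route(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Return {relationship: matching records}, omitting routes with no matches."""
    routed = {}
    for relationship, sql in ROUTES.items():
        matched = query_records(records, sql)
        if matched:
            routed[relationship] = matched
    return routed


readings = [
    {"ts": 1, "pump_on": 1, "pipe_pressure": 120.0},
    {"ts": 2, "pump_on": 0, "pipe_pressure": 80.0},
    {"ts": 3, "pump_on": 0, "pipe_pressure": 135.5},
]
print(route(readings))
# {'pump_off_high_pressure': [{'ts': 3, 'pump_on': 0, 'pipe_pressure': 135.5}]}
```

Only the matching snapshots are sent over the satellite link; everything else can stay at the station.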
Demo
Component versioning
Component Version
• Why?
– Foundational work to enable an extension registry
– Foundational work to enhance the flow migration experience
• What?
– Support multiple versions of the same NAR in a single NiFi instance
– E.g. Hadoop NAR version A: Apache Hadoop client lib; Hadoop NAR version B: proprietary Hadoop client lib
• How?
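Running two versions of the same NAR side by side works if components are looked up by type *and* bundle coordinate (group, artifact, version) rather than by type alone. The sketch below is a hypothetical Python model of that lookup, not NiFi's class loading: `ExtensionRegistry` is invented for illustration, and the `1.2.0-vendor` coordinate plus the factory return values are placeholders for the "proprietary Hadoop client" case on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class BundleCoordinate:
    """group:artifact:version identifying one NAR bundle."""
    group: str
    artifact: str
    version: str

    def __str__(self) -> str:
        return f"{self.group}:{self.artifact}:{self.version}"


class ExtensionRegistry:
    """Hypothetical lookup keyed by (component type, bundle coordinate)."""

    def __init__(self):
        self._factories: Dict[Tuple[str, BundleCoordinate], Callable[[], str]] = {}

    def register(self, component_type: str, bundle: BundleCoordinate,
                 factory: Callable[[], str]) -> None:
        self._factories[(component_type, bundle)] = factory

    def versions_of(self, component_type: str) -> List[str]:
        return sorted(str(b) for (t, b) in self._factories if t == component_type)

    def create(self, component_type: str, bundle: BundleCoordinate) -> str:
        try:
            return self._factories[(component_type, bundle)]()
        except KeyError:
            raise LookupError(f"{component_type} not available from {bundle}") from None


apache = BundleCoordinate("org.apache.nifi", "nifi-hadoop-nar", "1.2.0")
vendor = BundleCoordinate("org.apache.nifi", "nifi-hadoop-nar", "1.2.0-vendor")
registry = ExtensionRegistry()
registry.register("PutHDFS", apache, lambda: "PutHDFS using the Apache Hadoop client")
registry.register("PutHDFS", vendor, lambda: "PutHDFS using a vendor Hadoop client")
print(registry.versions_of("PutHDFS"))
print(registry.create("PutHDFS", vendor))
```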
Latest Efforts in the Apache NiFi Community
Addressing common asks
How can I … How do I ... What about ...
• Version my flows?
• Drive CI/CD processes?
• Migrate flows between environments?
• Provision distributions of NiFi with a set of components?
• Make reference datasets/extensions available to the entirety of my data flow?
• Certify / Audit / Sign-off on flows as compliant per regulations?
Capturing the essence of a flow in your organization
• The n-dimensions of data flow
• Consider a flowfile to be a singular event at a given juncture in its processing
• A flow is the directed graph of processing at a given point in time
• With each component’s:
– Configuration
– Version
– Referenced Assets
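If a flow is a directed graph whose nodes carry configuration, version and referenced assets, then a snapshot of that graph can be fingerprinted, which is the basic ingredient for versioning flows and promoting them between environments. The sketch below is a hypothetical Python model, not NiFi Registry's snapshot format: it serializes the graph canonically (sorted keys and lists) so that logically equal flows hash equally. The component IDs and versions are illustrative; `Input Directory` and `Directory` are property names of NiFi's GetFile and PutHDFS processors.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    id: str
    type: str
    version: str
    configuration: Dict[str, str] = field(default_factory=dict)
    referenced_assets: List[str] = field(default_factory=list)


@dataclass
class FlowSnapshot:
    """A flow at a point in time: components (nodes) plus connections (directed edges)."""
    components: List[Component]
    connections: List[Tuple[str, str, str]]  # (source id, relationship, destination id)

    def fingerprint(self) -> str:
        # Canonical JSON so that component/edge ordering does not change the hash.
        doc = {
            "components": sorted((asdict(c) for c in self.components), key=lambda c: c["id"]),
            "connections": sorted(list(edge) for edge in self.connections),
        }
        canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


fetch = Component("c1", "GetFile", "1.2.0", {"Input Directory": "/data/in"})
put = Component("c2", "PutHDFS", "1.2.0", {"Directory": "/landing"}, ["core-site.xml"])
v1 = FlowSnapshot([fetch, put], [("c1", "success", "c2")])
reordered = FlowSnapshot([put, fetch], [("c1", "success", "c2")])
upgraded = FlowSnapshot(
    [fetch, Component("c2", "PutHDFS", "1.3.0", {"Directory": "/landing"}, ["core-site.xml"])],
    [("c1", "success", "c2")],
)
print(v1.fingerprint() == reordered.fingerprint())  # True
print(v1.fingerprint() == upgraded.fingerprint())   # False
```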
Introducing
Registry is an enabler
• SDLC
• Manage variables, sensitive properties for environments
• Extension Registry
• Association/tagging of data with the flow that created it
• Enhanced Command and Control of MiNiFi instances
Apache NiFi - MiNiFi: Centralized Command & Control (C2)
• Provide flow updates, information and assets to instances where they live
• Act as a gateway to/from network enclaves
• Provide a user interface/experience for design & deploy and monitoring
Extend the reach of user experience and operations
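One common way to "provide flow updates to instances where they live" is a heartbeat loop: agents periodically report their state, and the server answers with any operations they still need. The sketch below is a hypothetical in-process Python model of that pattern, not the MiNiFi C2 protocol: `C2Server`, `Agent`, the `agent_class` grouping and the `UPDATE_FLOW` operation name are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Heartbeat:
    agent_id: str
    agent_class: str   # groups agents that should run the same flow
    flow_version: int


@dataclass
class Operation:
    kind: str          # e.g. "UPDATE_FLOW"
    flow_version: int
    flow_definition: str


class C2Server:
    """Remembers the desired flow per agent class and answers heartbeats with pending work."""

    def __init__(self):
        self.desired: Dict[str, Operation] = {}
        self.last_seen: Dict[str, int] = {}

    def publish_flow(self, agent_class: str, version: int, definition: str) -> None:
        self.desired[agent_class] = Operation("UPDATE_FLOW", version, definition)

    def on_heartbeat(self, hb: Heartbeat) -> List[Operation]:
        self.last_seen[hb.agent_id] = hb.flow_version  # monitoring view of the fleet
        target = self.desired.get(hb.agent_class)
        if target is not None and target.flow_version != hb.flow_version:
            return [target]
        return []


@dataclass
class Agent:
    agent_id: str
    agent_class: str
    flow_version: int = 0
    flow_definition: str = ""

    def heartbeat(self, server: C2Server) -> int:
        """Report state, apply any returned operations, and return how many were applied."""
        ops = server.on_heartbeat(Heartbeat(self.agent_id, self.agent_class, self.flow_version))
        for op in ops:
            if op.kind == "UPDATE_FLOW":
                self.flow_version = op.flow_version
                self.flow_definition = op.flow_definition
        return len(ops)


server = C2Server()
truck = Agent("truck-17", "delivery-truck")
server.publish_flow("delivery-truck", 2, "tail GPS log -> send to regional NiFi")
print(truck.heartbeat(server), truck.flow_version)  # 1 2
print(truck.heartbeat(server))                      # 0
```

Because the agent always initiates the exchange, this style also fits agents sitting behind network enclaves that accept no inbound connections.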
Docker Containerization
• First Apache community-released Docker image shipped with NiFi 1.2.0
• https://hub.docker.com/r/apache/nifi
• Provide Docker images for all components of the NiFi ecosystem
Further supplement operations
The Evolution of Apache NiFi
• Our core substrate for data flow is NiFi & MiNiFi
• Command and Control facilitates operations and management of components
• Registry for common tasks with disparate resources across the NiFi ecosystem
Why the Apache NiFi Ecosystem?
• Moving data is multifaceted in its challenges, and these challenges are present in different contexts at varying scopes
• Provide components and a platform with the common tooling and extensions that are commonly needed, while staying flexible for extension in all aspects
– Allow organizations to integrate with their existing infrastructure
• Empower the people managing your infrastructure to make changes and reason about issues as they occur
– Data Provenance to show context and the data’s journey
– User Interface/Experience as a key component
Thank You
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi IbrahimzadeFree and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
Free and Effective: Making Flows Publicly Accessible, Yumi Ibrahimzade
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
THE BEST IPTV in GERMANY for 2024: IPTVreel
THE BEST IPTV in  GERMANY for 2024: IPTVreelTHE BEST IPTV in  GERMANY for 2024: IPTVreel
THE BEST IPTV in GERMANY for 2024: IPTVreel
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
FDO for Camera, Sensor and Networking Device – Commercial Solutions from VinC...
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 

Future of Data New Jersey - HDF 3.0 Deep Dive

  • 1. HDF 3.0 Deep Dive, Aldrin Piri
  • 2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved. Agenda:
    – Quick Overview of HDF & Apache NiFi
    – What’s new in HDF 3.0?
    – Latest Efforts in the Apache NiFi Community
  • 3. HDF Data-In-Motion Platform: HDF provides the flow management, stream processing, and enterprise services needed to collect, curate, analyze, and act on data in motion across the data center and cloud.
    – Flow Management: data acquisition and delivery; simple transformation and data routing; simple event processing; end-to-end provenance; edge intelligence & bi-directional communication
    – Stream Processing: scalable data broker for streaming apps; scale-out complex transformation; stream analytics; pattern matching; prescriptive & predictive stream analytics; complex event processing; continuous insight
    – Enterprise Services: provisioning, management, monitoring, security, audit, compliance, governance, multi-tenancy
    – Agents: Java agent, C++ agent
  • 4. Empower users to manage the collection and flow of data
  • 5. The Problem at Hand
    – Producers, a.k.a. “Things”: anything AND everything
    – The Internet!
    – Consumers: user, storage, system, …more things
  • 6. Moving data effectively is hard. Standards: http://xkcd.com/927/
  • 7. Apache NiFi: A Primer. Key features and principles:
    – Guaranteed delivery
    – Data buffering: backpressure, pressure release
    – Prioritized queuing
    – Flow-specific QoS: latency vs. throughput, loss tolerance
    – Data provenance
    – Recovery/recording: a rolling log of fine-grained history
    – Visual command and control
    – Flow templates
    – Pluggable/multi-role security
    – Designed for extension
    – Clustering
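Prioritized queuing and pressure release can be made concrete with a small model of a connection queue. This is a minimal Python sketch, assuming a hypothetical `priority` attribute (lower values are served first, ties broken by arrival order) and age-based expiration (dropping data once it is too old) standing in for pressure release; the class names are illustrative, not NiFi's API.

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class FlowFile:
    """Hypothetical stand-in for a FlowFile: attributes plus content."""
    attributes: dict
    content: bytes
    entry_time: float = 0.0  # set when the FlowFile enters the queue


class PrioritizedConnection:
    """Toy connection queue: priority ordering plus age-off expiration."""

    def __init__(self, expiration_secs: float):
        self._heap = []
        self._arrival = itertools.count()  # FIFO tie-breaker for equal priority
        self.expiration_secs = expiration_secs

    def offer(self, ff: FlowFile, now: float) -> None:
        ff.entry_time = now
        # Hypothetical "priority" attribute; lower numbers are served first.
        priority = int(ff.attributes.get("priority", 10))
        heapq.heappush(self._heap, (priority, next(self._arrival), ff))

    def poll(self, now: float):
        """Return the next unexpired FlowFile by priority, or None."""
        while self._heap:
            _, _, ff = heapq.heappop(self._heap)
            if now - ff.entry_time <= self.expiration_secs:
                return ff
            # Too old: dropped (pressure release) instead of delivered.
        return None


if __name__ == "__main__":
    q = PrioritizedConnection(expiration_secs=60)
    q.offer(FlowFile({"priority": "5"}, b"routine"), now=0)
    q.offer(FlowFile({"priority": "1"}, b"urgent"), now=10)
    print(q.poll(now=30).content)   # b'urgent' jumps ahead of b'routine'
    print(q.poll(now=90))           # None: b'routine' is 90 s old and expired
```

The counter in each heap entry keeps equal-priority FlowFiles in arrival order and guarantees the FlowFile objects themselves are never compared.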
  • 8. NiFi is based on Flow Based Programming (FBP). FBP term → NiFi term: description
    – Information Packet → FlowFile: each object moving through the system.
    – Black Box → FlowFile Processor: performs the work, doing some combination of data routing, transformation, or mediation between systems.
    – Bounded Buffer → Connection: the linkage between processors, acting as queues and allowing various processes to interact at differing rates.
    – Scheduler → Flow Controller: maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
    – Subnet → Process Group: a set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
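The bounded-buffer idea (processes interacting at differing rates) can be illustrated with a toy flow controller. This Python sketch rests on simplifying assumptions: there is one fast producer and one slow consumer, the connection is a bounded `deque`, and the producer is skipped whenever its outgoing connection is full (backpressure). None of these names come from NiFi itself.

```python
from collections import deque


class Connection:
    """Hypothetical bounded buffer between two processors."""

    def __init__(self, capacity: int):
        self.queue = deque()
        self.capacity = capacity

    def is_full(self) -> bool:
        return len(self.queue) >= self.capacity


def run(ticks: int, capacity: int):
    """Toy flow controller: a fast producer feeds a slow consumer.

    Returns (produced, consumed, still_queued) after `ticks` rounds.
    """
    conn = Connection(capacity)
    produced = consumed = 0
    for tick in range(ticks):
        # Producer: one packet per tick, but only when the buffer has room.
        if not conn.is_full():
            conn.queue.append({"id": produced})
            produced += 1
        # Consumer: slower, it only takes work on every third tick.
        if tick % 3 == 0 and conn.queue:
            conn.queue.popleft()
            consumed += 1
    return produced, consumed, len(conn.queue)
```

With `run(30, 5)` the result is `(15, 10, 5)`: once the buffer fills, the producer is throttled to the consumer's pace and the queue never holds more than five items, instead of growing without bound.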
  • 9. NiFi & Data Agnosticism
    – NiFi is data agnostic!
    – But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
    – ISO 8601: http://xkcd.com/1179/
    – Robustness principle: “Be conservative in what you do, be liberal in what you accept from others.”
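As a small illustration of the robustness principle applied to the ISO 8601 example, the Python sketch below is liberal in the timestamp formats it accepts and conservative in what it emits (always ISO 8601 in UTC). The list of accepted formats, and the assumption that timestamps without an offset are UTC, are choices made for this example only.

```python
from datetime import datetime, timezone

# Formats we are "liberal" about accepting; an illustrative assumption.
ACCEPTED_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",   # ISO 8601 with offset, e.g. 2017-06-01T14:00:00+0200
    "%Y-%m-%d %H:%M:%S",     # space-separated, assumed UTC
    "%d/%m/%Y %H:%M",        # day-first, assumed UTC
]


def normalize_timestamp(text: str) -> str:
    """Parse any accepted format and emit one canonical ISO 8601 form in UTC."""
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # not this format; try the next one
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    raise ValueError(f"unrecognized timestamp: {text!r}")
```

All three inputs `"2017-06-01T14:00:00+0200"`, `"2017-06-01 12:00:00"`, and `"01/06/2017 12:00"` normalize to `"2017-06-01T12:00:00+00:00"`, so downstream consumers only ever see one representation.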
  • 12. Apache NiFi - MiNiFi
    – Let me get the key parts of NiFi close to where data begins
    – Bidirectional data transfer
    – Better illuminate the data’s journey with provenance
    – NiFi lives in the data center. Give it an enterprise server or a cluster of them.
    – MiNiFi lives as close as possible to where data is born and is a guest on that device or system
  • 13. Connecting the Drops: sources → regional infrastructure → core infrastructure
  • 14. Managing data flow for a courier service (diagram)
    – Physical store: registers, mobile devices, gateway server
    – Distribution center: server cluster, Kafka, Storm / Spark / Flink / Apex
    – Core data center at HQ: server cluster, Kafka, Storm / Spark / Flink / Apex, others
    – On delivery routes: trucks, deliverers
    – Data flow components shown: MiNiFi, NiFi, client libraries
    – Icon credits: Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/; Deliverer: Rigo Peter, https://thenounproject.com/rigo/; Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/; Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
  • 15. What’s new in HDF 3.0?
  • 16. Record-Based Processing Mechanism (Record Processing)
    – Why?
      • Improve operational efficiency
      • Intuitive and flexible filtering/routing strategies powered by ‘QueryRecord’
      • Simpler dataflow design
    – What?
      • Introduce a ‘record’-based operation model
      • ‘RecordReader’ and ‘RecordWriter’ controller services
      • A series of processors supporting the reader/writer processing mechanism: plug in a record reader to deserialize bytes into record objects; plug in a record writer to serialize record objects to bytes; enable operations against in-memory record objects
    – How?
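The reader/writer split can be sketched in plain Python: a hypothetical CSV reader deserializes bytes into record objects, the processing step works only on in-memory records, and a hypothetical JSON writer serializes them back to bytes. This mirrors the idea on the slide; it is not the NiFi `RecordReader`/`RecordWriter` API.

```python
import csv
import io
import json
from typing import Callable, Iterable, Iterator


def csv_reader(data: bytes) -> Iterator[dict]:
    """Hypothetical reader: deserialize CSV bytes into dicts keyed by header."""
    yield from csv.DictReader(io.StringIO(data.decode("utf-8")))


def json_writer(records: Iterable[dict]) -> bytes:
    """Hypothetical writer: serialize record objects to a JSON array."""
    return json.dumps(list(records)).encode("utf-8")


def process(data: bytes,
            reader: Callable[[bytes], Iterator[dict]],
            writer: Callable[[Iterable[dict]], bytes],
            predicate: Callable[[dict], bool]) -> bytes:
    """Read records, keep those matching `predicate`, and write them out.

    The processing step never parses bytes itself, so swapping the reader
    or writer changes formats without touching the filtering logic.
    """
    return writer(r for r in reader(data) if predicate(r))
```

For example, `process(b"sensor,value\nA,1\nB,7\n", csv_reader, json_writer, lambda r: int(r["value"]) > 5)` yields `b'[{"sensor": "B", "value": "7"}]'`; converting CSV to another format only requires passing a different writer.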
  • 17. Record Reader Controller Service (Record Processing)
  • 18. Record Reader/Writer Schema Access Strategy (Record Processing)
  • 19. NiFi Record-Based Processing (diagram: data source; centralized schema repository; lookup schema)
  • 20. What is Schema Registry? What Value Does It Provide?
    – What is Schema Registry?
      • A shared repository of schemas that allows applications to interact flexibly with each other, saving or retrieving the schemas for the data they need to access
    – What value does Schema Registry provide?
      1. Data governance
        • Provide reusable schemas (centralized registry)
        • Define relationships between schemas (version management)
        • Enable generic format conversion and generic routing (schema validation)
      2. Operational efficiency
        • Avoid attaching the schema to every piece of data (centralized registry)
        • Consumers and producers can evolve at different rates (version management)
        • Data quality (schema validation)
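To show why a central registry avoids attaching the schema to every piece of data, the Python sketch below prefixes each payload with a compact schema reference instead of the full schema text. The byte layout (a 1-byte protocol marker, an 8-byte schema id, and a 4-byte version) is an assumption for illustration, not a statement of any registry's wire format.

```python
import struct

# Hypothetical header: protocol marker (1 byte), schema id (8), version (4).
HEADER = struct.Struct(">bqi")


def encode(schema_id: int, version: int, payload: bytes) -> bytes:
    """Prefix the payload with a fixed 13-byte schema reference."""
    return HEADER.pack(1, schema_id, version) + payload


def decode(message: bytes):
    """Split a message into (schema_id, version, payload)."""
    protocol, schema_id, version = HEADER.unpack_from(message)
    if protocol != 1:
        raise ValueError(f"unknown protocol marker {protocol}")
    return schema_id, version, message[HEADER.size:]
```

Each message carries 13 bytes of overhead regardless of schema size; consumers resolve the id and version against the registry, which is also what lets producers and consumers move between versions independently.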
  • 21. Schema Registry Concepts
    – Schema Group (group name): a logical grouping/container for similar types of schemas, or for schemas grouped by any criteria the customer uses to manage them
    – Schema Metadata (schema name, schema type, description, compatibility policy, serializers, deserializers): metadata associated with a named schema
    – Schema Version (schema version, schema text): the actual versioned schema associated with a schema metadata definition
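The group → metadata → version hierarchy can be modeled with a few Python dataclasses. This in-memory sketch is hypothetical: the class and method names and the `"BACKWARD"` default are invented for illustration, and it omits compatibility enforcement, serializers, and deserializers.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SchemaVersion:
    version: int
    schema_text: str


@dataclass
class SchemaMetadata:
    name: str
    schema_type: str                      # e.g. "avro"
    group: str                            # schema group: logical container
    description: str = ""
    compatibility: str = "BACKWARD"       # hypothetical label; not enforced here
    versions: List[SchemaVersion] = field(default_factory=list)


class InMemorySchemaRegistry:
    """Hypothetical in-memory registry keyed by schema name."""

    def __init__(self) -> None:
        self._schemas: Dict[str, SchemaMetadata] = {}

    def add_metadata(self, meta: SchemaMetadata) -> None:
        self._schemas[meta.name] = meta

    def add_version(self, name: str, schema_text: str) -> int:
        """Register schema text; re-registering identical text is a no-op."""
        meta = self._schemas[name]
        for existing in meta.versions:
            if existing.schema_text == schema_text:
                return existing.version
        number = len(meta.versions) + 1
        meta.versions.append(SchemaVersion(number, schema_text))
        return number

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Return the schema text for `version`, or the latest if omitted."""
        versions = self._schemas[name].versions
        chosen = versions[-1] if version is None else versions[version - 1]
        return chosen.schema_text
```

Returning the existing number for identical text keeps producers that re-register on startup from creating spurious versions.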
  • 22. Schema Registry Component Architecture
    – SR web server: Schema Registry web app, REST API
    – Pluggable storage: schema metadata storage; serializer/deserializer JAR storage
      • Supported metadata stores: MySQL, in-memory
      • Supported JAR storage: local file system, HDFS
    – Schema Registry client: Java client
    – Integrations: NiFi processors, Kafka serializers/deserializers, SAM processors
  • 23. Integration of Schema Registry with HDF
    – NiFi processors for Schema Registry (HDF 2.2+): fetch schema; serialize/deserialize with schema
    – SAM processors for Schema Registry (HDF 2.2, available in the Early Access Bundle): look up the schema of a Kafka topic
    – Atlas integration with Schema Registry (upcoming, in HDF 2.3): just as Atlas pulls schema info from the Hive Metastore, Atlas will be able to capture schema, format, and semantic metadata from events in HDF via the registry
  • 24. ‘QueryRecord’ Processor (Record Processing)
  • 25. ‘QueryRecord’ Use Case (Record Processing)
    – Use case background
      • Oil & gas: log data collection from a remote drilling station
      • Expensive BGAN satellite network connectivity (64KB)
    – Problem statement
      • 1000 sensors; costly to send ALL data ALL THE TIME
      • Collect all sensor data only if the upstream pump is off BUT pipe pressure is elevated
    – Solution
      • Define the routing strategy using ‘QueryRecord’
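QueryRecord expresses routing as SQL over the incoming records, which are exposed as a table named `FLOWFILE`. To show the shape of the pump/pressure rule without a NiFi instance, this Python sketch loads sample readings into an in-memory SQLite table of the same name and runs the query. The column names (`pump_on`, `pipe_pressure_psi`) and the 1500 threshold are hypothetical, and SQLite merely stands in for NiFi's own SQL engine.

```python
import sqlite3

# Routing rule in the spirit of a QueryRecord query; the columns and the
# pressure threshold are hypothetical.
ALERT_QUERY = """
    SELECT * FROM FLOWFILE
    WHERE pump_on = 0 AND pipe_pressure_psi > 1500
"""


def route(readings):
    """Return only the readings worth sending over the satellite link."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FLOWFILE (station TEXT, pump_on INTEGER, pipe_pressure_psi REAL)"
    )
    conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?)", readings)
    rows = conn.execute(ALERT_QUERY).fetchall()
    conn.close()
    return rows
```

With the readings `[("A", 1, 1700.0), ("B", 0, 900.0), ("C", 0, 1650.0)]`, only station C (pump off, elevated pressure) is returned, so one record crosses the expensive link instead of three.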
  • 26. Demo
  • 27. Component Versioning
    – Why?
      • Foundational work to enable an extension registry
      • Foundational work to enhance the flow migration experience
    – What?
      • Support multiple versions of the same NAR in a single NiFi instance
      • E.g. Hadoop NAR version A: Apache Hadoop client lib; Hadoop NAR version B: proprietary Hadoop client lib
    – How?
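Supporting several versions of the same NAR means a component has to be identified by its type plus a bundle coordinate, not by type alone. A minimal Python sketch of that lookup, assuming a hypothetical group/artifact/version coordinate and an invented `ExtensionManager` class:

```python
from typing import Dict, NamedTuple, Tuple


class Bundle(NamedTuple):
    """Hypothetical coordinate identifying one NAR: group, artifact, version."""
    group: str
    artifact: str
    version: str


class ExtensionManager:
    """Hypothetical registry resolving (component type, bundle) to an impl."""

    def __init__(self) -> None:
        self._impls: Dict[Tuple[str, Bundle], str] = {}

    def register(self, component_type: str, bundle: Bundle, impl: str) -> None:
        self._impls[(component_type, bundle)] = impl

    def versions_of(self, component_type: str):
        """List every bundle version that provides this component type."""
        return sorted(b.version for (t, b) in self._impls if t == component_type)

    def resolve(self, component_type: str, bundle: Bundle) -> str:
        return self._impls[(component_type, bundle)]
```

Registering `PutHDFS` from a version A bundle (Apache Hadoop client) and a version B bundle (proprietary client) lets both coexist; a flow then records which coordinate each processor uses, which is also what makes migrating flows between instances tractable.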
  • 28. Component Versioning
  • 29. Latest Efforts in the Apache NiFi Community
  • 30. Addressing Common Asks: How can I … How do I … What about …
    – Version my flows?
    – Drive CI/CD processes?
    – Migrate flows between environments?
    – Provision distributions of NiFi with a set of components?
    – Make reference datasets/extensions available to the entirety of my data flow?
    – Certify / audit / sign off on flows as compliant per regulations?
  • 31. Capturing the Essence of a Flow in Your Organization
    – The n-dimensions of data flow
    – Consider a FlowFile to be a singular event at a given juncture in its processing
    – A flow is the directed graph of processing at a given point in time
    – With each component’s:
      • Configuration
      • Version
      • Referenced assets
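One way to capture a flow at a point in time is a snapshot of the directed graph in which every component carries its configuration, version, and referenced assets. The Python sketch below (all field names hypothetical) serializes such a snapshot canonically and hashes it, so two snapshots share a fingerprint exactly when their content matches, regardless of the order components and edges were listed in.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Component:
    name: str
    kind: str                                          # processor type
    version: str                                       # component/bundle version
    config: Dict[str, str] = field(default_factory=dict)
    assets: List[str] = field(default_factory=list)    # referenced files, scripts


@dataclass
class FlowSnapshot:
    components: List[Component]
    edges: List[Tuple[str, str]]                       # (source, destination)

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON form of the snapshot."""
        doc = {
            # Sort so listing order of components and edges does not matter.
            "components": sorted((asdict(c) for c in self.components),
                                 key=lambda c: c["name"]),
            "edges": sorted(self.edges),
        }
        canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Bumping a single component version, changing a property, or rewiring an edge changes the fingerprint, which gives a cheap "has this flow changed since the last saved version?" check.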
  • 32. Introducing
  • 39. Registry Is an Enabler
    – SDLC
    – Manage variables and sensitive properties for environments
    – Extension registry
    – Association/tagging of data with the flow that created it
    – Enhanced command and control of MiNiFi instances
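Managing variables per environment lets the same versioned flow be promoted through an SDLC while only environment-specific values change. A minimal Python sketch, assuming hypothetical property and variable names and using `string.Template` (whose `${name}` placeholders happen to resemble NiFi's variable references); sensitive values are kept out of the shared definition and injected at deploy time.

```python
from string import Template

# Versioned, environment-neutral flow properties (hypothetical names).
FLOW_PROPERTIES = {
    "kafka.brokers": "${kafka_brokers}",
    "db.password": "${db_password}",
}

# Non-sensitive, per-environment variables (hypothetical values).
ENVIRONMENTS = {
    "dev": {"kafka_brokers": "localhost:9092"},
    "prod": {"kafka_brokers": "kafka-1:9092,kafka-2:9092"},
}


def resolve(env: str, secrets: dict) -> dict:
    """Fill placeholders from the environment's variables plus injected secrets.

    Secrets are supplied by the caller rather than stored with the flow, so
    the shared definition never contains them. Missing values raise KeyError.
    """
    values = {**ENVIRONMENTS[env], **secrets}
    return {k: Template(v).substitute(values) for k, v in FLOW_PROPERTIES.items()}
```

`resolve("prod", {"db_password": "..."})` and `resolve("dev", {...})` produce different broker lists from the same flow definition, and a forgotten secret fails loudly instead of deploying an empty password.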
  • 40. Apache NiFi - MiNiFi: Centralized Command & Control (C2). Extend the reach of user experience and operations
    – Provide flow updates, information, and assets to instances where they live
    – Act as a gateway to/from network enclaves
    – Provide a user interface/experience for design & deploy and monitoring
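One simple way to structure centralized command and control is for agents to report what they are running and receive updates in reply. The Python sketch below models that exchange in memory; the heartbeat fields, the integer flow-version comparison, the "agent class" grouping, and all class names are assumptions for illustration, not the MiNiFi C2 protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Heartbeat:
    agent_id: str
    flow_version: int            # version of the flow the agent is running


@dataclass
class FlowUpdate:
    flow_version: int
    flow_definition: str


class C2Server:
    """Hypothetical server holding the desired flow per class of agent."""

    def __init__(self) -> None:
        self.desired: Dict[str, FlowUpdate] = {}  # agent class -> target flow
        self.agent_class: Dict[str, str] = {}     # agent id -> agent class

    def publish(self, agent_class: str, update: FlowUpdate) -> None:
        self.desired[agent_class] = update

    def on_heartbeat(self, hb: Heartbeat) -> Optional[FlowUpdate]:
        """Reply with an update only if the agent is behind its target flow."""
        target = self.desired.get(self.agent_class.get(hb.agent_id, ""))
        if target is not None and target.flow_version > hb.flow_version:
            return target
        return None
```

Because agents initiate the exchange, the server never needs inbound access to devices on delivery routes or behind network enclaves; it only answers heartbeats.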
  • 41. Docker Containerization. Further supplement operations
    – First Apache community-released image in version 1.2.0
    – https://hub.docker.com/r/apache/nifi
    – Provide Docker images for all components of the NiFi ecosystem
  • 42. The Evolution of Apache NiFi
    – Our core substrate for data flow is NiFi & MiNiFi
    – Command and control facilitates operations and management of components
    – Registry for common tasks with disparate resources across the NiFi ecosystem
  • 43. Why the Apache NiFi Ecosystem?
    – Moving data is multifaceted in its challenges, and these challenges appear in different contexts at varying scopes
    – Provide components and a platform with the common tooling and extensions that are commonly needed, while staying flexible for extension in all aspects
      • Allow organizations to integrate with their existing infrastructure
    – Empower folks managing your infrastructure to make changes and reason about issues that are occurring
      • Data provenance to show context and the data’s journey
      • User interface/experience as a key component
  • 44. Thank You