
Connecting the Drops with Apache NiFi & Apache MiNiFi

Demand for increased capture of information to drive analytic insights into an organization's assets and infrastructure is growing at unprecedented rates. However, as data volumes soar, providing seamless ingestion pipelines becomes operationally complex because the number of data sources and types keeps expanding.

This talk will focus on the efforts of the Apache NiFi community, including its subproject MiNiFi, an agent-based architecture, and its relation to the core Apache NiFi project. MiNiFi is focused on providing a platform that meets and adapts to where data is born while providing the core tenets of NiFi: provenance, security, and command and control. These capabilities provide versatile avenues for the bi-directional exchange of information across data and control planes while dealing with the constraints of operating at opposite ends of the scale spectrum, tackling the first and last miles of dataflow management.

We will highlight ongoing and new efforts in the community to provide greater flexibility in the deployment and configuration management of flows. Versioned flows provide greater operational flexibility and serve as a powerful foundation for orchestrating the collection and transmission of data from its point of inception through to its delivery to consumers and processing systems.
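
As a rough illustration of what versioning a flow could involve, the sketch below treats a flow as a set of components (each with a configuration, a version, and referenced assets) plus the connections between them, and assigns a new version number only when that content changes. Every name here (`Component`, `FlowSnapshot`, `InMemoryFlowRegistry`) and every sample value is hypothetical and chosen for this example; this is not the Apache NiFi Registry API or its storage format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class Component:
    """One component of a flow (hypothetical model, not a NiFi class)."""
    name: str
    component_type: str
    version: str
    config: dict = field(default_factory=dict)
    assets: tuple = ()


@dataclass
class FlowSnapshot:
    """The directed graph of processing at one point in time."""
    components: list
    connections: list  # (source name, destination name) pairs

    def fingerprint(self) -> str:
        # Serialize in a canonical order so that two snapshots with the
        # same content always produce the same digest.
        canonical = json.dumps(
            {
                "components": sorted(
                    (asdict(c) for c in self.components),
                    key=lambda c: c["name"],
                ),
                "connections": sorted(list(pair) for pair in self.connections),
            },
            sort_keys=True,
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class InMemoryFlowRegistry:
    """Toy registry: keeps every distinct snapshot of a named flow."""

    def __init__(self):
        self._versions = {}  # flow name -> list of (fingerprint, snapshot)

    def commit(self, flow_name, snapshot):
        """Store the snapshot and return its 1-based version number."""
        history = self._versions.setdefault(flow_name, [])
        digest = snapshot.fingerprint()
        if history and history[-1][0] == digest:
            return len(history)  # unchanged: keep the current version
        history.append((digest, snapshot))
        return len(history)

    def checkout(self, flow_name, version):
        """Return the snapshot stored under the given version number."""
        return self._versions[flow_name][version - 1][1]
```

Hashing a canonical serialization is only one possible design; it makes "did the flow change?" a cheap comparison, and the same digest could be attached to data to record which flow version produced it.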

Published in: Technology

Connecting the Drops with Apache NiFi & Apache MiNiFi

  1. Connecting the Drops with Apache NiFi & MiNiFi. Aldrin Piri, Apache NiFi PMC (@aldrinpiri)
  2. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
     Agenda
     • Apache NiFi Fundamentals
     • Expanding the Reach of NiFi with Apache NiFi - MiNiFi
     • Evolving the NiFi Ecosystem
     • Apache NiFi Registry
     • Community
  3. Empower users to manage the collection and flow of data
  4. The Problem at Hand
     • Producers, a.k.a. Things: Anything AND Everything
     • Internet!
     • Consumers: User, Storage, System, …More Things
  5. Moving data effectively is hard. Standards: http://xkcd.com/927/
  6. Apache NiFi: A Primer
     Key Features and Principles
     • Guaranteed delivery
     • Data buffering: backpressure, pressure release
     • Prioritized queuing
     • Flow-specific QoS: latency vs. throughput, loss tolerance
     • Data provenance
     • Recovery/recording a rolling log of fine-grained history
     • Visual command and control
     • Flow templates
     • Pluggable/multi-role security
     • Designed for extension
     • Clustering
  7. NiFi is based on Flow Based Programming (FBP)
     FBP Term | NiFi Term | Description
     Information Packet | FlowFile | Each object moving through the system.
     Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
     Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
     Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
     Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
  8. NiFi & Data Agnosticism
     • NiFi is data agnostic!
     • But NiFi was designed with the understanding that users can care about specifics, and it provides tooling to interact with specific formats, protocols, etc.
     • ISO 8601 - http://xkcd.com/1179/
     • Robustness principle: "Be conservative in what you do, be liberal in what you accept from others"
  11. Apache NiFi - MiNiFi
      • Let me get the key parts of NiFi close to where data begins
      • Bidirectional data transfer
      • Better illuminate the journey of data with provenance
      • NiFi lives in the data center. Give it an enterprise server or a cluster of them.
      • MiNiFi lives as close as possible to where data is born and is a guest on that device or system
  12. Apache NiFi - MiNiFi
      Realities of computing outside the cozy datacenter
      • Limited computing capability
      • Limited power/network
      • Restricted software library/platform availability
      • No UI
      • Physically inaccessible
      • Not frequently updated
      • Competing standards/protocols
      • Scalability
      • Privacy & Security
  13. Apache NiFi - MiNiFi: Scoping
      Provide all the key principles of NiFi in varying, smaller footprints
      • Go small: Java
        – Write once, run anywhere*
        – Feature parity and reuse of core NiFi libraries
      • Go smaller: C++
        – Write once**, run anywhere
      • Go smallest: Write n-many times, embed, run anywhere
        – Language libraries to support tagging, FlowFile format, Site to Site protocol, and provenance generation without a full processing framework
        – Language SDKs, Mobile Platforms
  14. Apache NiFi - MiNiFi: The Differences
      Departures from NiFi
      • No UI / Declarative configuration
        – Supports YAML
        – Extensible interface to ingest other formats
      • Reduced set of bundled components
      • Minimize initial size
  15. Apache NiFi - MiNiFi: Centralized Command & Control (C2)
      Extend the reach of user experience and operations
      • Provide flow updates, information and assets to instances where they live
      • Act as a gateway to/from network enclaves
      • Provide a user interface/experience for design & deploy and monitoring
  16. Connecting the Drops (diagram tiers: Sources, Regional Infrastructure, Core Infrastructure)
  17. Managing data flow for a courier service
      Diagram labels: Physical Store, Gateway Server, Mobile Devices, Registers, Distribution Center, Server Cluster, Kafka, Storm / Spark / Flink / Apex, Core Data Center at HQ, Others, On Delivery Routes, Trucks, Deliverers, Client Libraries, MiNiFi, NiFi
      Image credits:
      • Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
      • Deliverer: Rigo Peter, https://thenounproject.com/rigo/
      • Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
      • Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/
  18. Evolving the NiFi Platform
  19. Listening to our community
      How can I … How do I ... What about ...
      • Version my flows?
      • Drive CI/CD processes?
      • Migrate flows between environments?
      • Provision distributions of NiFi with a set of components?
      • Make reference datasets/extensions available to the entirety of my data flow?
      • Certify / Audit / Sign-off on flows as compliant per regulations?
  20. Capturing the essence of a flow in your organization
      • The n-dimensions of data flow
      • Consider a FlowFile to be a singular event at a given juncture in its processing
      • A flow is the directed graph of processing at a given point in time
      • With each component's:
        – Configuration
        – Version
        – Referenced Assets
  21. Introducing
  28. Registry is an enabler
      • SDLC
      • Manage variables, sensitive properties for environments
      • Extension Registry
      • Association/tagging of data with the flow that created it
      • Enhanced Command and Control of MiNiFi instances
  29. The Evolution of Apache NiFi
      • Our core substrate for data flow is NiFi & MiNiFi
      • Command and Control facilitates operations and management of components
      • Registry for common tasks with disparate resources across the NiFi ecosystem
  30. Why the Apache NiFi Ecosystem?
      • Moving data is multifaceted in its challenges, and these are present in different contexts at varying scopes
      • Provide components and a platform with common tooling and extensions that are commonly needed, but be flexible for extension in all aspects
        – Allow organizations to integrate with their existing infrastructure
      • Empower folks managing your infrastructure to make changes and reason about issues that are occurring
        – Data Provenance to show context and data's journey
        – User Interface/Experience a key component
  31. Community
  32. Apache NiFi Crash Course
      Wednesday, 14 June, 11:00 AM – 1:30 PM, Room LL21A
      • Learn more about NiFi, the community, and work through a hands-on lab
      • Seats available on a first come, first served basis
      • Make sure you are in possession of the latest version of VirtualBox
  33. Learn, Share at Birds of a Feather: IOT, STREAMING & DATA FLOW
      Thursday, June 15, 5:50 pm, Ballroom C
  34. Learn more and join us!
      Project Sites:
      • NiFi: https://nifi.apache.org
      • Subproject MiNiFi: https://nifi.apache.org/minifi/
      • Subproject Registry: http://nifi.apache.org/registry.html
      Subscribe to and collaborate at:
      • dev@nifi.apache.org
      • users@nifi.apache.org
      Submit Ideas or Issues:
      • https://issues.apache.org/jira/browse/NIFI
      • https://issues.apache.org/jira/browse/MINIFI
      Follow us on Twitter: @apachenifi
  35. Thank You
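
The Flow Based Programming terms on slide 7 and the backpressure feature on slide 6 can be made concrete with a small model. The sketch below is a toy, single-threaded illustration and not NiFi's implementation: the class names borrow the NiFi terms, but the methods (`offer`, `poll`, `is_full`, `on_trigger`) and the count-based threshold are assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information Packet: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded Buffer: a queue linking two processors."""

    def __init__(self, backpressure_threshold):
        self.backpressure_threshold = backpressure_threshold
        self._queue = deque()

    def is_full(self):
        return len(self._queue) >= self.backpressure_threshold

    def offer(self, flowfile):
        self._queue.append(flowfile)

    def poll(self):
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


class Processor:
    """Black Box: takes one FlowFile, transforms it, passes it on."""

    def __init__(self, name, transform, incoming, outgoing=None):
        self.name = name
        self.transform = transform
        self.incoming = incoming
        self.outgoing = outgoing

    def on_trigger(self):
        """Move at most one FlowFile; return True if work was done."""
        if self.outgoing is not None and self.outgoing.is_full():
            return False  # backpressure: downstream queue is full
        flowfile = self.incoming.poll()
        if flowfile is None:
            return False  # nothing queued upstream
        result = self.transform(flowfile)
        if self.outgoing is not None:
            self.outgoing.offer(result)
        return True


def run_once(processors):
    """Toy stand-in for the Flow Controller: trigger each processor once."""
    return [processor.on_trigger() for processor in processors]
```

In this model a slow consumer causes the connection in front of it to fill, after which the upstream processor simply stops doing work until space frees up, which is the behaviour the "Data buffering: backpressure" bullet points at.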