Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Joe Witt presentation on Apache NiFi

6,094 views

Published on

Joe Witt of of Onyara presented Apache NiFi at Houston Hadoop Meetup on June 9, 2015

Published in: Technology

Joe Witt presentation on Apache NiFi

  1. 1. Apache NiFi Better Analytics Demand Better Dataflow Presented by: Joe Witt Apache NiFi PPMC Member
  2. 2. 2   @apachenifi
  3. 3. History 3   •  Developed at NSA for over eight years •  Donated to the Apache Software Foundation Nov 2014 •  Undergoing incubation •  Three ASF releases to date •  Closing in on 0.2.0 release  
  4. 4. The problem space: Enterprise Dataflow 4   Automate the flow of data from any source …to systems which extract meaning and insight …and to those that store and make it available for users
  5. 5. Use Cases for NiFi 5   •  Remote sensor delivery •  Inter-site / global distribution •  Intra-site distribution •  ‘Big Data’ ingest •  Data Processing (enrichment, filtering, sanitization)  
  6. 6. The challenges we faced 6   •  Transport / Messaging was not enough •  Needed to understand the big picture •  Needed the ability to make *immediate* changes •  Must maintain chain of custody for data •  Rigorous security and compliance requirements  
  7. 7. Why transport and messaging was not enough? 7   •  Data access exceeded resources to transport •  Decoupling systems is about more than the connectivity •  Message sizes ranged from B to GB •  Not all data is created equal •  Needed precise security controls •  SSL and topic level authorization insufficient
  8. 8.   The basic building blocks Real-time Command and Control The Power of Provenance 8   Apache NiFi Foundational Concepts 2 3 1
  9. 9. HEADER   -­‐  UUID   -­‐  Name   -­‐  Size   -­‐  Entry  Time              A3ributes  Map                [[Key  |  Value]]   CONTENT   Flow File 9   •  Types •  Events •  Objects •  Files •  Messages •  Media •  Formats •  JSON •  Avro •  Text •  Mp4 •  Proprietary •  Sizes •  Bytes to GBs
  10. 10. Flow File Processor 10   •  Routing •  Context •  Content •  Transformation •  Enrich •  Obfuscate •  Filter •  Convert •  Analyze •  Split •  Aggregate •  Mediation •  Push / Pull •  …
  11. 11. Connections 11   •  Queuing •  Back Pressure •  Expiration •  Prioritize •  Swapping
  12. 12. Flow Controller 12  
  13. 13. NiFi Architecture 13  
  14. 14. NiFi Clustering Model 14  
  15. 15. Tighten the feedback loop •  Changes have consequences (good or bad) •  And you see them as they occur Continuous Improvement •  Compare real-time vs. historical statistics •  View data provenance •  View Content at any stage Intuitive user experience •  Visual programming •  Logical flow graph 15   Real-time command and control2
  16. 16. Latency Optimization •  Intra process •  Inter process •  End-to-end Compliance •  Prove handling •  Assess impact Understanding •  Step through time •  View content •  View Context 16   The Power of Provenance – Chain of custody for data3
  17. 17. 17   Demo
  18. 18. Flow File Repo – Write Ahead Log Content Repo Add more partitions Input/Output Streams Copy on Write Pass by Reference Allow tradeoffs of latency vs throughput 18   How fast is it and why?
  19. 19. - User to System and System to System -  Authentication (2-Way SSL, more coming…) -  Authorization (pluggable) -  Authorize a specific piece of data to a specific system -  Data provenance -  Prove you have done the right thing -  Recover when you have not 19   How does it deal with security?
  20. 20. Web UI Push API Reporting Tasks (ganglia, graphite, etc…) Pull API REST API 20   How can I monitor this at runtime?
  21. 21. Flow File Processors Advanced UI Flow File Prioritizer Reporting Tasks Controller Services Build Clients against our REST API 21   What are the points of extension?
  22. 22. Status and direction for NiFi 22   Efficient use of each node -  100s of MB/s per node -  100Ks transactions/s per node Simple / Effective scaling model Runtime Command and Control Data Provenance   Distributed durability of data - Maybe Kafka backed queues High Availability Cluster Manager Live / Rolling Upgrades Provenance Query Language / Reporting A complete user experience enabled by provenance Existing Strengths Roadmap Highlights
  23. 23. Apache NiFi (incubating) site http://nifi.incubator.apache.org Subscribe to and collaborate at dev@nifi.incubator.apache.org users@nifi.incubator.apache.org Submit Ideas or Issues https://issues.apache.org/jira/browse/NIFI @apachenifi   23   Learn more about Apache NiFi

×