Introduction to
Apache NiFi & Storm
Jungtaek Lim
This slide deck introduces Apache NiFi and Apache Storm. For Storm, it also briefly covers the project's future direction.

Published in: Software

  1. Introduction to Apache NiFi & Storm (Jungtaek Lim)
  2. WHO AM I?
     • Staff Software Engineer @ Hortonworks
     • Remote worker
     • Open source prosumer
     • Committer of Jedis
     • PMC member of Apache Storm
     • Contributor to Apache projects (Spark, Zeppelin, Ambari, Calcite), Redis, and so on
     • Contact: kabhwan@gmail.com
     • Twitter / LinkedIn / GitHub / Facebook: @heartsavior
  3. Data in Motion in Hortonworks DataFlow (HDF)
     [Diagram panels: Sources, Regional Infrastructure, Core Infrastructure. Labels: constrained; high-latency; localized context; hybrid – cloud / on-premises; low-latency; global context.]
     Source: http://ko.hortonworks.com/products/data-center/hdf/
  4. What is Apache NiFi?
  5. An easy to use, powerful, and reliable system to process and distribute data.
  6. History of Apache NiFi
  7. • Created by the United States National Security Agency (NSA)
     • Originally named NiagaraFiles
     • In 2014 the NSA submitted the source code to the Apache Software Foundation via the NSA Technology Transfer Program; the project entered incubation in December 2014
     • Development of Apache NiFi continued at Onyara, Inc., a startup company
     • Became an Apache Top-Level Project in July 2015
     • Hortonworks acquired Onyara, Inc. in August 2015
  8. Role of Apache NiFi
  9. • Data acquisition and delivery
     • Simple transformation and data routing
     • Simple event processing
     • End-to-end provenance
     • Edge intelligence and bi-directional communication
  10. NOT intended to REPLACE 'distributed computation engines' (a.k.a. stream processing frameworks)
  11. Features of Apache NiFi
  12. Highly configurable
      • Loss tolerant vs guaranteed delivery
      • Low latency vs high throughput
      • Dynamic prioritization
      • Flow can be modified at runtime
      • Back pressure
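The back pressure item can be illustrated with a toy model. This is a minimal sketch, not NiFi code: `Connection`, `run`, and all parameters are hypothetical names, and it assumes back pressure is driven by an object-count threshold on the queue between two processors. Real NiFi decides per scheduling run rather than per item, so this is a simplification.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with an object-count threshold (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.queue = deque()

    def is_full(self):
        return len(self.queue) >= self.threshold


def run(ticks, produce_per_tick, consume_per_tick, threshold):
    """Simulate a fast upstream processor feeding a slower downstream one."""
    conn = Connection(threshold)
    produced = consumed = skipped = 0
    for _ in range(ticks):
        # Upstream only emits while the connection is below its threshold.
        for _ in range(produce_per_tick):
            if conn.is_full():
                skipped += 1  # back pressure: upstream work is held back
                continue
            conn.queue.append(produced)
            produced += 1
        # Downstream drains at its own, slower pace.
        for _ in range(consume_per_tick):
            if conn.queue:
                conn.queue.popleft()
                consumed += 1
    return produced, consumed, skipped, len(conn.queue)


if __name__ == "__main__":
    print(run(ticks=10, produce_per_tick=3, consume_per_tick=1, threshold=5))
```

The queue length stays bounded by the threshold, so a slow consumer throttles the producer instead of letting the backlog grow without limit.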
  13. More…
      • Designed for extension
        • Build your own processors and more
      • Secure
        • SSL, SSH, HTTPS, encrypted content, etc.
      • Multi-tenant authorization and internal authorization/policy management
      • MiNiFi subproject
        • Reduces footprint to ~40 MB
  14. What is Apache Storm?
  15. A free and open source distributed realtime computation system.
  16. History of Apache Storm
  17. Source: http://hortonworks.com/blog/brief-history-apache-storm/
  18. Concepts of Apache Storm
  19. • Spout: a source of streams in a topology
      • Bolt: a processing component; sinks are also implemented as bolts
      • Stream: an unbounded sequence of tuples, defined with a schema
      • Stream grouping: defines how a stream is partitioned among a bolt's tasks
      • Topology: the logic of a realtime application, represented as a DAG
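To make the concepts above concrete, here is a small sketch in plain Python. It does not use the Storm API: `sentence_spout`, `split_bolt`, `fields_grouping`, and `run_topology` are hypothetical names, the stream is finite for the sake of the example, and the CRC32-based partitioning is only an assumption standing in for a fields grouping.

```python
import zlib
from collections import Counter


def sentence_spout():
    """Spout: a source of tuples (finite here; a real stream is unbounded)."""
    for sentence in ["storm and nifi", "nifi and storm and kafka"]:
        yield (sentence,)


def split_bolt(tup):
    """Bolt: turns one sentence tuple into one tuple per word."""
    (sentence,) = tup
    for word in sentence.split():
        yield (word,)


def fields_grouping(tup, num_tasks):
    """Pick a task index from the grouping field so equal words reach the same task."""
    return zlib.crc32(tup[0].encode("utf-8")) % num_tasks


def run_topology(num_tasks=3):
    """Topology: spout -> split bolt -> counting bolt with num_tasks parallel tasks."""
    counters = [Counter() for _ in range(num_tasks)]
    for sentence_tuple in sentence_spout():
        for word_tuple in split_bolt(sentence_tuple):
            task = fields_grouping(word_tuple, num_tasks)
            counters[task][word_tuple[0]] += 1
    return counters


if __name__ == "__main__":
    for index, counter in enumerate(run_topology()):
        print(index, dict(counter))
```

Because the task is chosen from the word itself, each word is counted by exactly one task, which is what lets the per-task counts be correct without coordination between tasks.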
  20. Core vs Trident
  21. • Computation unit: Core: record (tuple); Trident: micro batch
      • Latency: Core: very low (sub-second); Trident: high (up to batch size), similar to Spark Streaming
      • Delivery guarantee: Core: at least once; Trident: exactly once
      • API: Core: compositional; Trident: declarative
      • Stateful operator: Core: supported from v1.0.0; Trident: core feature (exactly-once)
      • Windowing: time (processing time, event time), count; tumbling window, sliding window
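The difference between the tumbling and sliding windows named in the windowing row can be sketched with count-based windows over a list. This is an illustration only, not Storm's windowing API: the function names are hypothetical, and the sketch assumes that only complete windows are emitted.

```python
def tumbling_windows(events, size):
    """Non-overlapping windows: each event belongs to exactly one window."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, size)]


def sliding_windows(events, size, slide):
    """Overlapping windows: a new window starts every `slide` events."""
    return [events[i:i + size] for i in range(0, len(events) - size + 1, slide)]


if __name__ == "__main__":
    events = [1, 2, 3, 4, 5, 6]
    print(tumbling_windows(events, size=3))
    print(sliding_windows(events, size=3, slide=1))
```

A tumbling window is the special case of a sliding window whose slide equals its size.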
  22. Features of Apache Storm
  23. • Supports a number of connectors (17 connectors in the master branch)
      • Automatic back-pressure
      • Distributed Cache
      • Flux (constructing topologies via YAML)
      • Distributed Log Search
      • Dynamic Worker Profiling
      • Dynamic Log Levels
      • Topology Event Inspector
      • Resource Aware Scheduler
      • SQL (Experimental)
  24. Future of Apache Storm: Apache Storm 2.0 and beyond
  25. • Clojure to Java translation
      • Unified Stream API with exactly-once support
      • Reworked metrics feature
      • Apache Beam runner
      • Streaming SQL with Apache Calcite
      • And more…
        • Performance
        • Usability
  26. THANKS! Any questions?
  27. Appendix A. Apache NiFi
  28. NiFi EvaluateJsonPath / RouteOnAttribute configuration
  29. NiFi PutHDFS / PublishKafka configuration
  30. NiFi Queue options – Status History
  31. NiFi Queue options – List queue
  32. NiFi Data Provenance
  33. Appendix B. Apache Storm
  34. Distributed Log Search
  35. Dynamic Worker Profiling
  36. Dynamic Log Levels
  37. Topology Event Inspector
  38. Resource Aware Scheduler
      Source: Resource Aware Scheduling in Apache Storm, Hadoop Summit San Jose 2016
