Running Apache NiFi with Apache Spark : Integration Options

A walk-through of the options for integrating Apache Spark and Apache NiFi in one smooth dataflow. Apache NiFi and Apache Spark can now be connected in several ways, using Apache Kafka or Apache Livy.


  1. Apache NiFi Integration with Apache Spark. Timothy Spann, Solutions Engineer. © Hortonworks Inc. 2011 – 2018. All Rights Reserved.
  2. Disclaimer
     • This document may contain product features and technology directions that are under development, may be under development in the future, or may ultimately not be developed.
     • Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.
     • This document's description of these features and technology directions does not represent a contractual commitment, promise, or obligation from Hortonworks to deliver these features in any generally available product.
     • Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
     • Since this document contains an outline of general product development plans, customers should not rely upon it when making a purchase decision.
  3. Integration Options
     • Apache Spark integration via Kafka and Spark Streaming (1.6+)
     • Apache Spark integration via Kafka and Spark Structured Streaming (2.2+)
     • Apache Spark integration via Apache Livy
  4. Apache Kafka and Apache NiFi Integration
  5. NiFi and Kafka Are Complementary
     NiFi provides a dataflow solution:
     • Centralized management, from edge to core
     • Great traceability: event-level data provenance starting when data is born
     • Interactive command and control, with real-time operational visibility
     • Dataflow management, including prioritization, back pressure, and edge intelligence
     • Visual representation of the global dataflow
     Kafka provides a durable stream store:
     • Low latency
     • Distributed data durability
     • Decentralized management of producers and consumers
  6. Integrated Provisioning and Security: Kafka 1.0 Support
     • To enhance data governance and lineage, users can now manage access control policies using resource-based or tag-based security in Ranger for Kafka 1.0 clusters.
     • Users can now install, configure, manage, upgrade, monitor, and secure Kafka 1.0 clusters with Ambari.
     • New processors in NiFi and Streaming Analytics Manager support Kafka 1.0 features, including message headers and transactions.
  7. Apache NiFi and Kafka 1.0: Use Case for Kafka Message Headers
  8. Apache Spark – Apache Kafka – Apache NiFi Architecture
  9. Join Architecture Example
     [Diagram labels: Flow Management; Stream Analysis; Spark Processing; Acquire/Move; Routing & Filtering; Parse; Analyze; Model; Topic 1; Topic 2; Aggregate; Correlate; Pattern Matching; JSON Data; AVRO Data; Windowing; Aggregations]
  10. Stream Processing
      [Diagram labels: Streaming Analytics Manager; Machine Learning; Distributed queue; Buffering; Process decoupling; Structured Streaming with SQL; Orchestration; Queueing; Simple Event Processing; Data Definition Between Environments; Schema Versioning]
  11. Key Integration Points – NiFi & Kafka
      [Diagram labels: NiFi; MiNiFi; MiNiFi; MiNiFi; Kafka; Consumer 1; Consumer 2; Consumer N]
      Producer processors (main):
      • PublishKafka_0_11 (0.11 Kafka client)
      • PublishKafka_1_0 (1.0 Kafka client)
      • PublishKafkaRecord_0_11 (0.11 Kafka client)
      • PublishKafkaRecord_1_0 (1.0 Kafka client)
  12. Key Integration Points – NiFi & Kafka
      [Diagram labels: Kafka; Producer 1; Producer 2; Producer N; NiFi; Destination 1; Destination 2; Destination 3]
      Consumer processors (main):
      • ConsumeKafka_0_11 (0.11 Kafka client)
      • ConsumeKafka_1_0 (1.0 Kafka client)
      • ConsumeKafkaRecord_0_11 (0.11 Kafka client)
      • ConsumeKafkaRecord_1_0 (1.0 Kafka client)
  13. Better Together
      [Diagram labels: MiNiFi; MiNiFi; NiFi; Kafka; Spark; Incoming Topic; Results Topic; PublishKafka; ConsumeKafka; Destinations]
      • MiNiFi: collection, filtering, and prioritization at the edge
      • NiFi: central data flow management, routing, enriching, and transformation
      • Kafka: central messaging bus for subscription by downstream consumers
      • Spark: streaming analytics focused on complex event processing
  14. NiFi PublishKafkaRecord_1_0
      [Diagram labels: Apache NiFi, Node 1; Apache NiFi, Node 2; PublishKafka (one per node); Concurrent Task; Apache Kafka; Topic 1, Partition 1; Topic 1, Partition 2]
      • Each NiFi node runs an instance of PublishKafkaRecord_1_0.
      • Each instance has one or more concurrent tasks (threads).
      • Each concurrent task is an independent producer that sends data round-robin to the partitions of a topic.
      • Records with schemas are used for performance.
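The round-robin behavior described on slide 14 can be pictured with a toy model. This is only a sketch: `RoundRobinSketch`, `TaskProducer`, and `assign` are hypothetical names invented for illustration, and the code imitates the behavior the slide describes. It is not NiFi or Kafka client code.

```scala
// Toy model only: none of these names come from NiFi or the Kafka client.
object RoundRobinSketch {

  /** One "concurrent task": an independent producer with its own cursor. */
  final class TaskProducer(numPartitions: Int) {
    require(numPartitions > 0, "a topic needs at least one partition")
    private var next = 0

    /** Returns the partition for the next record, then advances the cursor. */
    def nextPartition(): Int = {
      val p = next
      next = (next + 1) % numPartitions
      p
    }
  }

  /** Partitions chosen by a single task for `records` consecutive records. */
  def assign(records: Int, numPartitions: Int): Vector[Int] = {
    val task = new TaskProducer(numPartitions)
    Vector.fill(records)(task.nextPartition())
  }

  def main(args: Array[String]): Unit = {
    // Two nodes, one concurrent task each, publishing to a two-partition topic.
    val node1 = assign(records = 4, numPartitions = 2)
    val node2 = assign(records = 3, numPartitions = 2)
    println(s"node 1 task -> partitions $node1")
    println(s"node 2 task -> partitions $node2")
  }
}
```

Because each task keeps its own cursor, no coordination between nodes is needed; every task spreads its own records evenly across the partitions.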
  15. Apache Spark Streaming – Apache Kafka – Apache NiFi Architecture
  16. Spark Streaming
      • Spark Streaming is an extension of the core Spark API that supports scalable, high-throughput, fault-tolerant streaming applications.
      • Data can be ingested from various sources such as Kafka, Flume, Twitter, ZeroMQ, or TCP sockets.
      • Data is processed using the now-familiar API: map, filter, reduce, join, and window.
      • Processed data can be written to databases, filesystems, or live dashboards.
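As a rough illustration of the map, filter, reduce, and window operations listed on slide 16, the sketch below counts non-empty Kafka messages over a 30-second window. It assumes Spark 2.x with the `spark-streaming-kafka-0-10` artifact on the classpath; the broker address and topic name are borrowed from the Structured Streaming slide, and the application name, consumer group, and batch interval are placeholders chosen for this example.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object DStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("nifi-kafka-dstream-sketch") // placeholder name
    val ssc = new StreamingContext(conf, Seconds(5)) // 5-second micro-batches (assumed)

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "mykafkabroker:6667",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "dstream-sketch", // placeholder consumer group
      "auto.offset.reset" -> "latest"
    )

    // Subscribe to the topic that NiFi's PublishKafka processor writes to.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("smartPlug2"), kafkaParams)
    )

    // map -> filter -> map -> windowed reduce: messages seen in the last 30 seconds.
    val counts = stream
      .map(record => record.value)
      .filter(_.nonEmpty)
      .map(_ => ("messages", 1L))
      .reduceByKeyAndWindow((a: Long, b: Long) => a + b, Seconds(30))

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

The job needs a reachable Kafka broker and would normally be launched with `spark-submit`; the window length must be a multiple of the batch interval, which is why 30 seconds is paired with 5-second batches here.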
  17. Apache Spark Streaming Integration via Kafka
      https://community.hortonworks.com/content/kbentry/173818/hdp-264-hdf-31-apache-spark-streaming-integration.html
  18. Apache Spark Streaming Integration via Kafka
  19. Apache Spark Structured Streaming – Apache Kafka – Apache NiFi Architecture
  20. Apache Spark Structured Streaming Integration via Kafka
      https://community.hortonworks.com/articles/91379/spark-structured-streaming-with-nifi-and-kafka-usi.html
      https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-sql-streaming-KafkaSource.html
      https://community.hortonworks.com/content/kbentry/174105/hdp-264-hdf-31-apache-spark-structured-streaming-i.html

      // Assumes `spark` is an existing SparkSession, as in spark-shell.
      val records = spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "mykafkabroker:6667")
        .option("subscribe", "smartPlug2")
        .load()
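The snippet on slide 20 only defines the Kafka source. A minimal sketch of a complete job might look like the following, assuming Spark 2.2+ with the `spark-sql-kafka-0-10` package available; the application name and the console sink are choices made for this example and do not come from the deck.

```scala
import org.apache.spark.sql.SparkSession

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("nifi-kafka-structured-streaming-sketch") // placeholder name
      .getOrCreate()

    // Same source as on the slide: subscribe to the topic NiFi publishes to.
    val records = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "mykafkabroker:6667")
      .option("subscribe", "smartPlug2")
      .load()

    // Kafka rows carry binary key and value columns; cast them to strings.
    val messages = records.selectExpr(
      "CAST(key AS STRING) AS key",
      "CAST(value AS STRING) AS value",
      "timestamp"
    )

    // Print each micro-batch to stdout (a debugging sink, not a production one).
    val query = messages.writeStream
      .format("console")
      .outputMode("append")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

In a real flow the console sink would typically be replaced by a results topic or a table, matching the "Results Topic" shown in the architecture slides.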
  21. Apache NiFi – Apache Kafka – Apache Spark
  22. Apache Spark – Apache Livy
  23. Introducing Apache Livy
      • Apache Livy is the open source REST interface for interacting with Apache Spark from anywhere.
      • Installed as a Spark2 Ambari service.
      [Diagram labels: Livy Client; HTTP; HTTP (RPC); Livy Server; Spark Interactive Session (SparkContext); Spark Batch Session (SparkContext)]
      https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/ch_submit-spark-apps-livy.html
  24. Livy Server as a Session Management Service
      [Diagram labels: Livy Server; Interactive REST API; Batch REST API; Remote Spark Driver; Session; Remote Context; Standard Spark Batch Job; Spark Executor (four instances)]
      https://livy.incubator.apache.org/docs/latest/rest-api.html
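To make the interactive REST API on slide 24 more concrete, here is a small sketch that builds the two requests a client would send: `POST /sessions` to start an interactive session and `POST /sessions/{id}/statements` to run code in it. The host and port are assumptions (8998 is Livy's default port, but installations differ), `LivyRestSketch` and its helpers are hypothetical names, and the JSON escaping is deliberately simplistic. It uses the JDK 11+ `java.net.http` client.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object LivyRestSketch {
  // Assumed Livy address; adjust to the actual deployment.
  val livyUrl = "http://localhost:8998"

  private def postJson(path: String, json: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(livyUrl + path))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()

  /** POST /sessions: ask Livy to start an interactive Scala session. */
  def createSession(): HttpRequest =
    postJson("/sessions", """{"kind": "spark"}""")

  /** POST /sessions/{id}/statements: run a code snippet in that session. */
  def runStatement(sessionId: Int, code: String): HttpRequest = {
    // Minimal escaping for backslashes, quotes, and newlines only.
    val escaped = code
      .replace("\\", "\\\\")
      .replace("\"", "\\\"")
      .replace("\n", "\\n")
    postJson(s"/sessions/$sessionId/statements", s"""{"code": "$escaped"}""")
  }

  def main(args: Array[String]): Unit = {
    // Requires a running Livy server at livyUrl.
    val client = HttpClient.newHttpClient()
    val response = client.send(createSession(), HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}
```

A real client would poll the session until it reports an idle state before submitting statements, and then poll the statement for its result. Secured clusters may also require authentication or additional headers, which this sketch leaves out.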
  25. Apache Spark – Apache Livy – Apache NiFi Integration
  26. SQL Architecture Example
      [Diagram labels: Flow Management; Spark Processing; Analytics; Routing & Filtering; Parse; Analyze; Session 1; Session 1; Aggregate; SQL; JSON Data]
  27. NiFi to Spark Processing
      [Diagram labels: Streaming Analytics Manager; Machine Learning; REST API; Enterprise Tested; Secure; Structured Streaming with SQL; Orchestration; Queueing; Simple Event Processing; Data Definition Between Environments; Schema Versioning]
  28. Key Integration Points – NiFi & Spark
      [Diagram labels: NiFi; MiNiFi; MiNiFi; MiNiFi; Livy; Spark; Spark 2; Spark N]
      Processor and controller:
      • ExecuteSparkInteractive: sends the job setup and code to the Livy session service
      • LivySessionService: manages the pool of Livy connections to Spark
  29. Better Together
      [Diagram labels: MiNiFi; MiNiFi; NiFi; Livy; Spark; Session; Batch; ExecuteSparkInteractive; LivySessionService]
      • MiNiFi: collection, filtering, and prioritization at the edge
      • NiFi: central data flow management, routing, enriching, and transformation
      • Livy: secure HTTPS connection to running Spark batch and session jobs, with cached RDD sharing and a live Spark context
      • Spark: streaming analytics focused on complex event processing
  30. Apache Spark – Apache Livy – Apache NiFi Architecture
  31. Apache Spark Integration via Apache Livy
  32. Apache Spark Integration via Apache Livy
      https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html
      https://community.hortonworks.com/articles/171893/hdf-31-executing-apache-spark-via-executesparkinte-1.html
  33. [Slide with no extractable text]
  34. Questions?
      Hortonworks Community Connection: Data Ingestion and Streaming
      https://community.hortonworks.com/
  35. Contact
      https://community.hortonworks.com/users/9304/tspann.html
      https://dzone.com/users/297029/bunkertor.html
      https://www.meetup.com/futureofdata-princeton/
      https://twitter.com/PaaSDev
      https://community.hortonworks.com/articles/174105/hdp-264-hdf-31-apache-spark-structured-streaming-i.html
  36. Hortonworks Community Connection
      Read access for everyone; join to participate and be recognized.
      • Full Q&A platform (like StackOverflow)
      • Knowledge base articles
      • Code samples and repositories
  37. Community Engagement
      Participate now at: community.hortonworks.com
      • 4,000+ Registered Users
      • 10,000+ Answers
      • 15,000+ Technical Assets
      One Website!
  38. Register at dataworkssummit.com (#DWS18)
      • Berlin, Germany: April 16-19, 2018, Estrel Hotel
      • San Jose, California: June 17-21, 2018, McEnery Convention Center