What Is Apache Bahir?
● Provides extensions for Apache Spark and Apache Flink
● Open source / Apache 2.0 license
● Streaming connectors and SQL data sources
● One grouped location for extensions
● Initiated in 2016 from the Spark project
● A source for current and future extensions
Apache Bahir Flink Extensions
● Streaming Connectors
– ActiveMQ connector
– Akka connector
– Flume connector
– InfluxDB connector
– Kudu connector
– Netty connector
– Redis connector
Apache Bahir Spark Extensions
● SQL Data Sources
– Apache CouchDB/Cloudant data source
● Structured Streaming Data Sources
– Akka data source
– MQTT data source (now with a new sink)
Apache Bahir Spark Extensions
● Discretized Streams (DStreams) Connectors
– Apache CouchDB/Cloudant connector
– Akka connector
– Google Cloud Pub/Sub connector
– PubNub connector
– MQTT connector
– Twitter connector
– ZeroMQ connector (Enhanced Implementation)
Apache Bahir Importance
● Seems like a small project? But it covers
– Multiple Spark extensions
– Multiple Flink extensions
– Possible future extensions
● Why is it important?
– Knowing the project exists aids reuse
– Avoids the need to recreate connectors
– Saves money and time!
Apache Bahir Status
● OK, great project, but is it current?
● Started in 2016, but is it still active?
● Check GitHub
● https://github.com/apache/bahir-flink
– Last update 27/05/2020 => current
● https://github.com/apache/bahir
– Last update 20/01/2020 => current
Apache Bahir Documentation
● Flink connector documentation describes
– Dependencies
– Version compatibility
– Source and sink classes
– Linking for cluster execution
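As a rough illustration of the dependency and linking points above, here is a minimal sbt sketch that adds the Bahir Redis connector to a Flink job. The Flink streaming connectors are not part of the Flink binary distribution, so they have to be packaged into the job jar for cluster execution. The Scala version and the connector version `1.0` are assumptions; check them against the connector's version-compatibility notes for the Flink release you run.

```scala
// build.sbt fragment (sketch, not a complete build).
// Assumption: Scala 2.11 build and Bahir Flink release 1.0; verify both against
// the connector's version-compatibility section before use.
scalaVersion := "2.11.12"

// The Scala suffix (_2.11) is written out explicitly, so plain % is used instead of %%.
// The dependency stays in the default (compile) scope so it is packaged into the
// assembled job jar instead of being expected on the cluster.
libraryDependencies += "org.apache.bahir" % "flink-connector-redis_2.11" % "1.0"
```

Building a fat jar with a plugin such as sbt-assembly then bundles the connector with the job, while Flink's own libraries are usually marked `provided` because the cluster already supplies them.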
Apache Bahir Documentation
● Spark connector documentation describes
– Linking
– Configuration
– Examples in Scala, Java and Python
● Taking MQTT as an example
● Documentation is comprehensive
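To make the MQTT example concrete, the sketch below is a minimal sbt build for a Spark Structured Streaming job that uses the Bahir MQTT source. The Spark and Bahir version numbers (`2.4.0`) and the Scala version are assumptions; match them to the Spark release you actually run.

```scala
// build.sbt (sketch): Spark Structured Streaming plus the Bahir MQTT source.
// Assumption: Spark 2.4.0 on Scala 2.11 with the matching Bahir 2.4.0 release.
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Supplied by the Spark cluster at runtime, so not bundled into the job jar
  "org.apache.spark" %% "spark-sql" % "2.4.0" % "provided",
  // Bahir connector: not shipped with Spark, so it is bundled with the job
  // (%% appends the _2.11 suffix taken from scalaVersion)
  "org.apache.bahir" %% "spark-sql-streaming-mqtt" % "2.4.0"
)
```

Alternatively, the same artifact can be pulled in at launch time with `spark-submit --packages org.apache.bahir:spark-sql-streaming-mqtt_2.11:2.4.0`. In the Bahir MQTT documentation the source is then selected in code with `spark.readStream.format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")`, passing the topic via `.option("topic", ...)` and the broker URL to `.load(...)`.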
Available Books
● See “Big Data Made Easy”
– Apress Jan 2015
● See “Mastering Apache Spark”
– Packt Oct 2015
● See “Complete Guide to Open Source Big Data Stack”
– Apress Jan 2018
● Find the author on Amazon
– www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
● Connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
Connect
● Feel free to connect on LinkedIn
– www.linkedin.com/in/mike-frampton-38563020
● See my open source blog at
– open-source-systems.blogspot.com/
● I am always interested in
– New technology
– Opportunities
– Technology-based issues
– Big data integration
