Successfully reported this slideshow.
Your SlideShare is downloading. ×

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Loading in …3
×

Check these out next

1 of 30 Ad

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo

Download to read offline

In this talk we will walk through how Apache Kafka and Apache Accumulo can be used together to orchestrate a de-coupled, real-time distributed and reactive request/response system at massive scale. Multiple data pipelines can perform complex operations for each message in parallel at high volumes with low latencies. The final result will be inline with the initiating call. The architecture gains are immense. They allow for the requesting system to receive a response without the need for direct integration with the data pipeline(s) that messages must go through. By utilizing Apache Kafka and Apache Accumulo, these gains sustain at scale and allow for complex operations of different messages to be applied to each response in real-time.

In this talk we will walk through how Apache Kafka and Apache Accumulo can be used together to orchestrate a de-coupled, real-time distributed and reactive request/response system at massive scale. Multiple data pipelines can perform complex operations for each message in parallel at high volumes with low latencies. The final result will be inline with the initiating call. The architecture gains are immense. They allow for the requesting system to receive a response without the need for direct integration with the data pipeline(s) that messages must go through. By utilizing Apache Kafka and Apache Accumulo, these gains sustain at scale and allow for complex operations of different messages to be applied to each response in real-time.

Advertisement
Advertisement

More Related Content

Slideshows for you (20)

Viewers also liked (20)

Advertisement

Similar to Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo (20)

Advertisement

Recently uploaded (20)

Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo

  1. 1. Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
  2. 2. Joe Stein • Developer, Architect & Technologist • Founder & Principal Consultant => Big Data Open Source Security LLC - http://stealth.ly Big Data Open Source Security LLC provides professional services and product solutions for the collection, storage, transfer, real-time analytics, batch processing and reporting for complex data streams, data sets and distributed systems. BDOSS is all about the "glue" and helping companies to not only figure out what Big Data Infrastructure Components to use but also how to change their existing (or build new) systems to work with them. • CEO => Elodina, Inc. Expanding BDOSS from just consulting, Elodina is an ISV & SaaS provider of stream solutions & open source software. Elodina helps make data streams actionable. • Apache Kafka Committer & PMC member • Blog & Podcast - http://allthingshadoop.com • Twitter @allthingshadoop
  3. 3. Overview ● Real-time distributed reactive systems ● Quick Intro to Apache Kafka ● Quick Intro to Apache Mesos ● Kafka on Mesos ● Accumulo & HDFS on Mesos ● Real-time distributed reactive systems ● Bringing it all together with Accumulo
  4. 4. Real-Time Distributed and Reactive Systems A distributed system for asynchronous stream processing with non-blocking back pressure where complex event processing systems can influence the response without coupling the business logic of processing. The response can be calculated by parallel operations with concurrent orthogonal processing engines computing their influence towards the final result.
  5. 5. Real-Time Distributed and Reactive Systems
  6. 6. http://kafka.apache.org
  7. 7. Apache Kafka • Apache Kafka o http://kafka.apache.org • Apache Kafka Source Code o https://github.com/apache/kafka • Documentation o http://kafka.apache.org/documentation.html • Wiki o https://cwiki.apache.org/confluence/display/KAFKA/Index
  8. 8. Producers, Consumers, Brokers • Producers - ** push ** o Batching o Compression o Sync (Ack), Async (auto batch) o Replication o Sequential writes, guaranteed ordering within each partition • Consumers - ** pull ** o No state held by broker o Consumers control reading from the stream • Zero Copy for producers and consumers to and from the broker http://kafka.apache.org/documentation.html#maximizingefficiency • Message stay on disk when consumed, deletes on TTL or compaction https://kafka.apache.org/documentation.html#compaction
  9. 9. Kafka decouples data-pipelines
  10. 10. Client Libraries Community Clients https://cwiki.apache.org/confluence/display/KAFKA/Clients • Python - Pure Python implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. • C - High performance C library with full protocol support • C++ - Native C++ library with protocol support for Metadata, Produce, Fetch, and Offset. • Go (aka golang) Pure Go implementation with full protocol support. Consumer and Producer implementations included, GZIP and Snappy compression supported. • Ruby - Pure Ruby, Consumer and Producer implementations included, GZIP and Snappy compression supported. Ruby 1.9.3 and up (CI runs MRI 2. • Clojure - Clojure DSL for the Kafka API • JavaScript (NodeJS) - NodeJS client in a pure JavaScript implementation • stdin & stdout Wire Protocol Developers Guide https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol
  11. 11. Really Quick Start (Scala) 1) Install Vagrant http://www.vagrantup.com/ 2) Install Virtual Box https://www.virtualbox.org/ 3) git clone https://github.com/stealthly/scala-kafka 4) cd scala-kafka 5) vagrant up Zookeeper will be running on 192.168.86.5 BrokerOne will be running on 192.168.86.10 All the tests in ./src/test/scala/* should pass (which is also /vagrant/src/test/scala/* in the vm) 6) ./gradlew test
  12. 12. Really Quick Start (Go) 1) Install Vagrant http://www.vagrantup.com/ 2) Install Virtual Box https://www.virtualbox.org/ 3) git clone https://github.com/stealthly/go-kafka 4) cd go-kafka 5) vagrant up 6) vagrant ssh brokerOne 7) cd /vagrant 8) sudo ./test.sh
  13. 13. Apache Mesos http://mesos.apache.org
  14. 14. Origins Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center http://static.usenix.org/event/nsdi11/tech/full_papers/Hindman_new.pdf Google Borg - https://research.google.com/pubs/pub43438.html Google Omega: flexible, scalable schedulers for large compute clusters http://eurosys2013.tudos.org/wp- content/uploads/2013/paper/Schwarzkopf.pdf
  15. 15. Static Partition == Idle Resources
  16. 16. Operating System === Datacenter
  17. 17. Mesos => data center “kernel”
  18. 18. Apache Mesos ● Scalability to 10,000s of nodes ● Fault-tolerant replicated master and slaves using ZooKeeper ● Support for Docker containers ● Native isolation between tasks with Linux Containers ● Multi-resource scheduling (memory, CPU, disk, and ports) ● Java, Python and C++ APIs for developing new parallel applications ● Web UI for viewing cluster state
  19. 19. Sample Frameworks C++ - https://github.com/apache/mesos/tree/master/src/examples Java - https://github.com/apache/mesos/tree/master/src/examples/java Python - https://github.com/apache/mesos/tree/master/src/examples/python Scala - https://github.com/mesosphere/scala-sbt-mesos-framework.g8 Go - https://github.com/mesosphere/mesos-go
  20. 20. Kafka on Mesos ● The Mesos Kafka framework https://github.com/mesos/kafka ○ Smart broker.id assignment. ○ Preservation of broker placement. ○ Ability to-do configuration changes. ○ Rolling restarts. ○ Auto-scaling the cluster up and down.
  21. 21. Accumulo on Mesos No framework yet, but you can use Marathon, no problem! Marathon https://github.com/mesosphere/marathon is a cluster- wide init and control system for services in cgroups or docker based on Apache Mesos HDFS on Mesos https://github.com/mesosphere/hdfs (more on this in a bit)
  22. 22. Real-Time Distributed and Reactive Systems
  23. 23. Real-Time Distributed and Reactive Systems
  24. 24. Where does Accumulo fit in? ● Iterators ○ Accumulo iterators are a real time processing framework with “reduce like” functionality ● Multi HDFS Volume Support ○ Spin up HDFS clusters when they are needed ● Streaming Large Blobs ○ Post files in producers, process and respond to scans ● More!
  25. 25. Real-Time Distributed and Reactive Systems
  26. 26. Questions? /******************************************* Joe Stein CEO, Elodina, Inc http://www.stealth.ly Twitter: @allthingshadoop ********************************************/

×