More data is collected every year. Cloud applications, including IoT, web, and mobile, send torrents of bits to our data centers, where they have to be processed and stored. In addition, users expect an always-on experience with little room for error. Numerous companies do this successfully every day. In this webinar, you will learn about the convergence of complementary technologies, Spark, Mesos, Akka, Cassandra, and Kafka (SMACK); how Apache Kafka can help you get your data under control; and the critical role Kafka plays in your data pipeline.
Webinar recording: https://youtu.be/uwYlwLyv-1s
Webinar Q&A will be posted shortly.
40. Kafka
[Diagram. Labels: Producer (Collection API); Consumer (Temperature Processor, Precipitation Processor); two Brokers, each holding Topic = Temperature (messages Temp 1 to Temp 5) and Topic = Precipitation (messages Precip 1 to Precip 5) in Partition 0 and Partition 1.]
Topic Temperature: Replication Factor = 2
Topic Precipitation: Replication Factor = 2
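The layout in this diagram (two brokers, two partitions per topic, replication factor 2) can be sketched as a small placement function. This is a simplified illustration only: the function name `assign_replicas`, the broker names, and the plain round-robin rule are assumptions made for the example, not Kafka's actual assignment algorithm or client API.

```python
def assign_replicas(num_partitions, replication_factor, brokers):
    """Toy round-robin placement (an assumption, not Kafka's real algorithm).

    Returns {partition: [broker, ...]}; the first broker in each list
    plays the role of the partition leader in this sketch.
    """
    if replication_factor > len(brokers):
        # Each replica of a partition must live on a different broker.
        raise ValueError("replication factor cannot exceed the number of brokers")
    layout = {}
    for p in range(num_partitions):
        layout[p] = [brokers[(p + r) % len(brokers)]
                     for r in range(replication_factor)]
    return layout


# Hypothetical broker names; topics and settings follow the slide.
brokers = ["broker-1", "broker-2"]
cluster = {
    topic: assign_replicas(num_partitions=2, replication_factor=2, brokers=brokers)
    for topic in ("Temperature", "Precipitation")
}

for topic, layout in cluster.items():
    for partition, replicas in layout.items():
        print(topic, "partition", partition, "->", replicas)
```

With a replication factor of 2 on two brokers, every partition of both topics ends up on both brokers, which matches what the diagram shows.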
41. Kafka
[Diagram. Labels: Producer (Collection API); Consumer (Temperature Processor, Precipitation Processor, with additional Temperature Processor and Precipitation Processor instances shown); two Brokers, each holding Topic = Temperature (messages Temp 1 to Temp 5) and Topic = Precipitation (messages Precip 1 to Precip 5) in Partition 0 and Partition 1.]
Topic Temperature: Replication Factor = 2
Topic Precipitation: Replication Factor = 2
42. Guarantees
Order
• Messages are ordered as they are sent by the producer (within a partition)
• Consumers see messages in the order they were inserted by the producer
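One way to picture the ordering guarantee is as an append-only log: each message gets the next offset, and a consumer reads by increasing offset. The sketch below is a toy model of a single partition; `PartitionLog` and its methods are hypothetical names for illustration and are not part of any Kafka client library.

```python
class PartitionLog:
    """Toy append-only log for one partition (illustrative only)."""

    def __init__(self):
        self._records = []

    def append(self, message):
        """Store the message at the end of the log and return its offset."""
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return all messages at or after the given offset, in log order."""
        return self._records[offset:]


log = PartitionLog()

# The producer sends Temp 1 .. Temp 5 in that order.
offsets = [log.append(f"Temp {i}") for i in range(1, 6)]

# A consumer starting at offset 0 sees them in the same order.
consumed = log.read_from(0)
print(offsets)
print(consumed)
```

Because records are only ever appended and read by offset, the consumer cannot observe Temp 3 before Temp 2 in this model. The sketch covers one partition only and says nothing about ordering across partitions.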
Durability
• Messages are delivered at least once
• With a Replication Factor N, up to N-1 server failures can be tolerated without losing committed messages
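Both durability points can be checked with a small simulation. This is a sketch under stated assumptions: the helper names (`survives`, `max_tolerated_failures`, `consume`) and the crash model are invented for illustration, and no real Kafka API is used. The first part checks that a message replicated on N brokers survives any N-1 broker failures; the second shows why committing an offset only after processing gives at-least-once delivery.

```python
from itertools import combinations


def survives(replica_brokers, failed):
    """A committed message stays readable while any replica's broker is alive."""
    return any(b not in failed for b in replica_brokers)


def max_tolerated_failures(replica_brokers):
    """Largest k such that the message survives every set of k failed brokers."""
    k = 0
    for n_failed in range(1, len(replica_brokers) + 1):
        if all(survives(replica_brokers, set(f))
               for f in combinations(replica_brokers, n_failed)):
            k = n_failed
        else:
            break
    return k


def consume(log, committed_offset, crash_before_commit_at=None):
    """Process records starting at committed_offset, committing after each one.

    If the consumer "crashes" after processing but before committing, the
    offset is not advanced, so that record is delivered again on restart.
    Returns (processed_messages, new_committed_offset).
    """
    processed = []
    for offset in range(committed_offset, len(log)):
        processed.append(log[offset])            # processing happens first
        if offset == crash_before_commit_at:
            return processed, committed_offset   # crash: commit never happened
        committed_offset = offset + 1            # commit only after processing
    return processed, committed_offset


# Replication factor 2 and 3 (hypothetical broker names).
rf2 = max_tolerated_failures(["broker-1", "broker-2"])
rf3 = max_tolerated_failures(["broker-1", "broker-2", "broker-3"])

# At-least-once: crash while handling offset 1, then restart.
log = ["Temp 1", "Temp 2", "Temp 3"]
first_run, committed = consume(log, 0, crash_before_commit_at=1)
second_run, committed = consume(log, committed)
print(rf2, rf3, first_run, second_run, committed)
```

In this run "Temp 2" is processed twice and nothing is skipped, which is the at-least-once behavior: duplicates are possible, loss is not. Consumers that cannot tolerate duplicates need to make their processing idempotent.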
44. Coming soon!
• May 4: How to Achieve High Throughput for Real-Time Applications with SMACK, Apache Kafka and Spark Streaming
• May 18: How to Build Data Pipelines with SMACK: Storage Strategy using Cassandra and DSE
• June 1: How to Build Data Pipelines with SMACK: Analyzing Data with Spark
• For the latest schedule of webinars, check out our Webinars page: http://www.datastax.com/resources/webinars
45. Go get your SMACK on
Thank you!
Follow me on Twitter: @PatrickMcFadin
Editor's Notes
Dealing with data can be hard
Too many choices
We can wind up building some pretty messed up stuff
Data pipelines consisting of many parts
Kafka to organize
Akka and Spark to process
Cassandra to Store
Mesos to manage everything
We are building our app with these principles
Some things need to be processed immediately
Some in a batch after we have the entire picture
This app exists. It’s called KillrWeather
One order, no problem
Multiple orders. Chaos
No set format
Restaurant
Waiters provide order