Ramon Marquez, Confluent, Solutions Engineer
We will look at what Apache Kafka really is: the de facto solution for data streaming, used by companies such as Spotify, Netflix, and LinkedIn, among others.
https://www.meetup.com/Mexico-Kafka/events/275757361/
5. New World: Streaming First
• DB/DWH + many more distributed data systems
• Monolith -> Microservices
• Batch -> Real-time
6. Origins in Stream Processing
• Serving Layer (Microservices, Elastic, etc.)
• Java Apps with Kafka Streams or ksqlDB
• Continuous Computation
• High-Throughput Event Streaming Platform
• API-Based Clustering
7. Copyright 2020, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Apache Kafka is a Distributed Event Streaming Platform
• Publish and subscribe to streams of events: similar to a message queue or enterprise messaging system
• Store streams of events: in a fault-tolerant way
• Process streams of events: in real time, as they occur
13. Partitions
[Figure: a topic with three partitions (Partition 0, Partition 1, Partition 2). Each partition is a sequence of numbered messages running from Old to New, and writes are appended at the New end.]
Messages are guaranteed to be strictly ordered within a partition
14. Apache Kafka is a Distributed Event Streaming Platform
16. Consuming data - access is only sequential
[Figure: a log running from Old to New. A consumer reads to an offset and then scans forward.]
19. Consumers have a position of their own
[Figure: a log running from Old to New. Sally, Fred, and Rick are each at a different position in the log, and each scans forward from there.]
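A minimal sketch of the idea that each consumer keeps its own position in a sequential log. This is an in-memory model, not the Kafka client API: the class ConsumerPositionSketch and its methods append, seek, poll, and position are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model only; this is not the Kafka client API.
class ConsumerPositionSketch {
    // One partition modelled as an append-only list; the list index plays the role of the offset.
    private final List<String> log = new ArrayList<>();
    // Each consumer name maps to the next offset that consumer will read.
    private final Map<String, Integer> positions = new HashMap<>();

    // Appends a record at the "New" end of the log and returns its offset.
    int append(String record) {
        log.add(record);
        return log.size() - 1;
    }

    // Moves one consumer to a chosen offset without affecting any other consumer.
    void seek(String consumer, int offset) {
        positions.put(consumer, offset);
    }

    // Scans forward from the consumer's own position and returns up to max records.
    List<String> poll(String consumer, int max) {
        int start = Math.min(positions.getOrDefault(consumer, 0), log.size());
        int end = Math.min(log.size(), start + max);
        List<String> batch = new ArrayList<>(log.subList(start, end));
        positions.put(consumer, end);
        return batch;
    }

    // Returns the next offset the consumer will read (0 if it has never read).
    int position(String consumer) {
        return positions.getOrDefault(consumer, 0);
    }

    public static void main(String[] args) {
        ConsumerPositionSketch partition = new ConsumerPositionSketch();
        for (int i = 0; i < 6; i++) {
            partition.append("event-" + i);
        }
        partition.seek("Sally", 4);
        System.out.println("Sally reads " + partition.poll("Sally", 10));
        System.out.println("Fred reads " + partition.poll("Fred", 2));
        System.out.println("Rick reads " + partition.poll("Rick", 1));
    }
}
```

Reading does not remove anything from the log, which is why Sally, Fred, and Rick can each scan the same records at their own pace.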
20. Producer - Messages With No Key
• Messages go to partitions on a round-robin basis
• Strictly speaking, this depends on the partitioner, but it is the default behaviour
• From Apache Kafka 2.4 onward the default partitioner keeps keyless messages on one partition until the current batch is sent and then moves to another, so the spread is even over time rather than strictly per message
[Figure: a producer writing to partition 0, partition 1, partition 2, and partition 3.]
21. Producer - Messages With Key
• Messages go to partitions based on a hash of the key
• Strictly speaking, this depends on the partitioner, but it is the default behaviour
[Figure: messages with keys A, B, A, C written to partition 0, partition 1, partition 2, and partition 3. Messages with the same key go to the same partition.]
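A minimal sketch of hash-based partition selection for keyed messages. The class KeyPartitionerSketch and the method partitionFor are hypothetical names. Kafka's default partitioner hashes the serialized key with murmur2; this sketch uses Arrays.hashCode as a stand-in, so the partition numbers it produces will differ from Kafka's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only; not Kafka's partitioner implementation.
class KeyPartitionerSketch {
    // Hashes the key bytes, clears the sign bit so the hash is non-negative,
    // and takes the remainder modulo the number of partitions.
    // Assumption: Arrays.hashCode stands in for Kafka's murmur2 hash.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition.
        for (String key : new String[] {"A", "B", "A", "C"}) {
            System.out.println("key " + key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Because the result depends on the partition count, adding partitions to a topic changes where existing keys land.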
22. Consuming From Kafka - Consumers
• Consumers operate independently
• Can set start by offset or by timestamp
[Figure: Consumer 1 and Consumer 2 each reading from partition 0, partition 1, partition 2, and partition 3.]
23. Consuming From Kafka - Grouped Consumers
• Instances can share workload in a consumer group
• Partition assignment can be round robin or explicit
[Figure: partition 0, partition 1, partition 2, and partition 3 read by two consumer groups, Group 1 and Group 2, each made up of consumer instances (C).]
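A minimal sketch of round-robin partition assignment inside one consumer group. The class GroupAssignmentSketch and the method assignRoundRobin are hypothetical names. Real assignment is negotiated by the group coordinator and the configured assignor; this only illustrates how the partitions end up split between instances.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; not Kafka's group coordination protocol.
class GroupAssignmentSketch {
    // Deals partitions 0..numPartitions-1 to the group members in turn,
    // so each partition has exactly one owner within the group.
    static Map<String, List<Integer>> assignRoundRobin(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        if (members.isEmpty()) {
            return assignment;
        }
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> groupOne = new ArrayList<>();
        groupOne.add("consumer-a");
        groupOne.add("consumer-b");
        // Two instances share four partitions.
        System.out.println(assignRoundRobin(groupOne, 4));
    }
}
```

A second group would run the same assignment over all four partitions again, which is how two groups read the full topic independently.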
35. Producer Guarantees
[Figure: three brokers (Broker 1, Broker 2, Broker 3). Each holds a replica of Topic1 partition1 and Topic2 partition1; one replica is the leader and the others are followers.]
• Producer 1 -> Topic1 partition1 with acks=0: the producer does not wait for any acknowledgement
• Producer 2 -> Topic2 partition1 with acks=1: the leader acknowledges once it has written the message
• Producer 3 -> Topic2 partition1 with acks=all and min.insync.replicas=2: the leader acknowledges only after the in-sync replicas have the message, and the write fails if fewer than two replicas are in sync
• Exactly once is also supported
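A minimal sketch of how the acks setting could be expressed as producer configuration. It uses only java.util.Properties, so it runs without a Kafka client on the classpath. The class ProducerAcksSketch, its methods, and the address localhost:9092 are assumptions for illustration; acks, enable.idempotence, and bootstrap.servers are producer configuration keys. Note that min.insync.replicas is set on the topic or broker, not on the producer.

```java
import java.util.Properties;

// Illustrative only; builds configuration but does not create a producer.
class ProducerAcksSketch {
    // Builds a producer configuration with the given acks level: "0", "1", or "all".
    static Properties producerConfig(String acks) {
        Properties props = new Properties();
        // Hypothetical broker address, for illustration only.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("acks", acks);
        return props;
    }

    // Strongest durability setting from the slide, plus the idempotent producer,
    // which is one building block of exactly-once delivery.
    // min.insync.replicas=2 is configured on the topic or broker, so it is not set here.
    static Properties durableProducerConfig() {
        Properties props = producerConfig("all");
        props.put("enable.idempotence", "true");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("fire and forget: acks=" + producerConfig("0").getProperty("acks"));
        System.out.println("leader only: acks=" + producerConfig("1").getProperty("acks"));
        System.out.println("durable: acks=" + durableProducerConfig().getProperty("acks"));
    }
}
```

With the Kafka client library available, a Properties object like this (plus key and value serializers) is what would be passed to the producer's constructor.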
36. #kafkaLATAM | @confluentinc
The log is a type of durable messaging system
Like an MQ (ActiveMQ, RabbitMQ, etc.) but with:
• Far better scalability
• Built-in fault tolerance/HA
• Storage
37. Apache Kafka is a Distributed Event Streaming Platform
40. What exactly is stream processing?
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 MINUTE)
  GROUP BY card_number
  HAVING count(*) > 3;
[Figure: events from authorization_attempts flow through the query into possible_fraud.]
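A minimal sketch of what the query computes: count attempts per card in 5-minute tumbling windows and keep only the counts above 3. It runs over a fixed list instead of a continuous stream, and the class TumblingWindowSketch, the nested Attempt type, and the "card@windowStart" key format are assumptions for illustration, not ksqlDB internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only; a batch approximation of a continuous windowed aggregation.
class TumblingWindowSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L;

    // Hypothetical event type: one authorization attempt for a card at a given time.
    static final class Attempt {
        final String cardNumber;
        final long timestampMs;

        Attempt(String cardNumber, long timestampMs) {
            this.cardNumber = cardNumber;
            this.timestampMs = timestampMs;
        }
    }

    // Tumbling windows are fixed-size and do not overlap, so every timestamp
    // belongs to exactly one window, identified here by its start time.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Counts attempts per (card, window) and keeps only counts greater than threshold,
    // mirroring GROUP BY card_number ... HAVING count(*) > 3.
    static Map<String, Integer> possibleFraud(List<Attempt> attempts, int threshold) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Attempt attempt : attempts) {
            String key = attempt.cardNumber + "@" + windowStart(attempt.timestampMs);
            counts.merge(key, 1, Integer::sum);
        }
        counts.values().removeIf(count -> count <= threshold);
        return counts;
    }

    public static void main(String[] args) {
        List<Attempt> attempts = new ArrayList<>();
        for (int minute = 0; minute < 4; minute++) {
            attempts.add(new Attempt("1234", minute * 60_000L)); // four attempts in the first window
        }
        attempts.add(new Attempt("9999", 30_000L)); // a single attempt, below the threshold
        System.out.println(possibleFraud(attempts, 3));
    }
}
```

A stream processor keeps these counts up to date as each event arrives instead of recomputing them over a stored list.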
46. Kafka Streams: Write standard Java apps and microservices to process your data in real-time
• No separate processing cluster required
• Develop on Mac, Linux, Windows
• Deploy to containers, VMs, bare metal, cloud
• Powered by Kafka: elastic, scalable, distributed, battle-tested
• Perfect for small, medium, large use cases
• Fully integrated with Kafka security
• Exactly-once processing semantics
• Part of Apache Kafka
KStream<User, PageViewEvent> pageViews = builder.stream("pageviews-topic");
KTable<Windowed<User>, Long> viewsPerUserSession = pageViews
    .groupByKey()
    .windowedBy(SessionWindows.with(Duration.ofMinutes(5)))
    .count(Materialized.as("session-views"));
https://docs.confluent.io/current/streams/
47. ksqlDB: Easily Build Event Streaming Applications
• Use one, lightweight SQL syntax to build a complete real-time application
• Create aggregations of event data that can serve queries to applications
• Enrich Kafka data with a robust stream processing framework
CREATE STREAM payments(user VARCHAR,
                       payment_amount INT)
    WITH (kafka_topic = 'all_payments',
          key = 'user',
          value_format = 'avro');
USER   Payment
Jay    $10
Sue    $15
Fred   $5
...    ...
USER   Credit Score
Jay    660
Sue    710
Fred   595
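A minimal sketch of enrichment, using the payment and credit score values from the slide: each payment event is looked up against a table of credit scores by user. This is plain Java over in-memory collections, not ksqlDB or Kafka Streams; the class EnrichmentSketch, the method enrich, and the output text format are assumptions for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; an in-memory version of a stream-to-table lookup.
class EnrichmentSketch {
    // For each payment event (user, amount), looks up the user's credit score.
    // Payments for users with no known score are skipped, as in an inner join.
    static List<String> enrich(List<Map.Entry<String, Integer>> payments,
                               Map<String, Integer> creditScores) {
        List<String> enriched = new ArrayList<>();
        for (Map.Entry<String, Integer> payment : payments) {
            Integer score = creditScores.get(payment.getKey());
            if (score != null) {
                enriched.add(payment.getKey() + " paid $" + payment.getValue()
                        + " (credit score " + score + ")");
            }
        }
        return enriched;
    }

    public static void main(String[] args) {
        // The table side: latest credit score per user.
        Map<String, Integer> creditScores = new HashMap<>();
        creditScores.put("Jay", 660);
        creditScores.put("Sue", 710);
        creditScores.put("Fred", 595);

        // The stream side: payment events in arrival order.
        List<Map.Entry<String, Integer>> payments = new ArrayList<>();
        payments.add(new SimpleEntry<>("Jay", 10));
        payments.add(new SimpleEntry<>("Sue", 15));
        payments.add(new SimpleEntry<>("Fred", 5));

        for (String line : enrich(payments, creditScores)) {
            System.out.println(line);
        }
    }
}
```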
48. ksqlDB: Lowering the Bar to Enter the World of Streaming
[Figure: Kafka user population plotted against coding sophistication, with markers labelled "streams" and "ksqlDB". User groups shown:]
• Core Java developers
• Core developers who don't use Java/Scala
• Data engineers, architects, DevOps/SRE
• BI analysts
50. ksqlDB provides everything you need to build a complete, end-to-end event streaming application entirely with SQL syntax
Simplify Your Stream Processing Architecture
[Figure: ksqlDB combining connectors, stream processing, and state stores. Databases (DB) connect to it, and applications (APP) query it with pull and push queries.]
60. The Confluent Meetup Hub
• All upcoming meetups
• New event email alerts
• Videos of past meetups
• Slides from the talks
CNFL.IO/MEETUP-HUB
61. Confluent Community Slack
A vibrant community of over 16,000 members
Come along and discuss Apache Kafka and Confluent Platform on dedicated channels including #ksqlDB, #connect, #clients, and more
http://cnfl.io/slack
62. Free eBooks
• Designing Event-Driven Systems, Ben Stopford
• Kafka: The Definitive Guide, Neha Narkhede, Gwen Shapira, Todd Palino
• Making Sense of Stream Processing, Martin Kleppmann
• I ❤ Logs, Jay Kreps
http://cnfl.io/book-bundle