Bio: After a Ph.D. in Computer Engineering at Politecnico di Milano and a research period at the University of California (Berkeley, CA), Matteo is the technical lead of Bottega52, a software development company working on Cloud, Big Data and the Industrial Internet of Things. A lifelong technology and design-patterns enthusiast, he has a background ranging from kernel development to distributed systems, and has recently moved into stream processing.
Abstract: The talk describes how Kafka improved the quality of the systems that Bottega52 designs and builds for its customers, using a successful industrial case as an example: a "watermark"-based tracking system commissioned by an Italian multinational food company. In particular, it presents the evolution of the system, which started as a small "monolith" and evolved, thanks to Kafka, into a service-based architecture with greater reliability and efficiency, following the "Command Query Responsibility Segregation" pattern.
Meetup: https://www.meetup.com/Milano-Kafka-meetup/events/244352352/
[1st Italian Kafka Meetup 2017] Scan and go with the flow: how I met Kafka
1. www.bottega52.it
Cloud, IoT and Big Data
Systems Engineering
Bottega52 SRL - P.IVA: 08848340967 | Piazza della Vittoria 47, 26900 Lodi (LO), Italy | www.bottega52.it | info@bottega52.it | Phone: +39 02 4003 0539
Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Scan and go with the flow
How I met Kafka
Matteo Ferroni
matteo@bottega52.it
2. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
About: Matteo Ferroni
Me
Passionate coder, software architect, (Web) surfer and
musician, local organizer of this meetup
Education
Ph.D., Politecnico di Milano
Visiting Researcher, University of California, Berkeley (UCB)
Work
Teaching, Politecnico di Milano & LIUC
CTO & Co-Founder, Bottega52 Srl
matteo@bottega52.it
@mattferroni
3. Internet Of Things
Cloud Platforms
Big-Data
OUR CORE
Industry 4.0
Who we are
Bottega52 SRL is a company
providing software for Cloud,
IoT and Big Data Systems.
4. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
About: Bottega52
Who
~12 nerds, mostly Ph.D. and M.S. @ Politecnico di Milano
What
Connect stuff, collect data, build real-time systems, create value
When
Founded in November 2014
Where
PoliHub, Milan (Italy)
Why
High quality systems by high quality people
5. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
How we met Kafka
“A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio, video or image data.” (Wikipedia)
Photo credits: https://artlawjournal.com/invisible-watermark/
Case study: digital watermarks scan & analytics
6. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Case study: digital watermarks scan & analytics
Consumer
• Updated contents and promotions
• Full and updated product information
• Original Product Certification
Retail
• In-store engagement
• In-store Instant Promotion
• Faster Checkout (the code is repeated multiple times across the package)
Production
• Production control
• Quality control
• Logistics & tracking
MOBILE
APP
7. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Scan&Collect
data
REST API
(Java)
DB
(MySQL)
MOBILE
APP
8. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Scan&Collect
data
Watermarks
CRUD
Basic
Statistics
REST API
(Java)
DB
(MySQL)
MOBILE
APP
9. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Scan&Collect
data
Watermarks
CRUD
Enrich Data
(GMaps, ext.)
Basic
Statistics
REST API
(Java)
DB
(MySQL)
MOBILE
APP
10. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
DB
(MySQL)
Scan&Collect
data
Watermarks
CRUD
Multi-Users
(administration)
Enrich Data
(GMaps, ext.)
Basic
Statistics
REST API
(Java)
MOBILE
APP
11. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
DB
(MySQL)
Scan&Collect
data
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
Enrich Data
(GMaps, ext.)
Basic
Statistics
Real-Time
Analytics
REST API
(Java)
MOBILE
APP
12. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
REST API
(Java)
REST API
(Java)
REST API
(Java)
It’s just a PoC…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
REST API
(Java)
DB
(MySQL)
Scan&Collect
data
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
Custom Batch
Reports
Enrich Data
(GMaps, ext.)
Basic
Statistics
Real-Time
Analytics
Fault Tolerance
Scalability
Real-Time
Alerts
MOBILE
APP
13. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Evolvability, Reliability, Scalability
Goals:
• do not lose any scan
• meet new requirements, evolve my code fast
• design to scale
bug prone
custom logic,
bad design
…microservices,
anyone?
(Martin Fowler)
“A Microservices architecture as a service-oriented architecture composed of loosely coupled
elements that have bounded contexts.” – Adrian Cockcroft, Cloud Architect at Netflix
“[...] a suite of small services, each running in its own process and communicating with lightweight
mechanisms, often an HTTP resource API.” – Martin Fowler, Chief Scientist at ThoughtWorks
14. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break the PoC into pieces…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
REST API
(Java)
DB
(MySQL)
Scan&Collect
data
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
Custom Batch
Reports
Enrich Data
(GMaps, ext.)
Basic
Statistics
Real-Time
Analytics
Real-Time
Alerts
MOBILE
APP
15. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break the PoC into pieces…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
DB
(MySQL)
Real-Time
Alerts
Custom Batch
Reports
Basic
Statistics
Enrich Data
(GMaps, ext.)
Multi-Users
(administration)
Multi-Customers
(administration)
Real-Time
Analytics
Watermarks
CRUD
Scan&Collect
data
MOBILE
APP
16. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break data into pieces…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Real-Time
Alerts
Custom Batch
Reports
Basic
Statistics
Enrich Data
(GMaps, ext.)
Multi-Users
(administration)
Multi-Customers
(administration)
Real-Time
Analytics
Watermarks
CRUD
Scan&Collect
data
Fault Tolerance
Service
Discovery
Deployment
Data consistency,
modeling
and evolution
[complexity]
Scalability
MOBILE
APP
17. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces…
Domain-driven considerations:
• scan logic should be reliable and scalable
• backoffice logic can be slower or under maintenance, as long as it is eventually consistent
…CQRS and
Event Collaboration,
anyone? (Martin Fowler)
“CQRS stands for Command Query Responsibility Segregation. […] At its heart is the notion that you can
use a different model to update information than the model you use to read information. […]
CQRS fits well with event-based programming models. […] Having separate models raises questions about
how hard to keep those models consistent, which raises the likelihood of using eventual consistency. […]
CQRS allows you to separate the load from reads and writes allowing you to scale each independently.”
write
read
– Martin Fowler, Chief Scientist at ThoughtWorks
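A minimal sketch of the CQRS idea quoted above, in plain Java: the write side only appends scan events, while a separate read model is updated later from those events, so reads are eventually consistent. All class and method names here are hypothetical and chosen for illustration; in the system described in this talk the event log role is played by Kafka, not by an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event: one scan of a watermark (field names are illustrative).
final class ScanEvent {
    final String watermarkId;
    final String city;

    ScanEvent(String watermarkId, String city) {
        this.watermarkId = watermarkId;
        this.city = city;
    }
}

// Write side: accepts commands and only appends events to a log.
final class ScanCommandHandler {
    private final List<ScanEvent> log = new ArrayList<>();

    void recordScan(String watermarkId, String city) {
        log.add(new ScanEvent(watermarkId, city));
    }

    // Returns a copy of the events from the given offset to the end of the log.
    List<ScanEvent> eventsFrom(int offset) {
        return new ArrayList<>(log.subList(offset, log.size()));
    }
}

// Read side: a separate model, updated asynchronously from the events.
final class ScanCountView {
    private final Map<String, Integer> scansPerWatermark = new HashMap<>();
    private int nextOffset = 0;

    // Catch up with the write side; until this runs, reads may be stale.
    void catchUp(ScanCommandHandler writeSide) {
        for (ScanEvent e : writeSide.eventsFrom(nextOffset)) {
            scansPerWatermark.merge(e.watermarkId, 1, Integer::sum);
            nextOffset++;
        }
    }

    int scansFor(String watermarkId) {
        return scansPerWatermark.getOrDefault(watermarkId, 0);
    }
}
```

Because the view tracks its own offset, it can be stopped for maintenance and resumed later without the write side noticing, which is the property the backoffice services rely on.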
18. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Scan&Collect
data
Enrich Data
(GMaps, ext.)
Real-Time
Alerts
Real-Time
Analytics
Custom Batch
Reports
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
MOBILE
APP
19. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Scan&Collect
data
Real-Time
Alerts
Enrich Data
(GMaps, ext.)
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
…events streams…
Real-Time
Analytics
Custom Batch
Reports
MOBILE
APP
20. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces: how I met Kafka
Enrich Data
(GMaps, ext.)
Real-Time
Alerts
Scan&Collect
data
Real-Time
Analytics
Custom Batch
Reports
21. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces: how I met Kafka
Scan&Collect
data
Enrich Data
(GMaps, ext.)
Custom Batch
Reports
Real-Time
Analytics
Real-Time
Alerts
22. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces: how I met Kafka
Enrich Data
(GMaps, ext.)
Custom Batch
Reports
Real-Time
Analytics
Real-Time
Alerts
Scan&Collect
data
…
elasticsearch
Amazon RDS
23. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Kafka Connect
Benefits over ‘do-it-yourself’ Producers and Consumers:
• Off-the-shelf, tested Connectors for common data sources are available
• Features fault tolerance and automatic load balancing when running in distributed mode
• No coding required, just write configuration files for Kafka Connect
• Pluggable/extendable by developers
24. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
JDBC
configuration
example
(standalone)
CENSORED CENSORED CENSORED CENSORED CENSORED CENSORED
Mode for detecting DB changes:
incrementing, timestamp,
timestamp+incrementing
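To make the three change-detection modes more concrete, here is a simplified plain-Java illustration of the idea behind combining a timestamp with an incrementing id: a row counts as changed if its timestamp is newer than the last one seen, or if it has the same timestamp but a higher id. This is a sketch of the concept only, not the connector's actual code, and the `Row` and `ChangeDetector` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical table row: an auto-increment id plus a last-modified timestamp.
final class Row {
    final long id;
    final long updatedAt;

    Row(long id, long updatedAt) {
        this.id = id;
        this.updatedAt = updatedAt;
    }
}

final class ChangeDetector {
    // The stored "offset": the last (timestamp, id) pair already emitted.
    private long lastTimestamp = Long.MIN_VALUE;
    private long lastId = Long.MIN_VALUE;

    // Returns rows changed since the previous poll, in (timestamp, id) order.
    List<Row> poll(List<Row> table) {
        List<Row> changed = new ArrayList<>();
        for (Row r : table) {
            boolean newer = r.updatedAt > lastTimestamp;
            boolean sameInstantHigherId = r.updatedAt == lastTimestamp && r.id > lastId;
            if (newer || sameInstantHigherId) {
                changed.add(r);
            }
        }
        changed.sort(Comparator.comparingLong((Row r) -> r.updatedAt)
                .thenComparingLong(r -> r.id));
        if (!changed.isEmpty()) {
            // Advance the offset to the last row emitted in this poll.
            Row last = changed.get(changed.size() - 1);
            lastTimestamp = last.updatedAt;
            lastId = last.id;
        }
        return changed;
    }
}
```

Using only an incrementing id would miss updates to existing rows, and using only a timestamp could miss rows that share the last seen timestamp; tracking both covers inserts and updates.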
25. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
• Exactly Once Delivery: the connector relies on Elasticsearch’s idempotent write semantics to ensure exactly once delivery to Elasticsearch. When the keys are not included, or are explicitly ignored, the connector will use topic+partition+offset as the key
• Mapping Inference: The connector can infer mappings from the Kafka Connect schemas. If more customizations are needed (e.g. geo_point), we highly recommend to manually create mappings.
Elasticsearch configuration example (standalone)
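The exactly-once point can be illustrated with a small plain-Java sketch: if the document id is derived deterministically from topic, partition and offset, then writing the same record twice (for example after a retry) overwrites the same document instead of creating a duplicate. `FakeIndex` is a hypothetical in-memory stand-in for an index with put-by-id semantics, and the `+` separator in the id is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an index that supports "put by id" (an upsert).
final class FakeIndex {
    private final Map<String, String> docs = new HashMap<>();

    void put(String id, String document) {
        docs.put(id, document);
    }

    int size() {
        return docs.size();
    }
}

final class IdempotentSink {
    // Builds the document id from the record coordinates when no key is used.
    // The "+" separator is an assumption for this sketch.
    static String documentId(String topic, int partition, long offset) {
        return topic + "+" + partition + "+" + offset;
    }

    // Writing the same (topic, partition, offset) twice targets the same id.
    static void write(FakeIndex index, String topic, int partition, long offset, String value) {
        index.put(documentId(topic, partition, offset), value);
    }
}
```

At-least-once delivery from Kafka plus an idempotent write on the sink side is what yields the effectively exactly-once result described on the slide.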
26. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Break flow into pieces: how I met Kafka
Enrich Data
(GMaps, ext.)
Custom Batch
Reports
Real-Time
Analytics
Real-Time
Alerts
Scan&Collect
data
…
Amazon RDS
elasticsearch
27. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Kafka Streams
Kafka Streams API is a lightweight Java library for building distributed stream processing applications in Kafka clusters
• Easy to embed in your own applications
• Supports windowing operations, and stateful processing including distributed joins and aggregation
• Has fault-tolerance and supports distributed processing
• Includes both a Domain-Specific Language (DSL) and a low-level API
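As a rough illustration of what "windowing" and "stateful processing" mean here, the following plain-Java sketch counts events per fixed-size (tumbling) time window, the kind of aggregation a real-time alert on scan volume might need. In Kafka Streams itself this would be expressed through the DSL's windowed aggregations; the class below does not use the library, its names are hypothetical, and it assumes non-negative timestamps in milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindowCounter {
    private final long windowSizeMs;
    // State: window start -> number of events in [start, start + windowSizeMs).
    private final Map<Long, Integer> counts = new TreeMap<>();

    TumblingWindowCounter(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Assigns the event to exactly one window, based on its timestamp.
    void add(long eventTimeMs) {
        long windowStart = eventTimeMs - (eventTimeMs % windowSizeMs);
        counts.merge(windowStart, 1, Integer::sum);
    }

    int countInWindowStartingAt(long windowStart) {
        return counts.getOrDefault(windowStart, 0);
    }
}
```

The map is the "state" of the computation; a stream processing library additionally takes care of keeping such state fault-tolerant and partitioned across instances.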
28. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Elasticsearch
Production
Kafka
Connect ES
Production
Break flow for this demo
Enriched
production
topic
Amazon RDS
Kafka
Connect
JDBC
Kafka
Kafka
Streams
Enrich Data
(GMaps, ext.)
Scan
topic
Watermarks
topic
29. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Elasticsearch
Demo
Kafka
Connect
ES Demo
Elasticsearch
Production
Kafka
Connect ES
Production
Break flow for this demo
Enriched
demo topic
Enriched
production
topic
Amazon RDS
Kafka
Connect
JDBC
Custom Meetup
Processor
Kafka
Kafka
Streams
Enrich Data
(GMaps, ext.)
Scan
topic
Watermarks
topic
33. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
REST API
(Java)
REST API
(Java)
REST API
(Java)
Conclusion: from this…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
REST API
(Java)
DB
(MySQL)
Scan&Collect
data
Watermarks
CRUD
Multi-Users
(administration)
Multi-Customers
(administration)
Custom Batch
Reports
Enrich Data
(GMaps, ext.)
Basic
Statistics
Real-Time
Analytics
Fault Tolerance
Scalability
Real-Time
Alerts
MOBILE
APP
34. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
…to this…
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Real-Time
Alerts
Custom Batch
Reports
Basic
Statistics
Enrich Data
(GMaps, ext.)
Multi-Users
(administration)
Multi-Customers
(administration)
Real-Time
Analytics
Watermarks
CRUD
Scan&Collect
data
Fault Tolerance
Service
Discovery
Deployment
Data modeling
and evolution
[complexity]
Scalability
MOBILE
APP
35. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
…and finally, this!
Enrich Data
(GMaps, ext.)
Custom Batch
Reports
Real-Time
Analytics
Real-Time
Alerts
Scan&Collect
data
…
Amazon RDS
elasticsearch
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
36. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Scan&Collect
data
…
Amazon RDS
Working on…
Enrich Data
(GMaps, ext.)
Custom Batch
Reports
Real-Time
Analytics
Real-Time
Alerts
elasticsearch
…KSQL?
…Kafka
Streams?
“It’s just a Proof-of-Concept (PoC), we’ll have time for develop production code” (cit.)
Kafka REST
Proxy
37. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Do I really need Kafka?
…Big data? Not yet… [thousands of scans/day]
…Fast data? Not really… [eventual consistency, seconds]
…Do I really need the power of Kafka in this project?
Probably not, but it gave me:
• Evolvability (change requests are coming from customers)
• Reliability (thanks to Connect, Streams and, in the future, KSQL)
• Scalability (thanks to Kafka)
• Elegant data pipelines (of course, I could do almost everything with a LAMP stack)
39. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
Event-streams, data pipelines, everywhere!
The Log
40. Copyright 2017 @ Bottega52 SRL - www.bottega52.it
My personal takeaways
In my opinion:
• Kafka is NOT ONLY for Big/Fast Data
• Kafka is NOT ONLY for Stream Computation
• Kafka enables new architectural patterns
• Anyone can talk at this meetup, no matter how big (data) they are :)
In the next episodes…