
[1st Italian Kafka Meetup 2017] Scan and go with the flow: how I met Kafka

Bio: After a Ph.D. in Computer Engineering at Politecnico di Milano and a research period at the University of California (Berkeley, CA), Matteo is the technical lead of Bottega52, a software development company working in Cloud, Big Data and the Industrial Internet of Things. A long-time enthusiast of technology and design patterns, he has a background that ranges from kernel development to distributed systems, and he has recently moved into stream computing.

Abstract: The talk describes how Kafka has improved the quality of the systems that Bottega52 designs and builds for its customers, using a successful industrial case as an example: a "watermark"-based tracking system commissioned by an Italian multinational food company. In particular, it presents the evolution of the system, which started as a small "monolith" and evolved, thanks to Kafka, into a service-based architecture with greater reliability and efficiency, following the "Command Query Responsibility Segregation" pattern.

Meetup: https://www.meetup.com/Milano-Kafka-meetup/events/244352352/

Published in: Software

  1. Scan and go with the flow: How I met Kafka
     Matteo Ferroni, matteo@bottega52.it
     www.bottega52.it: Cloud, IoT and Big Data Systems Engineering
     Bottega52 SRL - P.IVA: 08848340967 | Piazza della Vittoria 47, 26900 Lodi (LO), Italy | www.bottega52.it | info@bottega52.it | Phone: +39 02 4003 0539
     Copyright 2017 @ Bottega52 SRL - www.bottega52.it
  2. About: Matteo Ferroni
     Me: Passionate coder, software architect, (Web) surfer and musician, local organizer of this meetup
     Education: Ph.D., Politecnico di Milano; Visiting Researcher, University of California, Berkeley (UCB)
     Work: Teaching, Politecnico di Milano & LIUC; CTO & Co-Founder, Bottega52 Srl
     matteo@bottega52.it, @mattferroni
  3. Who we are
     Bottega52 SRL is a company providing software for Cloud, IoT and Big Data Systems.
     Our core: Internet of Things; Cloud Platforms; Big Data; Industry 4.0
  4. About: Bottega52
     Who: ~12 nerds, mostly Ph.D. and M.S. @ Politecnico di Milano
     What: Connect stuff, collect data, build real-time systems, create value
     When: Founded in Nov 2014
     Where: PoliHub, Milan (Italy)
     Why: High quality systems by high quality people
  5. How we met Kafka
     Case study: digital watermarks scan & analytics
     “A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as an audio, video or image data.” (Wikipedia)
     Photo credits: https://artlawjournal.com/invisible-watermark/
  6. Case study: digital watermarks scan & analytics (MOBILE APP)
     Consumer
     • Updated contents and promotions
     • Full and updated product information
     • Original Product Certification
     Retail
     • In-store engagement
     • In-store Instant Promotion
     • Faster Checkout (the code is repeated multiple times over the package)
     Production
     • Production control
     • Quality control
     • Logistics & tracking
  7. It’s just a PoC…
     “It’s just a Proof-of-Concept (PoC), we’ll have time to develop production code” (cit.)
     Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data
  8. It’s just a PoC…
     Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data; Watermarks CRUD; Basic Statistics
  9. It’s just a PoC…
     Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data; Watermarks CRUD; Enrich Data (GMaps, ext.); Basic Statistics
  10. It’s just a PoC…
      Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data; Watermarks CRUD; Multi-Users (administration); Enrich Data (GMaps, ext.); Basic Statistics
  11. It’s just a PoC…
      Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data; Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration); Enrich Data (GMaps, ext.); Basic Statistics; Real-Time Analytics
  12. It’s just a PoC…
      Diagram labels: MOBILE APP; REST API (Java) ×4; DB (MySQL); Scan&Collect data; Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration); Custom Batch Reports; Enrich Data (GMaps, ext.); Basic Statistics; Real-Time Analytics; Fault Tolerance; Scalability; Real-Time Alerts
  13. Evolvability, Reliability, Scalability
      Goals:
      • do not lose any scan
      • meet new requirements, evolve my code fast
      • design to scale
      bug-prone custom logic, bad design
      …microservices, anyone? (Martin Fowler)
      “A Microservices architecture as a service-oriented architecture composed of loosely coupled elements that have bounded contexts.” – Adrian Cockcroft, Cloud Architect at Netflix
      “[...] a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.” – Martin Fowler, Chief Scientist at ThoughtWorks
  14. Break the PoC into pieces…
      Diagram labels: MOBILE APP; REST API (Java); DB (MySQL); Scan&Collect data; Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration); Custom Batch Reports; Enrich Data (GMaps, ext.); Basic Statistics; Real-Time Analytics; Real-Time Alerts
  15. Break the PoC into pieces…
      Diagram labels: MOBILE APP; DB (MySQL); Real-Time Alerts; Custom Batch Reports; Basic Statistics; Enrich Data (GMaps, ext.); Multi-Users (administration); Multi-Customers (administration); Real-Time Analytics; Watermarks CRUD; Scan&Collect data
  16. Break data into pieces…
      Diagram labels: MOBILE APP; Real-Time Alerts; Custom Batch Reports; Basic Statistics; Enrich Data (GMaps, ext.); Multi-Users (administration); Multi-Customers (administration); Real-Time Analytics; Watermarks CRUD; Scan&Collect data
      Concerns: Fault Tolerance; Service Discovery; Deployment; Data consistency, modeling and evolution [complexity]; Scalability
  17. Break flow into pieces…
      Domain-driven considerations:
      • scan logic should be reliable and scalable
      • backoffice logic can be slower or under maintenance, but eventually consistent
      …CQRS and Event Collaboration, anyone? (Martin Fowler)
      “CQRS stands for Command Query Responsibility Segregation. […] At its heart is the notion that you can use a different model to update information than the model you use to read information. […] CQRS fits well with event-based programming models. […] Having separate models raises questions about how hard to keep those models consistent, which raises the likelihood of using eventual consistency. […] CQRS allows you to separate the load from reads and writes allowing you to scale each independently.” – Martin Fowler, Chief Scientist at ThoughtWorks
      Diagram labels: write; read
  18. Break flow into pieces…
      Diagram labels: MOBILE APP; Scan&Collect data; Enrich Data (GMaps, ext.); Real-Time Alerts; Real-Time Analytics; Custom Batch Reports; Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration)
  19. Break flow into pieces…
      Diagram labels: MOBILE APP; Scan&Collect data; Real-Time Alerts; Enrich Data (GMaps, ext.); Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration); …events streams…; Real-Time Analytics; Custom Batch Reports
  20. Break flow into pieces: how I met Kafka
      Diagram labels: Enrich Data (GMaps, ext.); Real-Time Alerts; Scan&Collect data; Real-Time Analytics; Custom Batch Reports
  21. Break flow into pieces: how I met Kafka
      Diagram labels: Scan&Collect data; Enrich Data (GMaps, ext.); Custom Batch Reports; Real-Time Analytics; Real-Time Alerts
  22. Break flow into pieces: how I met Kafka
      Diagram labels: Enrich Data (GMaps, ext.); Custom Batch Reports; Real-Time Analytics; Real-Time Alerts; Scan&Collect data; …; elasticsearch; Amazon RDS
  23. Kafka Connect
      Benefits over ‘do-it-yourself’ Producers and Consumers:
      • Off-the-shelf, tested Connectors for common data sources are available
      • Features fault tolerance and automatic load balancing when running in distributed mode
      • No coding required, just write configuration files for Kafka Connect
      • Pluggable/extendable by developers
  24. JDBC configuration example (standalone)
      [configuration values shown as CENSORED on the slide]
      Mode for detecting DB changes: incrementing, timestamp, timestamp+incrementing
  25. Elasticsearch configuration example (standalone)
      • Exactly Once Delivery: the connector relies on Elasticsearch’s idempotent write semantics to ensure exactly once delivery to Elasticsearch. When the keys are not included, or are explicitly ignored, the connector will use topic+partition+offset as the key
      • Mapping Inference: the connector can infer mappings from the Kafka Connect schemas. If more customizations are needed (e.g. geo_point), we highly recommend creating mappings manually.
  26. Break flow into pieces: how I met Kafka
      Diagram labels: Enrich Data (GMaps, ext.); Custom Batch Reports; Real-Time Analytics; Real-Time Alerts; Scan&Collect data; …; Amazon RDS; elasticsearch
  27. Kafka Streams
      Kafka Streams API is a lightweight Java library for building distributed stream processing applications on top of Kafka clusters
      • Easy to embed in your own applications
      • Supports windowing operations, and stateful processing including distributed joins and aggregation
      • Has fault-tolerance and supports distributed processing
      • Includes both a Domain-Specific Language (DSL) and a low-level API
  28. Break flow for this demo
      Diagram labels: Elasticsearch Production; Kafka Connect ES Production; Enriched production topic; Amazon RDS; Kafka Connect JDBC; Kafka; Kafka Streams; Enrich Data (GMaps, ext.); Scan topic; Watermarks topic
  29. Break flow for this demo
      Diagram labels: Elasticsearch Demo; Kafka Connect ES Demo; Elasticsearch Production; Kafka Connect ES Production; Enriched demo topic; Enriched production topic; Amazon RDS; Kafka Connect JDBC; Custom Meetup Processor; Kafka; Kafka Streams; Enrich Data (GMaps, ext.); Scan topic; Watermarks topic
  30. Kafka Streams example (custom demo process)
  31. Break flow for this demo
      Diagram labels: Elasticsearch Demo; Kafka Connect ES Demo; Elasticsearch Production; Kafka Connect ES Production; Enriched demo topic; Enriched production topic; Amazon RDS; Kafka Connect JDBC; Custom Meetup Processor; Kafka; Kafka Streams; Enrich Data (GMaps, ext.); Scan topic; Users topic
  32. Live demo
  33. Conclusion: from this…
      Diagram labels: MOBILE APP; REST API (Java) ×4; DB (MySQL); Scan&Collect data; Watermarks CRUD; Multi-Users (administration); Multi-Customers (administration); Custom Batch Reports; Enrich Data (GMaps, ext.); Basic Statistics; Real-Time Analytics; Fault Tolerance; Scalability; Real-Time Alerts
  34. …to this…
      Diagram labels: MOBILE APP; Real-Time Alerts; Custom Batch Reports; Basic Statistics; Enrich Data (GMaps, ext.); Multi-Users (administration); Multi-Customers (administration); Real-Time Analytics; Watermarks CRUD; Scan&Collect data
      Concerns: Fault Tolerance; Service Discovery; Deployment; Data modeling and evolution [complexity]; Scalability
  35. …and finally, this!
      Diagram labels: Enrich Data (GMaps, ext.); Custom Batch Reports; Real-Time Analytics; Real-Time Alerts; Scan&Collect data; …; Amazon RDS; elasticsearch
  36. Working on…
      Diagram labels: Scan&Collect data; …; Amazon RDS; Enrich Data (GMaps, ext.); Custom Batch Reports; Real-Time Analytics; Real-Time Alerts; elasticsearch; …KSQL?; …Kafka Streams?; Kafka REST Proxy
  37. Do I really need Kafka?
      …Big data? Not yet… [thousands of scans/day]
      …Fast data? Not really… [eventual consistency, seconds]
      …Do I really need the power of Kafka in this project? Probably not, but it allowed me to have:
      • Evolvability (change requests are coming from customers)
      • Reliability (thanks to Connect and Streams, and KSQL in the future)
      • Scalability (thanks to Kafka)
      • Elegant data pipelines (of course, I could do almost everything with a LAMP stack)
  38. Event-streams, data pipelines, everywhere!
  39. Event-streams, data pipelines, everywhere! The Log
  40. My personal takeaways
      In my opinion:
      • Kafka is NOT ONLY for Big/Fast Data
      • Kafka is NOT ONLY for Stream Computation
      • Kafka enables new architectural patterns
      • Everyone can talk at this meetup, no matter how big (data) he is :)
      In the next episodes… LOCKS
  41. Contacts
      Matteo Ferroni, Chief Technology Officer, matteo@bottega52.it
      Bottega52 SRL, Tel: +39 02 4003 0539, Via Durando 38/A, 20158 Milano (MI)
  42. www.bottega52.it: Cloud, IoT and Big Data Systems Engineering
      Bottega52 SRL - P.IVA: 08848340967 | Piazza della Vittoria 47, 26900 Lodi (LO), Italy | www.bottega52.it | info@bottega52.it | Phone: +39 02 4003 0539
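
Slide 17 argues for CQRS: the scan (write) path must be reliable, while the backoffice (read) path may lag as long as it becomes consistent eventually. The following is a minimal in-memory sketch of that split, not the system shown in the talk. `EventLog`, `ScanWriter`, `ScanCountView` and the sample watermark ids are all hypothetical names, and a plain Python list stands in for a single-partition Kafka topic.

```python
class EventLog:
    """Append-only list standing in for a Kafka topic with a single partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def read_from(self, offset):
        return self._events[offset:]


class ScanWriter:
    """Write side: accepts scan commands and only appends events to the log."""

    def __init__(self, log):
        self._log = log

    def record_scan(self, watermark_id, city):
        return self._log.append({"watermark_id": watermark_id, "city": city})


class ScanCountView:
    """Read side: a query model rebuilt by consuming the log at its own pace."""

    def __init__(self, log):
        self._log = log
        self._next_offset = 0
        self.counts = {}

    def catch_up(self):
        # Consume only the events that this view has not processed yet.
        for event in self._log.read_from(self._next_offset):
            key = event["watermark_id"]
            self.counts[key] = self.counts.get(key, 0) + 1
            self._next_offset += 1


log = EventLog()
writer = ScanWriter(log)
view = ScanCountView(log)

writer.record_scan("wm-1", "Lodi")
writer.record_scan("wm-2", "Milano")
writer.record_scan("wm-1", "Milano")

stale = dict(view.counts)  # the view has not consumed anything yet
view.catch_up()
fresh = dict(view.counts)  # now consistent with the log
```

The writer never touches the view, so the read model can be stopped, rebuilt from offset 0, or replaced without losing a scan; the gap between `stale` and `fresh` is the eventual consistency the slide accepts for backoffice logic.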
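
Slide 24 names three modes for detecting database changes: incrementing, timestamp, and timestamp+incrementing. The sketch below illustrates the idea behind two of them on an in-memory table. It is a simplified assumption of how such polling can work, not the connector's actual implementation, and the `Row` fields and sample rows are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    id: int          # strictly incrementing key (hypothetical column)
    updated_at: int  # last modification time, epoch seconds (hypothetical column)
    payload: str


def poll_incrementing(rows, last_id):
    """Return rows whose id is greater than the last id already published."""
    return sorted((r for r in rows if r.id > last_id), key=lambda r: r.id)


def poll_timestamp_incrementing(rows, last_ts, last_id):
    """Return rows with a newer timestamp, or the same timestamp and a higher id."""
    fresh = [
        r for r in rows
        if r.updated_at > last_ts
        or (r.updated_at == last_ts and r.id > last_id)
    ]
    return sorted(fresh, key=lambda r: (r.updated_at, r.id))


table = [
    Row(1, 100, "watermark A"),
    Row(2, 100, "watermark B"),
    Row(3, 105, "watermark C"),
]

# Rows 1 and 2 were already published: only the inserted row 3 is picked up.
new_rows = poll_incrementing(table, last_id=2)

# An UPDATE bumps the timestamp of row 1 but leaves its id unchanged.
table[0] = Row(1, 110, "watermark A (edited)")

changed_rows = poll_timestamp_incrementing(table, last_ts=105, last_id=3)
missed_by_incrementing = poll_incrementing(table, last_id=3)
```

An id-only check sees inserts but not updates, which is why a table that is edited in place (such as a watermarks CRUD table) would typically need a timestamp-based mode.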
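
Slide 25 states that exactly-once delivery to Elasticsearch rests on idempotent writes, with topic+partition+offset used as the key when record keys are missing or ignored. A minimal sketch of why that works follows, with a dict standing in for an index. The `document_id` and `write` helpers and the topic name `enriched-scans` are hypothetical and only mirror the behaviour the slide describes.

```python
def document_id(topic, partition, offset, key=None, key_ignore=False):
    """Use the record key when present and not ignored, else topic+partition+offset."""
    if key is not None and not key_ignore:
        return str(key)
    return f"{topic}+{partition}+{offset}"


def write(index, record, key_ignore=False):
    """Upsert the record value under a deterministic id, so a retry overwrites it."""
    doc_id = document_id(
        record["topic"], record["partition"], record["offset"],
        record.get("key"), key_ignore,
    )
    index[doc_id] = record["value"]
    return doc_id


index = {}
record = {
    "topic": "enriched-scans",  # hypothetical topic name
    "partition": 0,
    "offset": 42,
    "key": None,
    "value": {"city": "Lodi"},
}

first_id = write(index, record)
second_id = write(index, record)  # simulated redelivery after a retry
```

Because the id is derived from the record's position in the log, delivering the same record twice produces one document rather than two.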
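
Slides 27 and 28 describe Kafka Streams joining the scan topic with the watermarks topic to produce an enriched topic. The sketch below mimics that stream-table join in plain Python rather than with the Kafka Streams API. The function names and sample products are hypothetical, and it simplifies one point: the table is fully built before any scan is processed, whereas a real stream-table join looks up the table state at the time each record arrives.

```python
def build_table(changelog):
    """Fold a changelog of (key, value) records into the latest value per key."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


def enrich(scans, watermarks):
    """Inner join: attach product metadata to each scan, dropping unknown ids."""
    for scan in scans:
        product = watermarks.get(scan["watermark_id"])
        if product is not None:
            yield {**scan, "product": product}


# Hypothetical sample data: wm-1 is updated once, wm-9 is never registered.
watermark_changelog = [
    ("wm-1", "Pasta 500g"),
    ("wm-2", "Tomato sauce"),
    ("wm-1", "Pasta 1kg"),
]
scan_events = [
    {"watermark_id": "wm-1", "city": "Lodi"},
    {"watermark_id": "wm-9", "city": "Milano"},
]

enriched = list(enrich(scan_events, build_table(watermark_changelog)))
```

Treating the watermarks topic as a table (latest value per key) and the scans as a stream (every event counts) is the distinction the join relies on.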
